Saturday, November 24, 2012

The Death of an Advanced Statistic: Plus Minus Injected with Dirty Box Score Stats

Recently, the highly respected adjusted plus-minus model by Jeremias Engelmann was tossed out in favor of a hybrid method that uses not just plus-minus but conventional box score stats too (along with height, and, as the author has indicated, more non-conventional stats like charges in the future). You can see the new stats and layout here, but the old stats are lost in the internet's foggy past.

For decades, quantifying individual basketball players was limited to the simple box score, and for most of that duration the focus was mainly on the major counting stats of points, rebounds, and assists, where the secondary stats were blocks, steals, and field-goal percentage (technically a derived stat from field goal attempts and makes). There are also tertiary stats: free throws including percentage, three pointers, offensive and defensive rebounds on their own, fouls, minutes, and turnovers. What's obvious, however, is how arbitrary some of the stats are. No reasonable person would conclude you can fully account for individual defense from rebounds, blocks, steals, and fouls (though some do anyway). And why don't passes that lead to a player drawing a shooting foul count as assists? There's also a family of pseudo box score stats, ones that aren't included in traditional box scores: charges, hockey assists, and-1's, dunks made, shots challenged, etc.

There are obviously problems with only looking at the box score, and the thinking of the basketball community was dominated by the primary stats, especially scoring, and to lesser extents the secondary and tertiary stats. Having the greatest amount of an arbitrary stat was tantamount to being the undisputed greatest. Your name was etched forever by collecting the most assists in a specified time period, and that was that. You could have a fluke game or perform in the luckiest circumstances, grabbing rebounds against an undersized player with a hurt ankle, but it didn't matter as long as you hit a nice round number.

The basketball statistics revolution took these numbers further, delving into them and creating complicated outputs like the all-in-one measures of PER, Win Shares, and Wins Produced. There are also clever but intuitive measures like true-shooting percentage, which is like field-goal percentage but adjusted for free throws and three pointers, and rebounding percentage, which is the share of available rebounds you actually grab; unlike total rebounds, it isn't inflated by high-pace games or by opponents who keep missing and creating more opportunities. But those complicated stats are still dependent on the box score, and as such they have blind spots. There are guys who are great at collecting stats but don't seem to help their team win, and whose weaknesses aren't recorded by the box score, like Jose Calderon or Carlos Boozer. Then there are guys who don't really tally up big numbers but appear to be very valuable, like Shane Battier and Nick Collison. Coaches and their greatest cheerleaders of conventional wisdom, most NBA "analysts", complained that people focused too much on stats and lost the important details, the nitty-gritty like a good screen, boxing out, proper double teaming, or the effect of floor spacing. Overall, box score models worked pretty well and aligned with previous thinking because they relied on what people were focusing on anyway, but the coaches were right. The game is more nuanced.
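For concreteness, here's what those two "clever but intuitive" measures look like in code. This is a quick sketch with made-up numbers; the 0.44 free-throw coefficient is the usual convention for estimating how many free-throw attempts end a possession, not something exact.

```python
def true_shooting_pct(points, fga, fta):
    """True shooting percentage: scoring efficiency with free throws and
    three pointers folded in. 0.44 is the standard estimate of the share
    of free-throw attempts that use up a possession."""
    return points / (2 * (fga + 0.44 * fta))

def rebounding_pct(player_rebounds, available_rebounds):
    """Share of available rebounds grabbed while on the floor, which
    removes the bias from pace and from opponents who keep missing."""
    return player_rebounds / available_rebounds

# Hypothetical game: 25 points on 18 FGA and 6 FTA,
# 10 rebounds out of 40 available while on the court.
print(round(true_shooting_pct(25, 18, 6), 3))   # -> 0.606
print(round(rebounding_pct(10, 40), 3))         # -> 0.25
```

Notice that the same 25 points on fewer attempts would push the true-shooting number up, which is exactly the adjustment plain field-goal percentage misses.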

Plus minus is simple. It's about how your team scores and gives up points when you're on the court versus off the court. It's a direct measure of what people really want to know: winning. It bypasses the middle-men of the box score stats. The NBA game is a billion-dollar business, but it all reduces to a score at the end of a game. You can estimate who accounts for a win by who takes the shots, but this is imprecise and is like measuring the amount of rainfall by how many clouds are in the sky. Plus minus focuses on the only thing that matters: outscoring your opponent.

There are obviously problems with plus minus. The first is who else is on the court. If you're a fifth Beatle in a great starting lineup, you'll seem better than you are, while feasting against weak competition in garbage time doesn't necessitate greatness. But with modern computers an adjustment is possible. This is the typical adjusted plus minus model. The second problem is harder to get around: some players tend to only play with certain players, and in the case of a starting center and a backup center, if they only ever replace each other it's impossible to separate their individual values. As a result, the plus minus numbers, even when adjusted, are noisy, with strange answers and extreme values. A technique to reduce the variation and the extreme values is ridge regression, where instead of just minimizing the sum of the squared errors, the size of the coefficients is penalized too, shrinking them toward zero. (The coefficients are basically the player ratings, or an adjusted plus minus.) Engelmann's model, using ridge regression, became popular for its intriguing and reasonable results. It was the latest generation in basketball statistics.
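The ridge regression idea can be sketched in a few lines. This is a toy example with invented stint data, not Engelmann's actual pipeline: each row is a lineup stint, each column a player (+1 on court for one side, -1 for the other, 0 off), and the target is the point margin for that stint. The data is deliberately built so that one row is a combination of two others, mimicking players who only sub for each other, which makes plain least squares unsolvable while ridge still produces ratings.

```python
import numpy as np

# Toy stint matrix: rows are lineup stints, columns are four players.
# +1 = on court for the "home" side, -1 = for the "away" side, 0 = off.
# Row 1 equals row 2 plus row 3, so ordinary least squares has no
# unique solution -- the collinearity problem described above.
X = np.array([
    [1,  1, -1, -1],
    [1,  0, -1,  0],
    [0,  1,  0, -1],
    [1,  1,  0, -1],
], dtype=float)
y = np.array([8.0, 3.0, 5.0, 10.0])  # point margin per stint

lam = 1.0  # ridge penalty: shrinks ratings toward zero, taming noise
# Closed-form ridge solution: beta = (X'X + lam*I)^-1 X'y
beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(beta)  # one regularized plus-minus rating per player
```

With `lam = 0` the matrix in `np.linalg.solve` would be singular here; the penalty is what keeps the ratings finite and pulls the extreme values in.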

However, that's been replaced by a method now called xRAPM that uses both adjusted plus minus and a box score metric. With more information prediction can obviously be improved, but the problem is that this new way to look at the game is being discarded for what people have essentially been doing for decades. It's not completely rejected because it's a hybrid model, but the basketball community already has enough statistics based on the box score. The new model isn't perfect, and the same flaws exist as in other box score metrics, like Al Jefferson being rated a positive on defense because he grabs rebounds and picks up blocks but forgets to defend. Adjusted plus minus is a new perspective, and now the most publicly available method is gone.

There's also an issue in how models are evaluated. As I've written before with Wins Produced, the problem with testing individual player models by the outcome of a team event is that you're somewhat arbitrarily assigning a value to a discrete event like a rebound. You're also falling into the trap of accidentally testing the efficacy of offensive and defensive efficiency, which is already known: the points you score per possession, and the number of your possessions along with your opponent's totals, will predict wins with 95% accuracy over a season. You can calculate your offensive and defensive efficiency with box score stats, so creating a model with box score stats to predict wins is in effect building a model for team efficiency. But that's the problem: it's team efficiency. Who gets the credit for a rebound? The player who defends the star player well, causing a miss, or the player rebounding? How much credit should each have? Obviously, there is no concrete answer, and it depends on the context of how the shot was taken and where the rebound was.
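Since team efficiency keeps coming up, here's how it falls out of the box score. The possession estimate below is the common convention (again with the 0.44 free-throw coefficient), and the numbers are a hypothetical single game, not real data.

```python
def possessions(fga, fta, orb, tov):
    """Common box-score estimate of possessions: a possession ends with a
    field-goal attempt (unless the offense rebounds it back), a turnover,
    or a possession-ending free throw (the 0.44 coefficient)."""
    return fga + 0.44 * fta - orb + tov

def efficiency(points, poss):
    """Offensive (or defensive) efficiency: points per 100 possessions."""
    return 100.0 * points / poss

# Hypothetical game totals for one team.
poss = possessions(fga=85, fta=25, orb=12, tov=14)
print(round(poss, 1))                   # -> 98.0
print(round(efficiency(98, poss), 1))   # -> 100.0
```

The point is that every input here is a team total, which is exactly why predicting wins from box score stats ends up re-deriving team efficiency rather than isolating individual credit.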

Players who are good at collecting box score stats have been generally overrated for years. This is compounded by the fact that a player without a team is useless and a fit or a role is just as important as who the player is. Teams know this, and the smart ones at least won't let a bunch of stat-grabbers on the court at the same time without a, say, competent defender who doesn't call for the ball each possession. Without caring about team synergy you get the Washington Wizards of the past couple seasons.

One method to test the validity of a model is to see how a team does when a player is removed, for reasons ranging from injury to a trade. If those rebounds can't be grabbed by someone else, you'll be able to see it; if the scoring is indeed beneficial the offense should suffer; and if a glue guy is actually important structural glue the numbers should bear it out.

Fortunately, sciences work with incremental improvements and fits and starts. Someone could fill the void with a different model. The data are available. A new technique could shed light on notoriously hard-to-track players like rookies, fringe players with few minutes, and the outliers. I've already looked at how Jose Calderon breaks the Wins Produced model, and I'm planning a companion piece not with one player but the now common slightly undersized power forwards who can score and pick up rebounds but offer little resistance on defense and rely on put-backs for their high-percentage plays. Perhaps this new xRAPM method needs to be tested, with its weaknesses located and battered mercilessly. It's the only way toward growth.


  1. I think the most important factor in a model is its predictive power. If someone added player nose size into the old RAPM model, and this somehow produced a substantial increase in the model's ability to predict future outcomes, then I would be praising the one who was creative enough to test this out, not complaining about how we had lost the purity of plus minus. Likewise, the box score measures were not added on a whim; I believe they help improve the model's predictive power. As such, it doesn't make sense for one to call the new additions a weakness just because he or she doesn't like them.

    1. Here's the problem in a nutshell:
      For decades the box score has dominated the perception of NBA players, and to date there are many box score metrics for evaluating players. Really my problem is that we don't need another metric influenced by simple box score stats. We have enough of that and generations persuaded by that kind of analysis.

    2. Sure, the current box score metrics don't work terrifically well, but that doesn't mean the box score isn't a source of potentially valuable information. Using the box score just for the sake of using the box score would be a problem, sure, but staying away from it just for the sake of staying away from it is just as bad. RAPM wasn't perfect and xRAPM isn't perfect either, but what's wrong with trying to improve a metric? Isn't this the same reason the WoW crowd is often seen as arrogant?

    3. You did not understand my point or you chose to ignore it.

      We already have lots and lots and lots of metrics that use the box score, and we have generations influenced by the box score measuring a player's value. It was good to have something completely distinct from that.

      That was the point.