Sunday, October 7, 2012

Player Value and Fit: Deconstructing Advanced Plus/Minus

Numbers and basketball have progressed to the point where advanced statistical techniques are publicly available for hundreds of players across many seasons. There's enough intellectual heft behind the movement that one cannot simply dismiss advanced stats as limited representations of the court; they are comprehensive, and there's a stat for nearly everything. The problem, however, is the hidden complexity of the game, which is virtually impossible to cover with a limited model.

Let's consider the most widely respected "all in one number" stat: the regularized +/- statistics found here, where the difference in how a team scores when a player is on or off the court is used to construct player "values" with a regularized, or ridge, regression technique. That basically just means estimates are shrunk toward zero, so extreme values are penalized. The important point is that every player is given a number for each year corresponding to value, and this is true of past +/- methods. What many critics incorrectly assume is that basketball stat-nerds are mining mere box score statistics like points or blocks to assess genuine player worth, missing the small things you can do to help your team win that aren't easily captured by a number. In actuality, +/- looks at overall impact -- this includes hockey assists, setting picks, man-to-man defense, spacing the floor, etc. What they're looking for is a Theory of Everything for players.
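To make "regularized" a little more concrete, here is a minimal sketch of a ridge fit on made-up data. It is not the method behind the published numbers: the 2-on-2 stints, the margins, the penalty values, and the function name `ridge_plus_minus` are all assumptions for illustration. The only real point is that a larger penalty pulls every player's estimate toward zero.

```python
import numpy as np

def ridge_plus_minus(X, y, lam):
    """Closed-form ridge estimate: solve (X^T X + lam * I) beta = X^T y."""
    n_players = X.shape[1]
    A = X.T @ X + lam * np.eye(n_players)
    return np.linalg.solve(A, X.T @ y)

# Hypothetical toy league: 4 players, 6 stints of 2-on-2.
# Each row is a stint: +1 if the player is on the floor for side A,
# -1 if on the floor for side B.
X = np.array([
    [ 1,  1, -1, -1],
    [ 1, -1,  1, -1],
    [ 1, -1, -1,  1],
    [-1,  1,  1, -1],
    [-1,  1, -1,  1],
    [-1, -1,  1,  1],
], dtype=float)

# Made-up scoring margin for side A in each stint.
y = np.array([12.0, -4.0, 6.0, 2.0, -8.0, 10.0])

light = ridge_plus_minus(X, y, lam=0.1)   # weak penalty
heavy = ridge_plus_minus(X, y, lam=10.0)  # strong penalty

print("weak penalty:  ", np.round(light, 2))
print("strong penalty:", np.round(heavy, 2))
```

Whatever the penalty, the output is still one number per player per data set, which is the assumption the rest of this post pokes at.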

There is a major problem, however, in the +/- field that is rarely discussed: the assumption that players have stable, certain numbers for their contribution or value. For example, according to regularized +/-, LeBron James in 2010 was a +9.8, which is supposed to objectively state his value in a single number signifying his effect when he's on the court. (Note: the 9.8 doesn't mean 9.8 points because the regression technique doesn't output a number in a traditional unit. I also believe +9.8 is the highest value in their year-to-year studies.) This type of metric is pretty intuitive to use. If player X is +5.2 and player Y is +2.1, you can state (ignoring standard deviations) that X is better than Y. The data are segmented by year, and sometimes they're grouped together into, say, a five-year study, but generally players are assigned specific numbers. The thinking is that LeBron James in 2010 had a measurable and logically quantifiable effect on the court that you can compare to others; but there's a fatal assumption.

A player's value can't be contained by a single number because role or fit has a substantial effect on impact where a player on one team (or lineup) could be worth more than on another.

Going back to LeBron, after a historic 2010 in terms of +/- (and his regular season stats) he crashed to a "mere" +6.4, a decrease of roughly 3.5 units in a season where only 21 players in the league were even +3.5. This is somewhat surprising because LeBron was at an age when most NBA players are still improving, especially in terms of +/-. Of course, this was during his move from Cleveland to Miami, joining two other all-stars and, most importantly, another wing superstar. Wade also lost a bit of his "value" from 2010 to 2011 -- from +8.4, one of the highest recorded in the ten years for which data are available, to +5.3, meaning he had a similar decrease of 3.1. One of the criticisms of the LeBron-Wade pairing buried by the public onslaught of the "decision" was that it was a bad fit between two players with overlapping skillsets, which is tricky to navigate since there's only one ball on the court. They're dominant wings who need the ball in their hands, shooting at a rate rivaled by only a few and handling the ball as much as, or more than, a point guard through their penetration and passing. Even after LeBron adopted a post game and summoned all his skills to win a title and an MVP (and a gold medal) at the ripe age of 27, normally the peak year for a pro, his adjusted +/- still fell short of his 2010 mark.

There's a third member, however, of the Miami Big Three: Chris Bosh. His offensive value plummeted like the others', but his defense picked up the slack, perhaps showing his adaptability to a reduced role in which he devoted more of his resources and energy to other aspects of his game. This is another complication to assessing value. Not only is context important, but skills and players interact in fluid ways as injuries pile on or new tricks are learned over time. While most +/- models assume a player is a fixed value throughout an entire year, some simply aren't the same at the beginning of the year compared to the end; that's pretty normal.

Here's another angle on the issue. Imagine a team of 11 Shaqs ... and one confused Steve Blake. Shaq is an imposing figure, but you don't need 11 of him, especially when it means he's the backup point guard. He's a limited player who only occupies the paint and would obviously have trouble bringing the ball up the court. When Steve plays, however, there are drastic differences because he's the only thing resembling a point guard, the only perimeter player, and the only three-point shooter. Normally, Blake would have little value, but here his effect on the court should be substantial. This goes beyond the bad backup effect, however. Backup point guard Shaq has substantial negative value, even though we know from NBA history he's a tremendous force. Obviously, there's a paradox here -- Shaq is a hall-of-fame behemoth, not a detriment to a team. In order to solve this hypothetical dilemma you have to assign role and context to a player's value: Shaq as a center and focal point of the offense is a legend; Shaq as a backup point guard is hilarious at an all-star game but ultimately damaging. This is not a pointless exercise in abstraction; a player being forced into a role that doesn't suit his skills is pretty common, though rarely this extreme.

(This brings up some weird philosophical basketball issues by accident: If you have 11 Shaqs, are they any different? Is one assigned to backup point guard or do they rotate? But if they rotate, can you really tell the difference?)
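The Shaq thought experiment can be put into a toy calculation. Everything below is invented for illustration: the talent ratings, the flat penalty for a lineup without a guard, and the helper names `lineup_rating` and `marginal_value` are assumptions, not anything taken from a real +/- model. The sketch only shows how the same player can carry a different "value" depending on who he replaces and who he shares the floor with.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Player:
    name: str
    role: str      # "guard", "wing", or "center"
    talent: float  # hypothetical stand-alone rating

def lineup_rating(lineup, no_guard_penalty=20.0):
    """Toy rating: total talent, minus a flat penalty if nobody can handle the ball."""
    total = sum(p.talent for p in lineup)
    if not any(p.role == "guard" for p in lineup):
        total -= no_guard_penalty
    return total

def marginal_value(player, lineup, slot):
    """Change in lineup rating when `player` takes over position `slot`."""
    swapped = list(lineup)
    swapped[slot] = player
    return lineup_rating(swapped) - lineup_rating(lineup)

# Made-up ratings, chosen only to make the arithmetic easy to follow.
shaq = Player("Shaq", "center", 10.0)
blake = Player("Steve Blake", "guard", 2.0)
avg_guard = Player("Average guard", "guard", 4.0)
avg_wing = Player("Average wing", "wing", 5.0)

all_shaq_lineup = [shaq] * 5
normal_lineup = [avg_guard, avg_guard, avg_wing, avg_wing, shaq]

# Blake replaces one Shaq on the all-Shaq team.
blake_on_shaq_team = marginal_value(blake, all_shaq_lineup, slot=0)
# Blake replaces an average guard on an ordinary roster.
blake_on_normal_team = marginal_value(blake, normal_lineup, slot=0)

print(blake_on_shaq_team)    # 12.0
print(blake_on_normal_team)  # -2.0
```

Under these made-up numbers Blake is worth +12 on the all-Shaq team and -2 on an ordinary one, so no single number describes him in both places.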

Another great example -- and this one actually happened; 11 Shaqs and Blake only happened in a nightmare -- is that of a Steve Nash teammate: Boris Diaw. If you glance at his graph for adjusted +/-, you'll immediately notice an odd spike in 2006. Well, it's not totally odd -- it was his first year with the Phoenix Suns, but it was also the year he was completely unleashed. Diaw is a 6'8" passing savant who can drive to the basket pretty well and is a strangely fantastic defender. Given his height, he's normally cast as a small forward, but he worked pretty well as the backup point guard, and when Amare got injured he was slotted in at center. The result was a career year, with the Suns hitting 54 wins and getting to the conference finals despite losing their leading scorer in Stoudemire. Diaw, who's a poor outside shooter and a bit hefty and slow to guard the perimeter, performs better as a frontcourt player, where he suddenly has a foot speed advantage and his girth and strength are rightly utilized. After Stoudemire came back, Diaw's numbers plummeted and he was cast to basketball's hinterlands, finding himself on arguably the worst team of all time and losing his starting job there. He then landed with the top team in the West, the Spurs, where he paradoxically found more playing time because they used him in the correct manner, bringing his career full circle.

As more and more basketball research is published and scrutinized, we can't lose sight of the players behind the numbers. This is not a plea for ignorance; this is a careful consideration of the bevy of assumptions made when forming statistical measures. How player value interacts with context and fit, and how there is in actuality a continuous, infinite set of players based on their ever-changing situations, cannot be ignored. There are some new studies out there about team synergy and roles, but they are not enough. (And I'm specifically thinking of one such study where a researcher assigned a number of categories, like seven to ten, to players over a long-term study that was maybe +/- based, but I can't find it.) Five-man unit +/- scores should be evaluated as closely as individual measures. As we march into the uncertain future armed with tools of certainty, let us not lose sight of our ability to mentally adjust, to use our higher-order thinking, our real gifts from Heaven and Earth, to try to comprehend the beautiful chemistry dancing on the court, ever elusive from real understanding.

1 comment:

  1. "A player's value can't be contained by a single number because role or fit has a substantial effect on impact where a player on one team (or lineup) could be worth more than on another."

    Precisely. I have a problem with metrics which assign a single numerical statistic to gauge a player's "value" (e.g. PER, Wins Produced, Win Shares, even +/-) based on what we have at the moment. The value of a player is defined by his role on the team; players with different roles contribute to their teams in different ways, so assigning each player a uniformly derived value and then comparing players against each other on it seems like a flawed concept.

    For instance, Reggie Evans' role is practically to rebound, and he's very good at it. In contrast, Jose Calderon's (a player you don't like, but I do) role is to facilitate his team's offense (pass and put points up on the board occasionally). Would it then be justifiable to compare these two by conjuring a uniform metric to quantify their individual contributions? Theoretically, that shouldn't be the case. It's like the apples and oranges argument.

    What I'm arriving at is that advanced stats should instead be used to assess the contributions of players separately, not through the use of a single aggregate number. Hence, I prefer metrics like Reb%, Ast%, and Ast/TO ratio, which focus on the separate contributions that players make rather than mixing them all together into one number just so it would be much easier to compare player A with player B. I think that this specific-contribution approach provides more insight into player evaluation -- saying player A is a much better rebounder (or passer, or shooter) than player B makes more sense than saying player A has a higher PER than player B and is therefore automatically better. Even +/-, which is not based on individual stats like PER or WP, doesn't avoid this dilemma, because it leaves basketball decision makers (i.e. GMs, coaches, etc., the end users of advanced stats) with absolutely nothing to go by other than a single number.