Today's follow-up article considers defense and gambling. My issue, however, was more with the residual analysis. Basically, the model output you usually see is an estimate of the average marginal value of a stat, but the value of a steal changes from player to player. There are players like John Stockton, Mookie Blaylock, and Chris Paul who are great at "picking your pocket" without sacrificing much on help defense: they have supremely fast hands, and they get fewer of their steals from jumping passing lanes and going out of position. By contrast, there are serial gamblers like DeMarcus Cousins, Westbrook, and Ellis who are not great defenders despite their gaudy steal totals. Cousins reaches too often instead of protecting the rim, while Westbrook and Ellis are so aggressive they can blow up their own defensive schemes. Asserting that a player is good or great based on steals, as the original article did with Rubio (whose shooting has derailed his season), is dangerous. Not all steals are the same, and players collect them differently. In an era with a wealth of video and stats, we don't need to fall back on this brand of basic analysis.
The benefits of the steal appear to outweigh the costs, on average, and this is nothing new; the quibble is over the exact value, and the now-infamous article asserts that a steal is worth nine times as much as a point. Continuing in backwards fashion, let's start with the end of the article, concerning irreplaceability. This has been done before with a different method by his fellow FiveThirtyEight writer Neil Paine, who found that steals respond to a change in role at an average rate, and that a team's own steal rate also changes at an average rate. And even if the numbers in the FiveThirtyEight article are correct and steals are almost perfectly irreplaceable, that's not entirely useful: a stat being irreplaceable doesn't necessarily mean it's valuable. DeAndre Jordan's free-throw shooting is not replaceable in the sense that few can match his awfulness -- only one other player in NBA history with at least 500 attempts has shot under 50%. After all, even if you can't replace the steal, losing it isn't a huge loss, since there's only a small correlation between steals and team success.
The body of the article concerned an empirical value of steals, and it opens by taking a swipe at the popular metric PER:
"The most famous attempt is John Hollinger’s player efficiency rating, which ostensibly includes steals in its calculation but values them about as much as two-point baskets."
Hollinger's metric is ancient by the standards of the NBA statistics movement, and it was never tested to provide the most predictive results. That's nothing new. The original model in the FiveThirtyEight piece appears to have been a regression with points, rebounds, assists, blocks, and steals as predictors, where the dependent variable (what they're trying to predict) is the effect on a team's point differential when a player is absent. It's termed "with-or-without you" (WOWY). This is actually even more basic than PER, which corrects for league averages and pace and includes more stats, and without testing we don't know which one is more useful -- the article criticizes Hollinger for guessing statistic coefficients, but we never see the predictive ability of "with-or-without you."
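To make the structure of that model concrete, here's a minimal sketch of a WOWY-style regression. Everything here is an assumption for illustration -- the data are randomly generated stand-ins, not Morris' dataset, and the coefficients from his table are used only to fabricate a signal to recover:

```python
# Hypothetical sketch of a WOWY-style regression -- NOT Morris' actual data.
# Each row is a player-season of per-game box stats; y stands in for the
# observed change in team point differential when that player sits.
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Columns: points, rebounds, assists, blocks, steals (illustrative ranges)
lo = np.array([5.0, 2.0, 1.0, 0.0, 0.3])
hi = np.array([30.0, 12.0, 10.0, 3.0, 2.5])
X = rng.uniform(lo, hi, size=(n, 5))

# Fabricate a "true" relationship purely for demonstration, plus noise
true_coefs = np.array([1.0, 1.7, 2.2, 6.1, 9.1])
y = X @ true_coefs + rng.normal(0.0, 4.0, n)

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print("estimated coefficients:", beta[1:].round(2))
```

On clean synthetic data like this, the fit recovers the planted coefficients; the point of the criticism above is that real WOWY samples are far noisier and we're never shown how well the fit actually held up.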
However, what's problematic is that we know little about what his data actually looked like. I've used the same method before; in fact, my findings indicated that Rondo was overrated, even though he's a high-steal-rate player. I know the method's usefulness, but I also know how finicky and prone to random error it is. It's essentially like raw plus/minus, though probably a little better, and it is not the most robust method for testing the power of box score stats. I'm all for alternative means of testing from all angles, but if we have no way of verifying the findings, it's not practical or enlightening research.
From Nate Silver's website, I expected more transparency and, basically, more statistics -- we should see the results of the regression in the liner notes at the very least, so we can read over p-values, adjusted R-squared values, F-statistics, degrees of freedom, etc. One important question: what exactly do the data consist of? The article devotes only a single sentence to this issue:
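For what those "liner notes" might look like, here's a sketch of computing the basic diagnostics by hand -- again on fabricated stand-in data, not the article's:

```python
# Basic regression diagnostics (R^2, adjusted R^2, F-statistic, df) computed
# by hand on illustrative random data -- the kind of reporting one would
# want to see alongside the article's regression.
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 5                      # observations, predictors
X = rng.normal(size=(n, k))        # stand-ins for the five box stats
y = X @ np.array([1.0, 1.7, 2.2, 6.1, 9.1]) + rng.normal(0.0, 4.0, n)

A = np.column_stack([np.ones(n), X])              # add an intercept
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

ss_res = resid @ resid                            # residual sum of squares
ss_tot = ((y - y.mean()) ** 2).sum()              # total sum of squares
r2 = 1.0 - ss_res / ss_tot
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
print(f"R^2 = {r2:.3f}, adj. R^2 = {adj_r2:.3f}, "
      f"F = {f_stat:.1f} on ({k}, {n - k - 1}) df")
```

None of this is exotic; it falls straight out of the residuals, which is exactly why its absence from the article is conspicuous.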
"For this article, I’m using team game “with and without you” (WOWY) comparisons from all player seasons from 1986 to 2011 where a player missed and played at least 20 games."
Which players? Every player? What about players who miss games because they're not very good and not needed? Was the line drawn at starters? And how do you control for a player missing the same set of games as a teammate, or the reverse, where a role player only sits in games the superstar played that season?
The crux of the article, however, is the "marginal" value of steals versus points. This is a pretty common tactic in modern stat-speak, particularly with the Wins Produced clan and their "yay points!" thesis: deride the common folk for focusing too much on points per game. But with my background in engineering, I am hyper-vigilant about unit analysis, and when I saw how steals were compared with points I immediately saw a disconnect. A player who averages one more steal a game than the average player is not in the same situation as a player who averages one more point per game. There's a preferred method of comparing coefficients (at least in many circles): elasticity, an economics term for how a change in an input affects the outcome (in this case, a team's point differential). Typically, a median or mean value for the independent variable is chosen and then increased by a given percentage to illustrate how the change moves the dependent variable.
In other words, we don't compare dissimilar units; we look at a more realistic range. For the mean values, I simply took the 2014 season totals and converted them to per-36-minute rates. This is in one sense the "average" player, and without a clear picture of which players were used in the study, it will have to suffice. The average scoring rate was 15.0 points per 36 minutes, with 1.1 steals. Those aren't bad estimates, as doubling each rate gives you near-typical league-leading figures: 30 points and 2.2 steals. What happens when you apply the given coefficients to a ten percent increase? Points actually look more valuable: a ten-percent increase in the mean value of points per 36 minutes produces a greater positive effect on a team's scoring margin, per Morris' coefficients, than a ten-percent increase in steals per 36 minutes.
| Stat | Coefficient | Mean value | Elasticity change at 10% |
|---|---|---|---|
| Points | 1 | 15.0 | 1.50 |
| Rebounds | 1.7 | 6.4 | 1.08 |
| Assists | 2.2 | 3.3 | 0.72 |
| Turnovers | 5.4 | 2.2 | 1.18 |
| Blocks | 6.1 | 0.7 | 0.43 |
| Steals | 9.1 | 1.1 | 1.04 |
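The elasticity comparison can be reproduced directly from the coefficients and my per-36 means (the means are my 2014 estimates, not the article's):

```python
# Elasticity-style comparison: effect on team point differential of a 10%
# increase in each stat's league-mean per-36 rate, using the coefficients
# attributed to the WOWY model and my 2014 per-36 mean estimates.
coefficients = {
    "points": 1.0, "rebounds": 1.7, "assists": 2.2,
    "turnovers": 5.4, "blocks": 6.1, "steals": 9.1,
}
mean_per36 = {
    "points": 15.0, "rebounds": 6.4, "assists": 3.3,
    "turnovers": 2.2, "blocks": 0.7, "steals": 1.1,
}

for stat, coef in coefficients.items():
    # a 10% bump in the mean rate, scaled by the stat's coefficient
    change = coef * mean_per36[stat] * 0.10
    print(f"{stat:>9}: {change:.2f}")
```

Small rounding differences from the table come from using the rounded means; the ordering is the point -- points come out ahead of steals at the same ten-percent change.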
Essentially, the main reason steals looked so valuable in the original article is that they are rarer, and the comparison misapplied unit analysis. Now, I'm not saying steals are unimportant and we should bow down to points. There are a lot of interacting effects here, like how steals indicate awareness and athleticism. Morris cited a statistical plus/minus model as evidence for the value of steals, but that model and similar ones have also found the product of usage (how often you shoot) and assist rate to be significant and important -- and that's entangled with points to an extent, for just one example. A lot of great players, like Olajuwon and David Robinson, have high steal rates, and the same is true of underrated players like Millsap and Thaddeus Young. There are certainly good reasons to value steals, but asserting that they're nine times as valuable as points is misleading.