Background
If one ignores this effect, one is implying that players in a lineup with teammates who don't shoot often will suffer no efficiency losses: that "shot creators" in basketball are not needed because everyone can create shots, and that all shots are the same. Thinking about basketball from a practical perspective, if you send out a team composed of guys with the offensive creation skills of Reggie Evans, you shouldn't expect good results, even if those guys are efficient in most other lineups. Your offensive game can subsist on putbacks and wide-open layups from defensive breakdowns mostly caused by double teams, and you can post a healthy shooting percentage, but a player pressed into a greater role can no longer rely on that environment. Also, because of the shot clock, at some point a shot will need to be taken even if it's contested in a broken offense, and it is far better for Carmelo Anthony to take that shot than Kenyon Martin, despite Martin's advantage in TS%. To argue otherwise is to misunderstand basketball.
As outlined by Eli in his excellent article, this research works by comparing the expected and actual offensive efficiency of each lineup with respect to usage. The usage of a lineup is found by averaging each player's season usage (via Basketball-Reference). Defined this way, the average player usage is 20, but some lineups will have a higher average usage (think Carmelo and JR Smith on the court at the same time) than others (whenever Carmelo and JR Smith are on the bench while Kidd and Prigioni are playing). For the lineup data I used BasketballValue's publicly accessible matchup data from the 2008 to 2012 seasons, finding every possible lineup and totaling its pertinent statistics.
The expected offensive efficiency is the weighted average of each player's Basketball-Reference offensive rating, weighted by usage (it's a weighted average so Carmelo has a greater effect on the lineup's expected offensive efficiency, because he's going to take more of the shots). I also included a simpler and slightly different metric (simple offensive efficiency) because Basketball-Reference's is a bit of a black box and it incorporates offensive rebounds, which aren't exactly the point: this simple measure is points / (FGA + 0.44*FTA + TO). For actual offensive efficiency, I used points per possession from the lineup data, while for the simple offensive efficiency I had to change it to points / (possessions + offensive rebounds). As a sidenote, this is actually an interesting diversion: shouldn't we ignore offensive rebounds in the tradeoff between usage and efficiency? Or are they a part of the discussion?
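In code, the two measures look roughly like this (a sketch: the function names and the sample lineup numbers are mine for illustration, not from the study's data):

```python
def expected_lineup_efficiency(players):
    """Usage-weighted average of individual offensive ratings (points per
    possession): higher-usage players count for more of the lineup's
    expected output. `players` is a list of (usage_pct, off_rtg) pairs."""
    total_usage = sum(usage for usage, _ in players)
    return sum(usage * ortg for usage, ortg in players) / total_usage

def simple_efficiency(points, fga, fta, turnovers):
    """Points / (FGA + 0.44*FTA + TO), which effectively treats a
    rebounded miss as the start of a new possession."""
    return points / (fga + 0.44 * fta + turnovers)

# Illustrative five-man lineup: (usage %, individual offensive rating).
lineup = [(33.4, 1.10), (22.0, 1.12), (18.0, 1.15), (16.2, 1.06), (12.6, 1.18)]
print(round(expected_lineup_efficiency(lineup), 4))
```

The weighting is the whole point: swapping a 33-usage scorer for a 16-usage role player barely moves this expected number, which sets up the comparison with actual efficiency below.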
Results
This research uses five regular seasons with 52,900 lineups with usable possessions (lineups in separate seasons are counted as unique), covering 1,073,818 total possessions, 601,431 of them in lineups with at least 50 possessions. Many of these lineups were rarely used (end-of-quarter three-point-bomber lineups, end-of-season tanking rookie lineups, etc.), but there were still 3,882 lineups with at least 50 possessions and 1,683 with at least 100. The most efficient lineup with at least 50 possessions was Jason Kidd, Jason Terry, Josh Howard, Brandon Bass, and Dirk Nowitzki at 1.63 points per possession, with Carlos Arroyo, Maurice Evans, Turkoglu, Brian Cook, and Dwight Howard close behind, both in 2008. The most efficient lineup with at least 100 possessions, at 1.49 points per possession, was Jameer Nelson, JJ Redick, Matt Barnes, Ryan Anderson, and Dwight Howard in 2010. Least efficient? With at least 50 possessions it's Brandon Jennings, Michael Redd, Carlos Delfino, Ersan Ilyasova, and Andrew Bogut at 0.59 points per possession, and with at least 100 possessions it's CJ Watson, Kyle Korver, Luol Deng, Carlos Boozer, and Joakim Noah at 0.70 points per possession.
Below is a graph of all lineups above 50 total possessions, with the average usage of the lineup on the x-axis and the difference between the actual offensive efficiency (points per possession from the lineup data) and the expected offensive efficiency (from season totals of each player's offensive efficiency) on the y-axis. The dots are semi-transparent so it's not one huge blob; wherever it's darker, that's where more observations are. A positive differential (y-axis) means the lineup is more efficient than expected. Two main comments come to mind immediately: one, there is a clear but slight trend where a lineup becomes more efficient as usage increases; and two, there is a huge amount of variation in offensive efficiency.
One method to smooth out the results is to pool the data together into bins. Eli did this with his graph, and it's pretty straightforward: all lineups with an average usage between 15.5 and 16.5 are combined to form the 16 usage % datum, for example. This has the advantage of using all the low-possession lineups that can't be used in a basic regression or in the graph above because there would be too much noise.
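The pooling step is simple enough to sketch with pandas (the column names and toy rows here are made up for illustration):

```python
import pandas as pd

# Toy lineup-level data: average usage, total possessions, and the
# efficiency differential (actual minus expected points per possession).
lineups = pd.DataFrame({
    "avg_usage":    [15.8, 16.4, 19.9, 20.2, 23.6],
    "possessions":  [60, 120, 900, 1100, 75],
    "off_rtg_diff": [-0.05, -0.02, 0.00, 0.01, 0.04],
})

# Round to the nearest integer usage: every lineup between 15.5 and 16.5
# lands in the "16" bin, and so on.
lineups["usage_bin"] = lineups["avg_usage"].round().astype(int)

# Possession-weighted mean differential per bin, so rarely used lineups
# count in proportion to how often they actually played.
lineups["weighted"] = lineups["off_rtg_diff"] * lineups["possessions"]
pooled = lineups.groupby("usage_bin")[["weighted", "possessions"]].sum()
pooled["diff"] = pooled["weighted"] / pooled["possessions"]
print(pooled["diff"])
```

Weighting by possessions is what lets the low-possession lineups contribute without drowning the bins in their noise.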
Below is said graph. The colors denote the total possessions of all observations for a given usage range. The colors work in terms of breaks: the lightest one, for example, will appear for any number of possessions between 0 and 15,000. I included color (or, well, shade) as a third dimension to explain why the dots exhibit wackier behavior at the endpoints. The middle sets of points all have over 100,000 possessions, while at 15 usage % there were only 6,260 possessions. While it's unclear at the extremes, there appears to be a positive linear relationship between the average usage of a lineup and its offensive efficiency.
Regression
To explain offensive efficiency (predicted versus actual), regular old OLS regression was used with a few different tweaks thrown in. One important factor behind all the noise seen in the first graph is home-court advantage. A variable called "HCA" was created, ranging from -1 for a lineup used only in away games to 1 for a lineup used only in home games. Another possible factor is the season, but in every iteration of the regression the season was not a significant variable. Since all the possessions for each lineup were combined within a year, the regression is weighted by number of possessions, and each lineup is a separate observation. Usage in the regression models is set up as player usage minus 20, so that the results center around zero (the average NBA usage is 20% because there are five players on the court and 100%/5 = 20%).
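A minimal sketch of that weighted fit, using numpy's least squares on synthetic, noiseless data (the real study used actual lineup totals; the numbers here only demonstrate the mechanics):

```python
import numpy as np

# Synthetic per-lineup data: HCA in [-1, 1], usage centered at 20, and
# possessions used as regression weights. The response is generated
# noiselessly from chosen coefficients so the fit recovers them exactly.
hca = np.array([-1.0, 1.0, 0.2, 0.5, -0.6, 0.0])
usage = np.array([-2.1, 1.4, 0.0, 3.0, -1.2, 0.4])    # player usage minus 20
possessions = np.array([60.0, 400.0, 900.0, 150.0, 80.0, 300.0])
off_rtg_diff = 0.015 * hca + 0.009 * usage            # hypothetical "truth"

# Weighted least squares: scale each row by sqrt(weight), then solve OLS.
X = np.column_stack([np.ones_like(hca), hca, usage])
w = np.sqrt(possessions)
beta, *_ = np.linalg.lstsq(X * w[:, None], off_rtg_diff * w, rcond=None)
intercept, b_hca, b_usage = beta
print(intercept, b_hca, b_usage)
```

Scaling both sides by the square root of the weight is the standard trick that turns a weighted least-squares problem into an ordinary one.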
For an interpretation of the results, you have to know the key variable well: OffRtgDiff has units of points per possession, and as discussed previously it's the difference between the actual points per possession and the expected points per possession. Each lineup in a specific year is its own observation for the regression. Going back to the first graph, there are a few lineups with an OffRtgDiff of +/-0.4, but most observations are within +/-0.2; and as for usage, the extremes are around 16 or 24, but again most are between 18 and 22. Points per possession can also be translated into the offensive efficiency leaderboard you typically see: 1.12 points per possession (112 offensive rating) is what the Thunder had to lead the league, and 1.02 (102 offensive rating) is what Washington had to place last.
The first set of results includes an intercept. Two minimum gates for total possessions were used. Home-court advantage was significant in both models, although in the second it's not as strong. Note that the accepted value for home-court advantage is roughly +3 points, and translating the coefficients to a full game confirms this: 0.015*2*92 possessions = 2.76 points, or 0.0161*2*92 possessions = 2.96 points (HCA as a variable goes from -1 to 1, so a multiplier of 2 is needed, and 92 possessions is roughly the league average for a full game). One can also see how small the R^2 is: only a small amount of the variation is explained by these variables.
OffRtgDiff = HCA + Usage (50-possession minimum on the left, 100 on the right):

| | Coefficient (50 poss.) | St. error | p-value | Coefficient (100 poss.) | St. error | p-value |
|---|---|---|---|---|---|---|
| Intercept | 0.00259 | 0.00148 | 0.0800 | 0.00588 | 0.00178 | 0.000997 |
| HCA | 0.0150 | 0.00466 | 0.00127 | 0.0161 | 0.00671 | 0.0162 |
| Usage | 0.00895 | 0.00105 | < 2e-16 | 0.00904 | 0.00132 | 1.17e-11 |
| Adj. R^2 | 0.01996 | | | 0.02848 | | |
| Standard error | 1.119 | | | 1.126 | | |
| Deg. of freedom | 3954 | | | 1680 | | |
Usage, however, was highly significant in both models, with extremely similar coefficients near 0.009. Had I used points per 100 possessions as the variable, the coefficients would translate to a near one-to-one relationship between average lineup usage and efficiency. That is a handy, easy-to-interpret result. For example, in 2010 Denver's most popular lineup was Billups-Afflalo-Anthony-Martin-Nene. Its average usage was 21.14 and its expected efficiency (the usage-weighted average of player offensive ratings) was 1.135, over 1,033 possessions. Another of their most used lineups was the same except that Anthony, a 33.4-usage player with an offensive rating of 1.10, was replaced with Joey Graham, a 16.2-usage player with an offensive rating of 1.06. Because Joey Graham doesn't have a terrible offensive rating and the other options were efficient, the lineup's expected offensive efficiency barely decreased without Anthony (from 1.1352 to 1.1348). However, the lineup's actual efficiency crashed down to 1.0346 points per possession. Applying the coefficient, the lineup's change in average usage (-3.44) translates to a loss in efficiency of roughly 0.031 points per possession, which is in fact a huge change: it's the same as the difference between the 9th-ranked offense (the Lakers) and the 18th-ranked offense (Atlanta). The drop in usage doesn't entirely cover the difference in the two lineups' efficiency, since there's a lot of noise in this data, but it's illustrative of the effect of usage on efficiency.
| Lineup | Billups-Afflalo-Anthony-Martin-Nene | Billups-Afflalo-Graham-Martin-Nene |
|---|---|---|
| Expected efficiency | 1.1352 | 1.1348 |
| Actual efficiency | 1.0958 | 1.0346 |
| Diff. efficiency | -0.0394 | -0.1001 |
| Average lineup usage | 21.1 | 17.7 |
One might have noticed the inclusion of an intercept in the model without a clear need for one. The data should center around zero, although due to a few problems in reading the play-by-play data and in how player offensive rating is calculated this is not actually true (but it's close). Adding an intercept when it's not needed could be argued to be overfitting, so I repeated the regression without one. The coefficients are a little smaller, but part of that is because the average OffRtgDiff is slightly negative, not exactly zero. Still, every model coefficient is within a couple standard errors of its counterpart, and usually a lot closer, so the results appear to be reliable.
OffRtgDiff = HCA + Usage, no intercept (50-possession minimum on the left, 100 on the right):

| | Coefficient (50 poss.) | St. error | p-value | Coefficient (100 poss.) | St. error | p-value |
|---|---|---|---|---|---|---|
| HCA | 0.0147 | 0.00466 | 0.0016 | 0.0149 | 0.00672 | 0.0271 |
| Usage | 0.00856 | 0.00102 | < 2e-16 | 0.00765 | 0.00129 | 1.46e-9 |
| Adj. R^2 | 0.01921 | | | 0.02285 | | |
| Standard error | 1.120 | | | 1.129 | | |
| Deg. of freedom | 3955 | | | 1681 | | |
Another set of comparisons uses the simple offensive rating metric, which treats an offensive rebound as a new possession and so looks only at shooting percentages and turnovers. I wouldn't recommend relying on these results, given how rudimentary the measure is, but it's nonetheless important to see another set of results suggesting a highly significant relationship between usage and efficiency. The coefficients are smaller; perhaps offensive rebounds interact with usage, but that would warrant further examination.
SimOffRtgDiff = HCA + Usage (50-possession minimum on the left, 100 on the right):

| | Coefficient (50 poss.) | St. error | p-value | Coefficient (100 poss.) | St. error | p-value |
|---|---|---|---|---|---|---|
| Intercept | 0.0110 | 0.00136 | 5.81e-16 | 0.0127 | 0.00165 | 2.01e-14 |
| HCA | 0.0120 | 0.00428 | 0.00490 | 0.0126 | 0.00621 | 0.0418 |
| Usage | 0.00629 | 0.000962 | 6.81e-11 | 0.00602 | 0.00122 | 9.45e-7 |
| Adj. R^2 | 0.01204 | | | 0.01502 | | |
| Standard error | 1.028 | | | 1.041 | | |
| Deg. of freedom | 3954 | | | 1680 | | |
One last set of results, and then I swear I'm done. This one was conducted on the data you can see in the second graph (the pooled lineups). While there are only nine data points, there's an advantage: nearly every possession over five years is included (the ones that aren't are at the extremes). The adjusted R^2 is finally respectable (0.9507 is actually outstanding), and the usage coefficients are within the same range as in the tables above. The standard errors on the coefficients are both quite small, suggesting a high degree of confidence despite only nine observations.
OffRtgDiff = Usage on the pooled data (with an intercept on the left, without on the right):

| | Coefficient (intercept) | St. error | p-value | Coefficient (no intercept) | St. error | p-value |
|---|---|---|---|---|---|---|
| Intercept | 0.00779 | 0.00126 | 0.000265 | N/A | N/A | N/A |
| Usage | 0.0103 | 0.000781 | 1.03e-6 | 0.0105 | 0.00177 | 0.000212 |
| Adj. R^2 | 0.9507 | | | 0.7756 | | |
| Standard error | 1.302 | | | 2.95 | | |
| Deg. of freedom | 8 | | | 9 | | |
Conclusions
To use these results, I would recommend a coefficient of 0.01 for usage and points per possession (the coefficient would be 1 for points per 100 possessions) for simple estimates. For models or anything more sophisticated, something more conservative would probably be appropriate (0.008 to 0.009, possibly). Remember that average lineup usage and efficiency were used: if a lineup with an average usage of 20 adds a player with a usage of 30 (replacing a player with a usage of 20), the new average usage will be 22. Then, assuming a coefficient of 0.01, the points per possession of the lineup will increase by 0.02. That is not a minuscule result: it's the difference between the offensive rating of the league-leading (for 2013) Oklahoma City Thunder at 112.4 points per 100 possessions and the fifth-ranked Nuggets at 110.4.
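As a sketch of that rule of thumb (the function is my own shorthand, not something from the study):

```python
def lineup_efficiency_change(old_usages, new_usages, coef=0.01):
    """Rule-of-thumb estimate: the change in lineup points per possession
    is `coef` times the change in the five players' average usage."""
    old_avg = sum(old_usages) / len(old_usages)
    new_avg = sum(new_usages) / len(new_usages)
    return coef * (new_avg - old_avg)

# Replace a 20-usage player with a 30-usage player in an otherwise
# average lineup: average usage rises from 20 to 22.
delta = lineup_efficiency_change([20] * 5, [20, 20, 20, 20, 30])
print(delta)
```

With the more conservative 0.008 to 0.009 coefficients, the same substitution would be worth roughly 0.016 to 0.018 points per possession instead.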
The results are similar to what Eli found. His usage variable was relative to 100 instead of 20 like mine, so you have to translate the numbers, but his coefficient was roughly 0.0125 compared to mine at around 0.009 to 0.01. However, he only looked at part of the 2008 season, while I had five complete ones.
Another point is to take this to the individual level: on average and roughly speaking, when a player decreases his usage by 1%, his efficiency increases roughly by 0.2 (in offensive rating), while increasing his usage by 1% decreases his efficiency by 0.2.
The relationship between usage and efficiency was assumed to be linear. Judging from the graphs, this is an acceptable decision, but I would warn about the results at the extremes, where few lineups were available to display precise trends. Lineups with an average usage roughly from 17 to 23 appear linear, but outside that range the results start to show nonlinear effects. That is probably due to the inherent noise in the data without an appropriate sample size, but this is not clear, and it's reasonable to assume that at some point the linear relationship will break down. Another omission: the opposing lineup's defense is ignored. While this is an oversight, it is far more likely that lineups with a high average usage faced tougher defenses than vice versa. When I update the study or do refinements I'll try to incorporate this, although defensive estimates are either hilariously simple (only rebounds/steals/blocks) or not precise (+/- estimates, though regularized +/- is fairly reliable).
An important point about how the study was set up is that usage is relative to a team. If you put LeBron, Wade, Anthony, and Kobe Bryant on the same team, their respective usages will decrease and the average team usage will still be 20. Rather than looking at players changing teams and observing the effects, this study compares offensive efficiency at a team level through lineups. One complication, however, is that trades within a season produce players who collected usage stats in a different environment. For example, a player on the Bobcats will have a higher usage rate because no one else is shooting, and when he's traded he will appear to be more of a shot creator than he actually is. A further refinement would be to ignore lineups with traded players, to use only the usage rate from their specific team, or to actually study the change in usage and efficiency due to a new team.
The dialogue will continue into the future, but after five seasons of data and thousands of lineups it's hard to argue that a relationship between usage and efficiency is nonexistent. The extent to which it's important, and the actual mechanics, can be studied further, but we should be careful about making proclamations like "Tyson Chandler is better than Kevin Durant" just because Chandler has an otherworldly shooting percentage.
Furthermore, we have to be very careful with our terms before debating this topic. "Shot creator" taken strictly could mean a passer or, heck, a coach, and low usage doesn't imply low-skilled players. Chandler, for example, is astoundingly good at rolling to the rim and finishing, and he's great at the line for his size. What matters is the sum of a whole host of factors, the biggest being the pressure put on the defense, with related factors like the double teams created. Watch how a modern defense reacts to a star scorer, especially a good defense: they flood the strong side, and perimeter defenders leak off a few feet to help before scurrying back to their man. Lots of three-point shots are created when a double team causes a slow rotation to the corner three or a wing.
Also remember that not all shots are equal. Sometimes you'll get lucky and find an open man for a layup after a defensive breakdown, but there will be a large number of possessions where a less-than-ideal shot needs to be taken, like a midrange jump shot from 18 feet or a fallaway jumper from 10 on the baseline. Players who can take those shots with decent efficiency (relative to those situations) are indeed valuable, and what they do is reduce the number of possessions where, say, a low-usage player has to hoist an awkward hook shot from outside the paint. Yes, there are lots of low-usage players who are very efficient, but they take different shots than players like Westbrook. You cannot go an entire game on Tyson Chandler lobs or Jason Kidd threes. At some point other shots will need to be taken, and you need players who can take them. Obviously, you don't want someone hoisting awkward 20-footers while ignoring wide-open players, but that is not what this is about, and that is not what happens with every high-usage player.
To use the results of this study to state, "Monta Ellis increases his team's efficiency by jacking up terrible shots," is a fundamental misunderstanding of the research. First, it was already shown that there is a huge amount of variation in the efficiency differential, so speaking in absolute terms about a specific case is wrong. Second, the results refer to the average outcome. There are players who increase a lineup's usage and help the team even further, and there are players who increase a lineup's usage and hurt their team, even after adjusting for the new player's offensive rating. Basketball is not a simple game where you apply a single number to each player and call it a day. The interactions are complex and form a dense, tangled set of unseen, incomprehensible machinations. A star who understands his role in distracting the defense, knowing when to take the shot and when to pass to a low-usage player, is vitally important, and that subtle knowledge is one few have mastered. LeBron James is one recent master; watch how he decides to attack or pass off to a teammate. At the other extreme, you can have selfish players who only shoot when open and pass the ball with three seconds on the shot clock to some unlucky soul who has an even worse shot. But those are subjective evaluations and need to be supported by numbers.
We could spin a web of well-formed arguments in prose about the horrors of Carmelo Anthony or the beauty of a usage/efficiency curve, but the truth comes down to the real numbers. And the numbers agree: usage and efficiency are significantly related, and on average inserting a high-usage player into your lineup will increase your efficiency, all other things equal.
Tyson Chandler, by the way, agrees: his shooting efficiency increased with Carmelo Anthony on the court.
Comments
Really awesome stuff here.
I'm planning on doing a study using the same methodology, using %Ast'd instead of usage.
What's interesting is that even if you adjust Carmelo's ORtg up and Chandler's down to account for the differences in usage (even using the liberal coefficient of 0.2), Chandler is still better offensively than Carmelo this year. Over the past three years, Chandler has been consistently ridiculous and much better than Melo, but before that it seems like Melo was better overall on a usage-adjusted basis. Melo has been much more consistent on the whole, though, while Chandler prior to three seasons ago was very up and down.
So the real question for me is how did Chandler elevate his game to near-elite status recently? (I think it's pretty clear that he's near elite now, especially when you factor in his defense, which everyone agrees is among the best in the league.) Three years is not exactly a small sample size either, so it doesn't seem to be a fluke.
Very nice work. Good to see Eli's conclusions validated by a much larger set of data. And accounting for defense will be a great improvement, and may strengthen the relationship.
A few questions and comments:
With your data, can you separate the usage impact on turnovers from the impact on shooting percentage? There has been some speculation that low-usage players forced to use more possessions will have a high turnover rate.
While I think these results are right, there is one potential objection to the methodology: we don't know how shots are actually allocated when players are forced to increase or decrease usage. Theoretically, it could be that a 19-usage lineup adjusts by having its worst offensive player increase his usage by 5% while everyone else plays their normal game, and that could explain the efficiency drop you find. Can you examine the actual usage patterns within these lineups to see if players change their usage proportionately? Or compare their individual efficiencies?
If you do more work on this, I'd consider changing your usage metric to be the *change* in usage. For example, a 21 lineup is really a -1 lineup (or a -5% lineup): these are players who on average reduced their usage by 1 point (or 5%) below their own season averages. And that's what you are really measuring: the impact of a forced change in players' usage. In terms of presentation, you would then show the expected negative relationship between usage and efficiency, rather than showing a counterintuitive positive association.