Thursday, May 16, 2013

Usage Versus Efficiency

One of the most hotly contested arguments in basketball stats is the relationship between usage and efficiency. While it's more widely believed now that efficiency decreases as usage increases, and vice versa, there are still some circles that remain unconvinced. Even if one believes this relationship exists, we still need hard numbers to evaluate the effects. Drawing on previous research by Eli Witus, who looked at play-by-play data from a partial season in 2008, I expand his work to five complete seasons and dig deeper into the numbers. The fundamental aspect of the research is to compare all lineups within a season to see if offensive efficiency significantly changes when the average usage changes (e.g. lineups with low-usage players versus lineups with high-usage players.)

Background

If one ignores this effect, then one is implying that players in a lineup with others who don't shoot often will not suffer any efficiency losses -- that "shot creators" in basketball are not needed because everyone can create shots and all shots are the same. Thinking about basketball from a practical perspective, if you send out a team composed of guys who have the offensive creation skills of Reggie Evans, you shouldn't expect good results even if the guys are efficient in most other lineups. Your offensive game can subsist on putbacks and wide-open layups from defensive breakdowns mostly caused by double teams, and you can have a healthy shooting percentage. But a player pressed into a greater role can no longer rely on that environment. Also, because of the shot clock, at some point a shot will need to be taken even if it's contested in a broken offense, and it is far better for Carmelo Anthony to take that shot than Kenyon Martin despite Martin's advantage in TS%. To argue otherwise is to misunderstand basketball.

As outlined by Eli in his excellent article, this research works by comparing the expected and actual offensive efficiency of each lineup with respect to usage. The usage of a lineup is found by taking the average of each player's season usage (via basketball-reference.) Defined this way, the average player usage is 20, but some lineups will have a higher average usage (think Carmelo and JR Smith on the court at the same time) than others (whenever Carmelo and JR Smith are on the bench while Kidd and Prigioni are playing.) For the lineup data I used basketballvalue's publicly accessible matchup data from seasons 2008 to 2012, finding every possible lineup and totaling their pertinent statistics.

The expected offensive efficiency is the average of each player's offensive rating from basketball-reference, weighted by usage (it's a weighted average so Carmelo has a greater effect on the lineup's expected offensive efficiency because he's going to take more of the shots.) I also included a simpler and slightly different metric (simple offensive efficiency) because basketball-reference's is a bit of a black box and it incorporates offensive rebounds, which aren't exactly the point -- this simple measure is points / (FGA + .44*FTA + TO). For actual offensive efficiency, I used points per possession from the lineup data, while for the simple offensive efficiency I had to change it to points / (possessions + offensive rebounds.) As a sidenote, this is actually an interesting diversion -- shouldn't we ignore offensive rebounds in the tradeoff between usage and efficiency? Or are they a part of the discussion?
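As a rough illustration of the measures described above, here is a minimal Python sketch. The function names and the sample lineup are hypothetical (they are not from the lineup data), and offensive rating is expressed in points per possession; the formulas themselves follow the text.

```python
def average_usage(players):
    """Plain average of the players' season usage rates."""
    return sum(usage for usage, _ in players) / len(players)


def expected_efficiency(players):
    """Average of player offensive ratings, weighted by usage."""
    total_usage = sum(usage for usage, _ in players)
    return sum(usage * off_rtg for usage, off_rtg in players) / total_usage


def simple_efficiency(points, fga, fta, turnovers):
    """Simple offensive efficiency: points / (FGA + .44*FTA + TO)."""
    return points / (fga + 0.44 * fta + turnovers)


# Hypothetical lineup: (usage %, offensive rating in points per possession).
lineup = [(30.0, 1.10), (22.0, 1.05), (18.0, 1.12), (16.0, 1.08), (14.0, 1.15)]

print(round(average_usage(lineup), 2))
print(round(expected_efficiency(lineup), 4))
print(round(simple_efficiency(points=100, fga=80, fta=25, turnovers=14), 3))
```

Because the weights are the usage rates, the 30-usage player moves the expected efficiency roughly twice as much as the 14-usage player does.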

Results

This research uses five regular seasons with 52,900 lineups with usable possessions (lineups in separate seasons are counted as unique), covering 1,073,818 total possessions, 601,431 of them in lineups with at least 50 possessions. Many of these lineups were rarely used -- end of quarter three-point bomber lineups, end of season tanking rookie lineups, etc. -- but there were still 3882 lineups with at least 50 possessions and 1683 with at least 100 possessions. The most efficient lineup with at least 50 possessions was Jason Kidd, Jason Terry, Josh Howard, Brandon Bass, and Dirk Nowitzki with 1.63 points per possession, with Carlos Arroyo, Maurice Evans, Turkoglu, Brian Cook, and Dwight Howard close behind -- both in 2008. The most efficient lineup with at least 100 possessions was Jameer Nelson, JJ Redick, Matt Barnes, Ryan Anderson, and Dwight Howard in 2010 with 1.49 points per possession. Least efficient? With at least 50 possessions it's Brandon Jennings, Michael Redd, Carlos Delfino, Ersan Ilyasova, and Andrew Bogut at 0.59 points per possession, and with at least 100 possessions it's CJ Watson, Kyle Korver, Luol Deng, Carlos Boozer, and Joakim Noah at 0.70 points per possession.

Below is a graph of all lineups above 50 total possessions, with the average usage of the lineup on the x-axis and, on the y-axis, the difference between the actual offensive efficiency (points per possession from the lineup data) and the expected offensive efficiency (from season totals of offensive efficiency of each player.) The dots are semi-transparent so it's not one huge blob, and wherever it's darker you know that's where more observations are. A positive differential (y-axis) means the lineup is more efficient than expected. Two comments come to mind immediately: one, there is a clear but slight trend where a lineup is more efficient as usage increases; and two, there is a huge amount of variation in offensive efficiency.

One method to smooth out the results is to pool the data together into bins. Eli did this with his graph, and it's pretty straightforward -- all lineups with an average usage between 15.5 and 16.5 are combined to form the 16 usage % datum, for example. This has the advantage of using all the low-possession lineups that can't be used in a basic regression or a larger graph because there would be too much noise.
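The pooling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple layout and the name pool_by_usage are hypothetical, and weighting each lineup's differential by its possessions is my assumption about how lineups within a bin are combined.

```python
import math
from collections import defaultdict


def pool_by_usage(lineups):
    """Pool lineups into one-point usage bins.

    lineups: iterable of (average usage, possessions, efficiency differential).
    Returns {bin: (total possessions, possession-weighted differential)}.
    """
    totals = defaultdict(lambda: [0, 0.0])  # bin -> [possessions, weighted sum]
    for avg_usage, possessions, diff in lineups:
        if possessions <= 0:
            continue  # nothing usable to pool
        # 15.5 <= usage < 16.5 lands in bin 16 (avoids round()'s half-to-even rule).
        usage_bin = math.floor(avg_usage + 0.5)
        totals[usage_bin][0] += possessions
        totals[usage_bin][1] += possessions * diff
    return {b: (poss, wsum / poss) for b, (poss, wsum) in sorted(totals.items())}


# Hypothetical lineups, not taken from the data set.
sample = [(15.7, 100, 0.02), (16.4, 300, -0.02), (19.9, 50, 0.10)]
for usage_bin, (possessions, diff) in pool_by_usage(sample).items():
    print(usage_bin, possessions, round(diff, 4))
```

A ten-possession lineup barely moves its bin, which is why the rarely used lineups can be kept here without drowning the result in noise.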

Below is said graph. The colors denote the total possessions of all observations for a given usage range. The colors work in terms of breaks: the lightest one will appear for any number of possessions between 0 and 15,000, for example. I included color (or, well, shade) as a third dimension to explain why the dots exhibit wackier behavior at the endpoints. The middle sets of points all have over 100,000 possessions, while at 15 usage % there were only 6260 possessions. While it's unclear at the extremes, there appears to be a positive linear relationship between the average usage of a lineup and offensive efficiency.

Regression

To explain offensive efficiency (predicted versus actual), regular old OLS regression was used with a few different tweaks thrown in. One important factor to explain all the noise seen in the first graph is homecourt advantage. A variable called "HCA" was created, ranging from -1 for a lineup only used in away games to 1 for a lineup only used in home games. Another possible factor is the season, but in every iteration of the regression the season was not a significant variable. Since for each year all the possessions were combined for each lineup, the regression is weighted by number of possessions. Note that each lineup is one separate observation. Usage in the regression models is set up as average lineup usage minus 20, so that the results center around zero (the average NBA usage is 20% because there are five players on the court and 100%/5 = 20%.)
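For readers who want to see the mechanics, here is a minimal pure-Python sketch of a possession-weighted least-squares fit of this form. It is not the code used for the tables below. The hca formula (home minus away possessions over total possessions) is an assumption, since the text only gives the variable's range, and the five lineups are made up.

```python
def hca(home_possessions, away_possessions):
    """Homecourt variable: -1 if only used away, +1 if only used at home."""
    total = home_possessions + away_possessions  # assumed to be positive
    return (home_possessions - away_possessions) / total


def weighted_least_squares(rows, targets, weights):
    """Minimize sum(w * (y - x.beta)^2) by solving the normal equations."""
    k = len(rows[0])
    a = [[sum(w * x[i] * x[j] for x, w in zip(rows, weights)) for j in range(k)]
         for i in range(k)]
    b = [sum(w * x[i] * y for x, y, w in zip(rows, targets, weights))
         for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, k):
            factor = a[row][col] / a[col][col]
            for c in range(col, k):
                a[row][c] -= factor * a[col][c]
            b[row] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(a[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (b[i] - tail) / a[i][i]
    return beta


# Hypothetical lineups: (home poss., away poss., average usage, OffRtgDiff).
lineups = [
    (60, 40, 21.5, 0.030),
    (20, 80, 18.0, -0.040),
    (50, 50, 20.0, 0.005),
    (90, 10, 22.5, 0.020),
    (30, 70, 19.0, -0.010),
]

# Design matrix: intercept, HCA, and usage centered at 20.
rows = [[1.0, hca(home, away), usage - 20.0] for home, away, usage, _ in lineups]
targets = [diff for _, _, _, diff in lineups]
weights = [home + away for home, away, _, _ in lineups]

intercept, hca_coef, usage_coef = weighted_least_squares(rows, targets, weights)
print(intercept, hca_coef, usage_coef)
```

Dropping the leading 1.0 from each row gives the no-intercept version of the same model.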

For an interpretation of the results, you have to know the key variable well: OffRtgDiff has units in terms of points per possession, and as discussed previously it's the difference between the actual points per possession and the expected points per possession. Each lineup in a specific year is its own observation for the regression. Going back to the first graph, there are a few lineups with an OffRtgDiff of +/-0.4, but most observations are within +/-0.2; and as for usage, the extreme is a usage of 16 or 24, but again most are between 18 and 22. Points per possession can also be translated into the offensive efficiency leaderboard you typically see: 1.12 points per possession (112 offensive rating) is what the Thunder had to lead the league, and 1.02 (102 offensive rating) is what Washington had to place last.

The first set of results includes an intercept. Two minimum gates for total possessions were used. Homecourt advantage was significant in both models, although in the second it's not as strong. Note that the accepted value for homecourt advantage is roughly +3 points, and translating the coefficients to a full game confirms this: 0.015*2*92 possessions = 2.76 points or 0.0161*2*92 possessions = 2.96 points (HCA as a variable goes from 1 to -1, so a 2 is needed as a multiplier, and 92 possessions is roughly the league average for a full game.) One can also see how small the R^2 measure is. Only a small amount of the variation is explained by these variables.


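OffRtgDiff = HCA + Usage, with intercept (minimum 50 possessions versus minimum 100 possessions)

| | Coefficient (50 poss.) | St. error (50 poss.) | p-value (50 poss.) | Coefficient (100 poss.) | St. error (100 poss.) | p-value (100 poss.) |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | -0.00259 | 0.00148 | 0.0800 | -0.00588 | 0.00178 | 0.000997 |
| HCA | 0.0150 | 0.00466 | 0.00127 | 0.0161 | 0.00671 | 0.0162 |
| Usage | 0.00895 | 0.00105 | < 2e-16 | 0.00904 | 0.00132 | 1.17e-11 |
| Adj. R^2 | 0.01996 | | | 0.02848 | | |
| Standard error | 1.119 | | | 1.126 | | |
| Deg. of freedom | 3954 | | | 1680 | | |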
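The homecourt sanity check from the paragraph before the table (coefficient times 2 times 92 possessions) is easy to reproduce. A tiny sketch, with hypothetical names:

```python
POSSESSIONS_PER_GAME = 92  # rough league average for a full game, per the text
HCA_RANGE = 2              # the HCA variable runs from -1 (away) to +1 (home)


def hca_points_per_game(coefficient):
    """Translate an HCA coefficient (points per possession) to points per game."""
    return coefficient * HCA_RANGE * POSSESSIONS_PER_GAME


for coefficient in (0.0150, 0.0161):
    print(f"{coefficient:.4f} -> {hca_points_per_game(coefficient):.2f} points")
# prints:
# 0.0150 -> 2.76 points
# 0.0161 -> 2.96 points
```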

Usage, however, was a highly significant variable in both models, and the coefficients were extremely similar at near 0.009. If I had used points per 100 possessions as the variable, the coefficients would translate to a near one-to-one relationship between average lineup usage and efficiency. That is a handy, easy to interpret result. For example, in 2010 Denver's most popular lineup was Billups-Afflalo-Anthony-Martin-Nene. The average usage was 21.14 and the expected efficiency (average of player offensive rating weighted by usage) was 1.135 with 1033 possessions. Another one of their most used lineups was the same except Anthony, a 33.4 usage player with an offensive rating of 1.10, was replaced with Joey Graham, a 16.2 usage player with an offensive rating of 1.06. Because Joey Graham doesn't have a terrible offensive rating and the other options were efficient, the lineup's expected offensive efficiency barely decreased without Anthony (from 1.1352 to 1.1346.) However, the lineup's actual efficiency crashed down to 1.0346 points per possession. Applying the coefficient, the lineup's change in average usage (3.44) translates to a loss in efficiency of roughly 0.031 points per possession, which is in fact a huge change as it's the same as the difference between the 9th ranked offense in LAL and the 18th ranked offense in Atlanta. The drop in usage doesn't entirely cover the difference in the two lineups' efficiency since there's a lot of noise in this data, but it's illustrative of the effect of usage on efficiency.

| Lineup | Billups-Afflalo-Anthony-Martin-Nene | Billups-Afflalo-Graham-Martin-Nene |
| --- | --- | --- |
| Expected efficiency | 1.1352 | 1.1348 |
| Actual efficiency | 1.0958 | 1.0346 |
| Diff. efficiency | -0.0394 | -0.1001 |
| Average lineup usage | 21.1 | 17.7 |
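The arithmetic behind the Denver example can be written out as a short sketch. The function name swap_effect is hypothetical, and the default coefficient of 0.009 is the rounded value from the regressions above, so treat the output as a ballpark figure.

```python
def swap_effect(out_usage, in_usage, usage_coef=0.009, lineup_size=5):
    """Return (change in average lineup usage, predicted change in pts/poss)."""
    usage_change = (in_usage - out_usage) / lineup_size
    return usage_change, usage_coef * usage_change


# Anthony (33.4 usage) replaced by Graham (16.2 usage).
usage_change, efficiency_change = swap_effect(out_usage=33.4, in_usage=16.2)
print(round(usage_change, 2), round(efficiency_change, 3))
# prints: -3.44 -0.031
```

For comparison, the efficiency differential in the table moved from -0.0394 to -0.1001, a change of about -0.061, so the usage effect accounts for roughly half of the observed drop.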

One might have noticed the placement of an intercept in the model without a clear need for one. The data should center around zero, although due to a few problems in reading the play-by-play data and how player offensive rating is calculated this is not actually true (but it's close.) Adding an intercept when it's not needed could be seen as overfitting, so I repeated the regression without one. The coefficients are a little smaller, but part of that is because the average OffRtgDiff is slightly negative, not exactly zero. But every model coefficient is within a couple of standard errors of its counterpart and usually a lot closer, so the results appear to be reliable.


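OffRtgDiff = HCA + Usage, no intercept (minimum 50 possessions versus minimum 100 possessions)

| | Coefficient (50 poss.) | St. error (50 poss.) | p-value (50 poss.) | Coefficient (100 poss.) | St. error (100 poss.) | p-value (100 poss.) |
| --- | --- | --- | --- | --- | --- | --- |
| HCA | 0.0147 | 0.00466 | 0.0016 | 0.0149 | 0.00672 | 0.0271 |
| Usage | 0.00856 | 0.00102 | < 2e-16 | 0.00765 | 0.00129 | 1.46e-9 |
| Adj. R^2 | 0.01921 | | | 0.02285 | | |
| Standard error | 1.120 | | | 1.129 | | |
| Deg. of freedom | 3955 | | | 1681 | | |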

Another set of comparisons is with the simple offensive rating metric, which in fact treats an offensive rebound as a new possession so it's only looking at shooting percentages and turnovers. I wouldn't lean on these results given how rudimentary the measure is, but it's nonetheless important to see another set of results suggesting a highly significant relationship between usage and efficiency. The coefficients are smaller, and perhaps offensive rebounds interact with usage, but that would warrant further examination.


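SimOffRtgDiff = HCA + Usage, with intercept (minimum 50 possessions versus minimum 100 possessions)

| | Coefficient (50 poss.) | St. error (50 poss.) | p-value (50 poss.) | Coefficient (100 poss.) | St. error (100 poss.) | p-value (100 poss.) |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | -0.0110 | 0.00136 | 5.81e-16 | -0.0127 | 0.00165 | 2.01e-14 |
| HCA | 0.0120 | 0.00428 | 0.00490 | 0.0126 | 0.00621 | 0.0418 |
| Usage | 0.00629 | 0.000962 | 6.81e-11 | 0.00602 | 0.00122 | 9.45e-7 |
| Adj. R^2 | 0.01204 | | | 0.01502 | | |
| Standard error | 1.028 | | | 1.041 | | |
| Deg. of freedom | 3954 | | | 1680 | | |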

One last set of results, and then I swear I'm done. This regression was run on the data you can see in the second graph (the pooled lineups.) While there are only nine data points, the advantage is that nearly every possession over five years is included (the ones that aren't are at the extremes.) The adjusted R^2 is finally respectable, as 0.9507 is actually outstanding, and the usage coefficients are within the same range as in the above tables. The standard errors are both quite small, suggesting a high degree of confidence in the coefficients despite the nine observations.



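OffRtgDiff = Usage on the pooled lineups, with and without an intercept

| | Coefficient (intercept) | St. error (intercept) | p-value (intercept) | Coefficient (no intercept) | St. error (no intercept) | p-value (no intercept) |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | -0.00779 | 0.00126 | 0.000265 | N/A | N/A | N/A |
| Usage | 0.0103 | 0.000781 | < 1.03e-6 | 0.0105 | 0.00177 | 0.000212 |
| Adj. R^2 | 0.9507 | | | 0.7756 | | |
| Standard error | 1.302 | | | 2.95 | | |
| Deg. of freedom | 8 | | | 9 | | |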

Conclusions

For using these results, I would recommend a coefficient of 0.01 for usage and points per possession (the coefficient would be 1 for points per 100 possessions) for simple estimates. For models or anything more sophisticated, something more conservative would probably be more appropriate (0.008 to 0.009, possibly.) One has to remember that the average lineup usage and efficiency were used. If a lineup with an average usage of 20 adds a player with a usage of 30 (replacing a player with a usage of 20), the new average usage will be 22. Then, assuming a coefficient of 0.01, the points per possession for the lineup will increase by 0.02. That is not a minuscule result: it's the difference between the offensive rating of the league-leading (for 2013) Oklahoma City Thunder at 112.4 points per 100 possessions and the fifth-ranked Nuggets at 110.4.

The results are similar to what was found by Eli. His usage variable was relative to 100 instead of 20 like mine so you have to translate the numbers, but his coefficient was roughly 0.0125 compared to mine around 0.009 to 0.01. However, he only looked at part of the 2008 season, while I had five complete ones.

Another point is to take this to the individual level: on average and roughly speaking, when a player decreases his usage by 1%, his efficiency increases roughly by 0.2 (in offensive rating), while increasing his usage by 1% decreases his efficiency by 0.2.

The relationship between usage and efficiency was assumed to be linear. Judging by the graphs, this is an acceptable decision, but I would warn about the results at the extremes where few lineups were available to display precise trends. Lineups with an average usage roughly from 17 to 23 seem to follow a line, but outside that range the results start to show nonlinear effects. That is probably due to the inherent noise in the data without an appropriate sample size, but this is not clear and it's reasonable to assume that at some point the linear relationship will break down. For another omission, the opposing lineup's defense is ignored. While this is an oversight, it is far more likely that lineups with a high average usage would face tougher defenses than vice versa. When I update the study or do refinements I'll try to incorporate this, although defensive estimates are either hilariously simple (only rebounds/steals/blocks) or not precise (+/- estimates, but regularized +/- is fairly reliable.)

An important point about how the study was set up is that usage is relative to a team. If you put LeBron, Wade, Anthony, and Kobe Bryant on the same team, then their respective usages will decrease and the average team usage will still be 20. Rather than looking at players changing teams and seeing the effects, this study is comparing offensive efficiency at a team level through lineups. One complication, however, is that trades within a season lead to players who collect usage stats in a different environment -- for example, a player on the Bobcats will have a higher usage rate because no one else is shooting, and when he's traded he will appear to be more of a shot creator than he actually is. A further refinement would be to ignore lineups with traded players, use only the usage rate they had for their specific team, or to actually study the change in usage and efficiency due to a new team.

The dialogue will continue into the future, but it's hard to argue that a relationship between usage and efficiency is non-existent after five seasons of data and thousands of lineups. The extent to which it's important and the actual mechanics can be further studied, but we should be careful in making such proclamations as "Tyson Chandler is better than Kevin Durant" just because Chandler has an otherworldly shooting percentage.

Furthermore, we have to be very careful with our terms before debating this topic. "Shot creator" strictly means a passer or, heck, a coach, and low usage doesn't imply low skilled players. Chandler, for example, is astoundingly good at rolling to the rim and finishing, and he's great at the line for his size. What matters is the sum of a whole host of factors -- putting pressure on the defense is the biggest one, with related factors like double teams created. Watch how a modern defense reacts to a star scorer, especially a good defense: they flood the strong side and perimeter guys leak off a few feet to help before scurrying back to their guy. Lots of three-point shots are created after a double team causes a slow rotation to the corner three or a wing.

Also remember that not all shots are equal. Sometimes you'll get lucky and find an open man for a layup inside after a defensive breakdown, but there will be a large number of possessions where a less than ideal shot needs to be taken, like a midrange jump shot from 18 feet or a fallaway jumper from 10 on the baseline. Players who can take those shots with decent efficiency (relative to those situations) are indeed valuable, and what they do is reduce the number of possessions where, say, a low usage player has to hoist an awkward hook shot from outside the paint. Yes, there are lots of low usage players who are very efficient, but they take different shots than players like Westbrook. You cannot go an entire game with Tyson Chandler lobs or Jason Kidd threes. At some point other shots will need to be taken, and you need players who can take those shots. Obviously, you don't want someone hoisting awkward 20-footers while ignoring wide open players, but that is not what this is about and that is not what happens with every high usage player.

To use the results of this study to state, "Monta Ellis increases his team's efficiency by jacking up terrible shots" is a fundamental misunderstanding of the research. First of all, it was already shown that there is a huge amount of variation in efficiency differential, so making an absolute statement about a specific case is wrong. Secondly, the results refer to the average result. There are players who increase the lineup's usage and help the team even further, and there are players who increase the lineup's usage and hurt their team, even after adjusting for the offensive rating from the new player. Basketball is not a simple game where you apply a single number to each player and call it a day. The interactions are complex and form a dense, tangled set of unseen, incomprehensible machinations. A star understanding his role in distracting the defense, knowing when to take the shot, and when to pass to a low usage player is vitally important, a subtle knowledge few have mastered -- LeBron James is one recent member; watch how he decides to attack or pass off to a teammate. At the other extreme, you can have selfish players who only shoot when open and pass the ball with three seconds on the shot clock to some unlucky soul who has an even worse shot. But those are subjective evaluations and need to be supported by numbers.

We could spin a web of well-formed arguments in prose about the horrors of Carmelo Anthony or the beauty of a usage/efficiency curve, but the truth comes down to the real numbers. And the numbers agree: usage and efficiency are significantly related, where on average inserting a high usage player into your lineup will increase your efficiency all other things equal.

Tyson Chandler, by the way, agrees: his shooting efficiency increased with Carmelo Anthony on the court.

Monday, May 6, 2013

2013 Playoffs: Second Round, Part II


(1) Miami Heat vs. (5) Chicago Bulls

This one screams "quick four or five game" series, with the safest bet being five games due to Chicago's resilience and the brilliant defensive mind of Tom Thibodeau. I've been terrible so far in predicting the playoffs, but to be fair there have been some odd turns. I wasn't surprised seeing Memphis knocking off the Clippers, especially because of the Griffin injury, as I thought it was a coin flip series, but the Nuggets had no business losing to the Warriors. Denver responded to Golden State's smallball, often with the large Bogut in the middle, by benching their productive bigs and leaving Faried to guard the basket, which according to a few physics theorems is basically impossible. Then Chicago beat the Nets despite the talent disparity and Noah's injury, plus the Biblical calamities that fell on the rest of the roster. Be careful walking the stairs, Jimmy Butler. Seriously.

Somehow the Heat get to play a team with a negative SRS (adjusted point differential.) Strictly going by the numbers, this is more likely a sweep, but again, I trust Thibodeau and an inspired game from this crew. That's pretty much what everyone else is saying about the series, but I think no one understands how bad the Bulls were this year given how weak the east was. They were lucky to win 45 games, and beating the Nets was an upset given their injuries. The Bulls have a brand name, sure, but Thibodeau really needs to be given credit for decimating the Nets' "coaching." Too bad you can't use smoke and mirrors to erase LeBron from the series.

What to watch for:
-At full strength, the Bulls may have the best defense against the Heat with Butler, Hinrich, Deng, Gibson, and Noah.
-Miami has two glaring weaknesses -- big centers and rebounding -- but the Bulls won't be able to fully exploit those.
-They're an above average rebounding team, great on offensive rebounding, but not superhuman like they were a couple years ago.
-While Noah is doing his best impression of Bill Walton/video game character with his health bar near zero but just hit a power-up, he's not enough offensively to attack Miami and hurt them.
-LeBron by his own standards had a lackluster first round, but he was probably bored. If he wants a season for the ages he needs to step up his game.
-There's a good chance the Heat will sweep, and then we'll be bombarded with another win streak.

Prediction: Heat in five.

(2) San Antonio Spurs vs. (6) Golden State Warriors

I don't think the Warriors know what they got themselves into. They're going into the teeth of a well-oiled machine, and while we've seen the Spurs come up short recently this is an organization that with the exception of a David Robinson injury-plagued season hasn't had a sub-50 win season (lockout seasons adjusted) since 1989, and the big three appear to be fully healthy. Splitter and Diaw are still ailing and recovering, but the Spurs don't need size against the lilliputian Warriors, save Bogut, who goes from guarding Faried, Koufos, or McGee to a resurgent Tim Duncan season. The Spurs have to be happy about Westbrook's injury as much as anyone, because they know as well as anyone that anything can happen once you make it to the finals. Underdogs can win.

Curry's going to be blanketed by a smarter team and bigger defenders. The Warriors won't be able to break out in transition as much. Their three-pointers won't be quite so open. No one noticed, but San Antonio quietly had a great defensive year. If Ginobili is healthy, he's going to shred the Warriors' bench. There are just few positive signs for Golden State, and San Antonio will have a couple of big blowouts. They're ecstatic the Warriors found some luck and pulled off an upset, because now it's basically like another first round opponent for the Spurs. Like Miami, their path to the finals is being paved by whatever force -- destiny, God, David Stern (obviously, because he loves seeing the bland market of San Antonio prevail again.)

What to watch for:
-Tim Duncan had one of the best seasons ever in terms of avoiding fouls while blocking shots at the rim. The Warriors will come to learn this.
-The Spurs are not a boring franchise, and Ginobili is one of the main reasons why. We don't know how much longer he'll play, or how many more moments he'll have up to his standards, but we're approaching victory lap territory. Cherish the plays.
-I think we're all waiting for the game back in Oakland where Curry makes 9 threes and the crowd sustains jet engine noise for the entirety of the contest. That one they can win.
-David Lee's defense, now a league-wide joke, is going to be hampered by an injury and will face a smart offense and, if things go horribly awry, Lee may end up guarding Duncan on a few plays.
-Is Barnes having a Tayshaun Prince breakout year in the playoffs, like some rookies or young players have, or is this a Trevor Ariza-like three-point streak that will fizzle out the rest of his career?
-Despite Curry's amazing year shooting from outside, his TS% is one-tenth of one percent better than Tony Parker's, because Parker can actually get to the paint and make layups. Lost in Curry's magical season is his inability to get to the rim.

Prediction: Spurs in five games.

Sunday, May 5, 2013

2013 Playoffs: Second Round, Part I

(1) Oklahoma City Thunder vs. (5) Memphis Grizzlies

This is a really tough series to predict. If I'm going purely by numbers and not matchups, I'm going with the Thunder, but when I look at the teams themselves and the matchups possible I want to say the Grizzlies may have a better chance.

After Westbrook went down, I estimated their new team strength would be around a +5.5 point differential, and lo and behold adjusting for their opponent and homecourt they were a +6 team after Westbrook went down. I even said, "Don't be surprised if they drop a couple games," and they indeed did. I was right on in predicting how much more Fisher, Reggie Jackson, and Sefolosha would play, but I did not predict an Atlas-like minutes load from Durant, which probably put them over the top to hit the +6 mark. He will not continue to play that much, as no one has done so in the playoffs since 2009. However, the Thunder now have a little more time to adjust their defensive problems and Reggie Jackson will have some actual experience under his belt.

Unfortunately for Memphis, if the Thunder continue to play at this level they are a slightly better team than the Grizzlies, and a slightly better team with homecourt advantage is a decent bet. This is where the matchups come in. The Grizzlies are known for their bruising frontcourt, and Conley should have little trouble getting by Jackson or Fisher. Durant is now the only shot creator on the team, and he's going to be hounded by an elite defense with Tony Allen, Prince's long arms, and Marc Gasol waiting in the paint. And poor Kevin Martin, coming off a rough series when his team needed him the most, will often be harassed by Tony Allen. Weaknesses are magnified in the postseason where coaches and assistants work overtime in preparation, and the Thunder have some glaring problems: Durant turns the ball over when he forces the issue, the point guards are not starting caliber, and an entire system that relied on a player who had never missed a game is thrown into flux while the Grizzlies just feasted on a good team.

Another matchup problem is how isolation heavy Oklahoma City are. Against a great defense, a predictable offense will be destroyed, and this was not made apparent during the previous series because the Rockets were not a good defensive team. The Thunder had an unusually high scoring margin during the regular season, but they've also had magnificent health -- Durant, Westbrook, Harden when he was there, and Ibaka were missing fewer than five games per year. That inflated their team numbers a bit, and now they have to play without their all-star point guard. Even if I go purely by the numbers, this looks like a seven game series, but I think the Grizzlies take this one because they can exploit weaknesses.

What to watch for:
-Nick Collison is actually a Zach Randolph stopper, but it remains to be seen if the Thunder actually know this and will use it.
-Durant as a playmaker. Compared to most elite perimeter players, he's far behind in this respect. This is the time where he needs to emerge as a playmaker.
-Memphis is a terrible team shooting from the three-point line, but I don't think the Thunder are disciplined enough right now to take advantage of this.
-The rotation of the Thunder big men. How much will Ibaka guard Randolph? Will he draw him out using his jumper? Can Perkins actually guard Gasol and how much will he play? And how much smallball will we see?
-Randolph again will go against a highflying athletic power forward, and may again destroy a man built of pure muscle in the paint.
-Jerryd Bayless is acting like a kick-returner on offense, zipping to the rim with reckless abandon. Awkwardly, the Thunder are using Fisher and Martin off the bench at the guard slots, and there might be some odd matchups when Bayless is on the court.

Prediction: Grizzlies in six.

(2) New York Knicks vs. (3) Indiana Pacers

Matchups, matchups -- the Knicks rely on Carmelo's offense, and the Pacers have the newly minted elite long and quick defensive wing in Paul George; the Knicks also rely on long-range bombing while the Pacers are known for shutting down three-pointers. Team strength? It's basically a +3.7 team in New York versus a +3.3 team in the Pacers, except the Pacers have been luckier with health and New York (if they remain healthy) is probably a bit stronger.

I think we might be underrating the Knicks because the Celtics played their hearts out. When Carmelo is playing well, they're a pretty good team. It's that "when" that's in question, while the Pacers consistently have a great defense -- that, in fact, is why defensive teams fare better in the playoffs. Your three-pointers won't always hit their mark, but your rotations probably will. And, once again, we have a low post bruising team versus an offensive club. New York almost exclusively plays smallball now, which makes David West's defensive assignment a bit troublesome. However, Anthony will often be called to guard West in the post, likely taking abuse.

Notice how I keep downplaying the Knicks? They're an inconsistent team who have gone through dead stretches during the season, while the Pacers' biggest problem was Hibbert's horrid shooting, which went away as soon as he fully recovered from an elbow injury. But there's one key to the Knicks' season, and no, it's not the mercurial JR Smith; it's Tyson Chandler. He played terribly in the opening games of the Boston series, but improved as it went on. Shumpert, too, improved as the series continued, and he is their best wing defender and will be called on to defend Paul George. If Carmelo Anthony plays up to his standards, the Knicks with homecourt advantage have greater odds to win. But it's all contingent on "if"....

What to watch for:
-Tyson Chandler needs to keep his health because he's going against a great rebounding team and a Goliath at center in the post.
-The Pacers concede midrange jumpers, and this could work well in goading JR Smith into taking stupid shots instead of being aggressive and attacking the rim.
-Playing the Knicks, the Pacers actually get to be seen on national TV, and unlike the last game against the Hawks (the only one not on NBA TV) it will be heavily featured because of the lore of New York.
-Tyler Hansbrough has yet to see a shot he didn't want to take, and Kenyon Martin has been playing remarkable defense. Expect to see a few marvelous rejections when the benches are used.
-Sadly, Felton is highly important on offense, and as any fan in Portland would tell you, that is not a comforting thought.
-As a sidenote, I can't think of a more interesting destination for Chris Paul than Indiana. It won't happen for a variety of reasons, but his half-court mastery would be a perfect fit for a team desperate for offense.

Prediction: New York in seven games (I've written negative articles about you before, Carmelo, but now is the chance to redeem yourself.)

Edit: Accidentally called the Pacers a fifth seed. Oops.

Thursday, May 2, 2013

Harden Versus Elite Defenses

One of the stories right now is the supposed implosion of James Harden, the breakthrough shooting guard who's lauded for his scoring efficiency but receives heavy backlash from people who hate his flopping and deride his game, calling it bland and, since he has no midrange game, devoid of skill. Critics also imply he's somehow paying off the refs or that the league has an interest in his success, as if it would make sense to risk getting fired in a conspiracy to prop up the NBA markets of Oklahoma City and Houston. Now there's a widespread declaration that he shrinks in playoff games and against good defenses, coming off a disappointing Finals in the 2012 season and the recency bias of some high turnover games against the Thunder. While the playoff woes deserve a little more time because you can't judge a player based on so few games, Harden just spent an entire season as his team's number one scoring option, and we should be able to tell if he indeed struggles against top defenses.

Methods to the madness

There actually isn't a single straightforward way to judge how well someone fared against a defense. Points don't tell you how many shots someone missed or how many turnovers he committed. Focusing on shooting efficiency doesn't account for a larger offensive burden: if you take more shots, your efficiency typically decreases. There are lots of single number metrics (too many, honestly), but not all of them isolate offense and none are perfect. The best approach is to analyze this from multiple angles and look for patterns, as well as to compare Harden's output to that of the league's other top scorers.

But what myths are we looking to bust? Does Harden perform worse as the defense gets better? Generally, every player gets worse as the defense gets better, because it's better defense. I'll separate all the different hypotheses about the general myth of Harden shrinking versus top clubs.

1) Hypothesis: Harden's offense performs worse as opposing defenses get better.

2) Hypothesis: Harden's offense performs worse than expected as opposing defenses get better. Expectation is based on the opposing team's allowed points per possession.

3) Hypothesis: Harden's offense performs worse compared to other high scorers as opposing defenses get better.

The data

Using basketball-reference's team defensive ratings and their offensive efficiency rating for players, I've produced the graph below. Usage percentage is the percentage of team possessions a player uses based on field goal attempts, free throw attempts, and turnovers. I've coded usage with color so you can see if Harden is getting fewer attempts in games against elite defenses, and everything is on a per possession basis so there's no bias in pace or minutes played.
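
For readers who want the usage definition spelled out, here is a minimal sketch of the usage percentage formula as basketball-reference publishes it in its glossary. The function name, argument names, and the sample box-score line are hypothetical and only there for illustration; none of the numbers are Harden's.

```python
def usage_pct(fga, fta, tov, mp, tm_fga, tm_fta, tm_tov, tm_mp):
    """Estimate the share of team possessions a player uses while on the floor.

    Possessions used are approximated as FGA + 0.44 * FTA + TOV, and the
    team total is scaled by the fraction of team minutes the player played.
    """
    player_poss = fga + 0.44 * fta + tov
    team_poss = tm_fga + 0.44 * tm_fta + tm_tov
    return 100.0 * player_poss * (tm_mp / 5.0) / (mp * team_poss)


# Hypothetical single-game line (made-up numbers, not real data).
example = usage_pct(fga=18, fta=10, tov=4, mp=38,
                    tm_fga=85, tm_fta=25, tm_tov=14, tm_mp=240)

# A player who uses exactly one fifth of his team's possessions while
# playing the whole game lands on the league-average usage of 20.
average = usage_pct(fga=17, fta=0, tov=5, mp=48,
                    tm_fga=85, tm_fta=0, tm_tov=25, tm_mp=240)
```

Because the team totals are scaled by minutes played, the number is comparable between a 25-minute night and a 42-minute night.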


I don't see any pattern between Harden's scoring and the opposing team's defensive rating, and unless you're John Nash you won't find one either. People may get distracted by the clump of low efficiency games versus the league's two best defenses, Memphis and Indiana, but he had another game versus Memphis with an offensive rating of 153 and above average usage -- and yes, Tony Allen and Marc Gasol played, and Memphis was blown out. Additionally, he had a string of good games versus the league's third best defense, the Spurs. Using regression, you can test whether or not the defense is a significant predictor of various offensive stats. I've included a summary below, and the results are much like the scatterplot: there is no link. I've also used home/away games as a dummy variable, in case that was influencing the result. It's an interesting sidenote, but for Harden and the other top scorers, production does not appear to be significantly affected by away games.


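|             | Offensive rating | Offensive game score* | Points/minute | Usage % | Assist % |
|-------------|------------------|-----------------------|---------------|---------|----------|
| Coefficient | 0.885            | 0.319                 | 0.216         | 0.118   | 0.409    |
| p-value     | 35%              | 28%                   | 36%           | 50%     | 33%      |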

*Offensive game score is the game score outlined here but without the defensive components
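
As a rough illustration of the kind of model summarized above, here is a minimal sketch of a least squares fit of a game-level offensive stat on the opponent's defensive rating plus a home/away dummy. This is not the code used for the table: the helper names are hypothetical, the game log is made up so the fit can be checked by eye, and p-values are left out because they need a t-distribution from a statistics library.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Least squares fit of y on the columns of rows, with an intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve_linear(xtx, xty)


# Hypothetical game log: (opponent defensive rating, 1 if home else 0).
games = [(100.3, 1), (104.9, 0), (108.1, 1), (102.2, 0), (110.0, 1), (106.5, 0)]
# Made-up offensive ratings built as 20 + 0.9 * defense + 2 * home,
# so the fit should recover those three numbers.
off_rating = [20 + 0.9 * d + 2 * h for d, h in games]

intercept, def_coef, home_coef = ols(games, off_rating)
```

With real game logs the response is noisy, so the standard error of each coefficient matters as much as the coefficient itself.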

The most telling fact from the results is that none of the coefficients are statistically significant; no p-value comes near the 5% threshold. There's simply too much noise and variation. He doesn't appear to perform worse in a systematic way. The offensive rating coefficient is interesting. Since it's on the same scale as the opposing defensive rating (the independent variable), a coefficient less than one means Harden is performing better than expected. (This can also be read as an inelastic response.) The best way to understand the coefficients is to think of the league's best defense with a rating around 100 and one of the league's worst defenses with a rating around 110. With a ten unit decrease (going from a terrible defense to an elite one), Harden's points per 36 minutes only decrease by about two points, his assist percentage by 4%, his offensive rating by 9, etc.
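
The arithmetic behind that reading is just each coefficient times the ten-point gap in defensive rating. A minimal sketch, using the coefficients from the summary table (the variable names are illustrative only):

```python
# Coefficients from the summary table: change in each stat per one-point
# change in the opponent's defensive rating.
coefficients = {
    "offensive rating": 0.885,
    "offensive game score": 0.319,
    "points": 0.216,  # the text reads this one on a per 36 minutes scale
    "usage %": 0.118,
    "assist %": 0.409,
}

# Gap between a poor defense (about 110) and an elite one (about 100).
DEF_CHANGE = 110 - 100

# Expected drop in each stat when moving from the poor defense to the elite one.
expected_drop = {stat: round(c * DEF_CHANGE, 2) for stat, c in coefficients.items()}

for stat, drop in expected_drop.items():
    print(f"{stat}: -{drop}")
```

None of these slopes is statistically significant, so the drops are best read as rough magnitudes rather than firm estimates.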

How does he compare versus the other top scorers? Using the same basic regression of offensive rating or usage % versus the opposing team's defensive rating (points allowed per 100 possessions), I created two tables for easy comparisons. Keep in mind the scale and units of the coefficients, and that negative means you're actually improving as the defenses get better. The p-value shows statistical significance (anything around 5% or lower.)

The results are surprising. In offensive rating, Harden holds up better than anyone except for Carmelo Anthony. Harden maintains a healthy offensive rating as the defenses get better, and his usage is only slightly decreased. Keep in mind that his results aren't statistically significant, so you can't actually say he performs better or worse. However, the other scorers are more interesting. Durant's efficiency plummets (a coefficient of 2.8 is huge) while he increases his scoring load a fair amount, but not enough to offset the lost efficiency. Both Kobe and James see their efficiency take a dive, and they don't see a significant rise in usage to counteract it.

Offensive rating versus opposing defensive rating

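|             | James Harden | Carmelo Anthony | Kevin Durant | Kobe Bryant | LeBron James |
|-------------|--------------|-----------------|--------------|-------------|--------------|
| Coefficient | 0.885        | 0.282           | 2.82         | 1.74        | 2.18         |
| p-value     | 35%          | 77%             | 0.029%       | 3.1%        | 0.31%        |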

Usage rating versus opposing defensive rating

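|             | James Harden | Carmelo Anthony | Kevin Durant | Kobe Bryant | LeBron James |
|-------------|--------------|-----------------|--------------|-------------|--------------|
| Coefficient | 0.118        | -0.0731         | -0.431       | -0.176      | -0.108       |
| p-value     | 50%          | 77%             | 4.5%         | 47%         | 51%          |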

Of course, team roles play an important part because elite defenses are better at certain techniques, like flooding the strong side of the court or controlling the top scorer with a premier wing defender. But Harden has a tougher role than some of the other stars: he's the only focal point on offense and the only top creator. James has Wade, Durant has Westbrook, and at least Kobe has Howard, who demands more defensive attention than any of Harden's teammates in Houston.

Speaking of team effects, one might wonder if Houston performs worse against better defensive teams in the regular season. Well, this is pretty easy to test. Using b-ref's definition of offensive/defensive efficiency, I grabbed Houston's data from the regular season and calculated their offensive efficiency in each game to compare it to the defensive efficiency of their opponent, and used regression (again.) Basically, Houston has a coefficient of 1.08 for defense, which means they play slightly worse than expected versus better defenses, but it's so slight it may not be significant. So I then constructed an expected efficiency differential. This is just the difference between Houston's offensive efficiency for a game and the opposing team's defensive rating for the season, along with the effect of homecourt advantage. Just to be sure, I did this with an assumed homecourt advantage for Houston of 2.32 and league average of 3.23 from this study, and again with home and away as dummy variables so the model calculated its own homecourt advantage numbers.

Long story short, Houston's offense does not perform significantly worse than expected against better defenses. For the expected efficiency differential model with HCA built in, the coefficient for defensive efficiency was only 0.0108 -- translated, in going from a crappy defense (rating of 110) to an elite one (rating of 100) they'd be only 0.108 points per 100 possessions worse than expected. The variable wasn't significant -- or close to it. (As a fun statistical side note, the standard error was forty-six (!!) times larger than the coefficient, and the R^2 value was a minuscule 5.8*10^-5.)
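
To make the expected efficiency differential concrete, here is a minimal sketch of how such a check could be set up. Two things are assumptions rather than details from the analysis above: half of the homecourt advantage is credited to the offense (added at home, subtracted on the road), and the game log is invented purely so the code runs. The function names are hypothetical.

```python
import math


def expected_diff(off_eff, opp_def_rating, is_home, hca=2.32):
    """Game offensive efficiency minus a homecourt-adjusted expectation.

    Assumption: half of the homecourt advantage shows up on offense.
    """
    adj = hca / 2.0 if is_home else -hca / 2.0
    return off_eff - (opp_def_rating + adj)


def simple_regression(x, y):
    """Return slope, standard error of the slope, and R^2 for y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    se = math.sqrt(sse / (n - 2) / sxx)
    r2 = 1.0 - sse / sst
    return slope, se, r2


# Hypothetical games: (offensive efficiency, opponent defensive rating, home?).
games = [(112.0, 100.3, True), (101.5, 104.9, False), (115.2, 108.1, True),
         (98.7, 102.2, False), (109.9, 110.0, True), (107.4, 106.5, False)]

diffs = [expected_diff(o, d, h) for o, d, h in games]
def_ratings = [d for _, d, _ in games]
slope, se, r2 = simple_regression(def_ratings, diffs)
```

A slope near zero with a standard error many times its size, as reported above for Houston, means the differential does not move with opponent defense in any detectable way.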

Hypothesis 1: Does Harden's offense get worse as defenses get better?

There appears to be a slight relationship between team defense and his offense, but it's not statistically significant. Hypothesis 1 is rejected.

Hypothesis 2: Does Harden's offense perform worse than expected as defenses get better?

Since hypothesis 1 was rejected, this one also falls, and actually Harden was performing better than expected (due to his coefficient being less than one for offensive efficiency) by some metrics.

Hypothesis 3: Does Harden's offense perform worse compared to other top scorers as the defenses get better?

No, and actually he looks better in comparison. This one, however, will be the focus of future research (how top scorers fare against top defenses.)

Harden may depend on his foul drawing abilities for his scoring, but this is true of many elite wing scorers in the past, from Jerry West to Jordan. Perhaps he's a playoff underperformer, but he hasn't played enough yet to brand him with such a slanderous accusation. Being called a playoff failure is a career killer, as David Robinson knows all too well. He doesn't have a midrange game, but he's one of the few players who can successfully shoot three-pointers at a healthy percentage off the dribble, and his team doesn't perform worse than expected as defenses get better. Being dependent on the two most valuable shots in the game (three-pointers and at-the-rim attempts) should in fact be considered a strength. The Beard has his detractors, but his game is holding up.

Edited: Added a section on how Houston performs as defenses get better.