What translates into contest success
To set up the model, data is needed, obviously, but it's surprisingly difficult to find comprehensive results for every player and every round. The NBA's own website was missing the seasons from 2010 forward, and it had incorrect results for 2007. Thankfully, by crosschecking a thorough website from Spain against several news sources, every single round from 2000 to 2013 was included. Why not before 2000? For one, the shortened three-point line from '95 to '97 means data from the mid-'90s isn't directly comparable, and the league's three-point habits have changed so drastically that comparing 1988 to 2010 is a fool's errand.
After the data was captured, the model form was selected. Because scores are count data (i.e. whole, positive numbers) with an upper limit, a simple linear model can't be used. Instead, a beta regression model is used, where the dependent variable (what you're trying to predict) is the proportion of total possible points scored in a round. Beta regression is preferred when the scale is limited to 0 to 1, which makes it useful for rates and proportions. (Specifically, it's the logit-link model from the betareg function in R.) For example, the all-time record of 25 under the old rules becomes a proportion of 25/30, or 0.833. The functional form is: y = exp(beta*x) / (1 + exp(beta*x)).
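Since the betareg output isn't shown here, a minimal Python sketch of the logit link may help; the linear predictor value below is chosen purely to reproduce the record's 0.833 proportion, not taken from the fitted model:

```python
import math

MAX_SCORE = 30  # old-rules maximum: five racks of four 1-point balls plus one 2-point money ball

def inv_logit(eta):
    # Logit link: map a linear predictor onto the (0, 1) proportion scale
    return math.exp(eta) / (1 + math.exp(eta))

# The all-time record of 25 becomes a proportion of the maximum...
record_prop = 25 / MAX_SCORE  # 0.833

# ...and a linear predictor maps back to an expected score via the inverse link.
# eta = log(5) is an illustrative value that yields exactly that proportion.
predicted_score = MAX_SCORE * inv_logit(math.log(5))
```

Because the inverse link squashes everything into (0, 1), predictions can never fall below zero or exceed the maximum score, which a plain linear model cannot guarantee.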
The fun math aside, what variables predict three-point contest scores? While collecting the data, I observed no obvious patterns among the winners and the worst performers: big-name shooters like Bird and Peja have done well, but shooting specialists like Craig Hodges and Voshon Lenard excel too. Kevin Love surprisingly won a contest, so maybe star players have an advantage, but Curry and Durant were disappointing. Undeterred, I tested a variety of stats: shooting percentages from the past three seasons (being careful to use only the three-point percentage before the all-star break), three-point attempts per minute, usage rate (i.e. total shooting volume), height, returning champion, repeat participant, a dummy variable for the final round in case players "warm up," year, and interaction terms formed by multiplying, say, height by three-point attempts per minute.
The result? Unfortunately, only two variables were significant: a weighted average of shooting percentage from the past three seasons(1), and a dummy variable for the final round. No other factor showed a correlation, as much as people like to claim big guys have an advantage because their size somehow keeps them from tiring while heaving the shots. Thankfully, there was one interesting result with the shooting percentages: the previous season is as important as the current one. One would think the more recent season is more relevant, and this is even more surprising because the previous season is complete and has more attempts. But there's a selection bias in the contest: the NBA chooses players based on their shooting percentages and attempt totals from the current season. Thus, players having an anomalous half-season are more likely to be chosen, and their "true" three-point talent level is generally lower than what their pre-all-star-break percentage suggests.
(1) (3*3P_year0 + 3*3P_year1 + 2*3P_year2) / (3*3PA_year0 + 3*3PA_year1 + 2*3PA_year2)
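The footnote's weighted percentage can be sketched as a small function; the sample numbers below are hypothetical, not any real player's line:

```python
def weighted_3pt_pct(makes, attempts, weights=(3, 3, 2)):
    # Attempt-weighted three-point percentage over the past three seasons,
    # ordered (current, previous, two seasons ago). Per the footnote formula,
    # the current and previous seasons carry equal weight.
    num = sum(w * m for w, m in zip(weights, makes))
    den = sum(w * a for w, a in zip(weights, attempts))
    return num / den

# Hypothetical shooter: 90/200 this season, 120/300 last, 80/220 the year before
pct = weighted_3pt_pct((90, 120, 80), (200, 300, 220))
```

Note that weighting the raw makes and attempts, rather than averaging the three percentages, means a low-attempt season contributes less to the final number.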
From 2006 to 2013, the average difference in three-point percentage between the pre- and post-all-star break was 3.3% (among players with at least 40 attempts after the break), meaning a contest participant is likely a significantly worse shooter for the rest of the season. However, there's a wrench in the selection bias: champions are invited back unless they're injured. This is probably the best comparison test, since they are not selected because of their pre-break percentage. And among returning champions, the average difference is ... 0.02. Clearly, there's selection bias when looking at only half a season of stats.
And why include all this discussion? Because deciding the field of participants is important (it's a big televised event, and NBA players have to be scheduled and showcased), and it's apparent that looking only at half a season of stats does not lead to the best set of selections.
If you're looking for other variables I missed that would explain the results, I'm listening, but I considered many. Shooting style is one I've spent time on, and there doesn't appear to be a pattern there either. Do set shooters win? Beal nearly won and he jumps higher than most do in the contest, and Ray Allen won in 2001. And players with lower verticals, like Peja, Pierce, and Bird, have excelled as well.
Simulation model
With a formula in hand, the next step is simulating how rounds are won. There is no directly solvable way to estimate a player's chance of winning: there's a great deal of variation in what any player will shoot, the finals must be simulated based on how players do the round before, tiebreaker rounds are possible, and which players share your bracket affects your odds.
Considering those facts, the model was built by varying the coefficient for shooting percentage based on the standard deviation from the regression results. Essentially, this gives a "real world" set of varying results where Kevin Love can shoot 20 one round and 12 the next. With the logit-link (beta) form, there are also realistic limits: scores of 25 or 26 are rare, and so are scores of 4 and 5. Out of the 90 first-round scores from 2000 to 2013, there was only one score of 23 or higher (Arenas), or 1.1%. Running the simulation with 32,000 games played (4,000 simulation seeds with 8 players), there were only 326 such games, which translates to 1.0%. One case alone doesn't prove the model reflects real-world conditions, so I plotted a histogram below showing how the real-world and simulated first rounds compare.
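A rough Python sketch of that procedure follows; the intercept, coefficient, and its standard deviation are placeholders for illustration, not the fitted betareg estimates:

```python
import math
import random

MAX_SCORE = 30  # old-rules maximum

def inv_logit(eta):
    return math.exp(eta) / (1 + math.exp(eta))

def simulate_round(pct, n_sims=4000, intercept=-2.0, beta=4.0, beta_sd=0.8, seed=0):
    # Draw the shooting-percentage coefficient from a normal around its
    # estimate, push the linear predictor through the logit link, and scale
    # to the 30-point maximum. All coefficient values here are placeholders.
    rng = random.Random(seed)
    scores = []
    for _ in range(n_sims):
        b = rng.gauss(beta, beta_sd)        # perturb the coefficient
        p = inv_logit(intercept + b * pct)  # expected proportion of points
        scores.append(round(MAX_SCORE * p))
    return scores

scores = simulate_round(0.43)  # e.g. a 43% shooter
big_games = sum(s >= 23 for s in scores) / len(scores)  # share of 23+ rounds
```

Because the perturbed predictor still passes through the inverse link, extreme scores stay rare, which is what makes the simulated tail frequency comparable to the observed 1.1%.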
Based on the simulation results, with 4,000 random seeds for each of the first rounds and 32,000 for the finals (4,000 separate simulations for each possible outcome of the first-round seeds), the most likely winner was Stephen Curry at 24.6%, followed by Beal, who was closely trailed by Lillard and Belinelli. Love's odds are at 5 percent, largely due to a terrible 2013; on the other hand, the metric is weighted by attempts, so an injury-plagued season is not as destructive. Afflalo also has low odds due to a low percentage the previous season. Beal's advantage is an easier field; he's not necessarily better than Belinelli. And while Curry disappointed again, his odds were nowhere near 100%, of course, and the finals featured two of this simulation's top four.
Simulation odds for winning the 2014 three-point contest

Player             Win odds (%)
Damian Lillard     13.7
Marco Belinelli    13.5
Kevin Love          5.1
Stephen Curry      24.6
Kyrie Irving       11.9
Joe Johnson        10.3
Bradley Beal       14.6
Arron Afflalo       6.3

Bovada (Vegas), by the way, provided an interesting set of odds. Curry was given 2-to-1 odds, significantly better than I estimated, with a huge gap between him and the next-closest competitor. I'm disappointed I didn't try this method sooner, because Bovada listed Beal with the second-worst odds of winning even though he's shooting 43% right now, shot 39% last season, and is in an easier bracket. I would have labeled him the best value along with Belinelli, and since Belinelli ended up winning with Beal second, you could have put down 20 dollars and won 120.
Alas, the one contest that didn't need an overhaul was changed: the new maximum score is 34 because there is now an all-money-ball rack you can place wherever you choose. Since this increases the variability of the score, the odds should be closer together. I'll have to think of a way to emulate this behavior before the next contest.
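For reference, the two scoring maxima work out as:

```python
# Old rules: five racks, each with four 1-point balls and one 2-point money ball.
OLD_MAX = 5 * (4 * 1 + 1 * 2)
# 2014 rules: four standard racks plus one rack of five 2-point money balls.
NEW_MAX = 4 * (4 * 1 + 1 * 2) + 5 * 2
```

One simple way to adapt the simulation would be rescaling the predicted proportions to the new 34-point maximum, though that ignores where the money-ball rack is placed.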
Under- and overperforming shooters
A regression output that's usually ignored is the list of residuals: one error term for every single observation in the data (here, every player's round), usually squared or standardized. It's a way to spot outliers and trends. In this context, it shows which players consistently do better or worse than their shooting percentages and the round would predict.
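A sketch of how those per-player averages could be computed from raw (unsquared) residuals; the players and numbers below are made up for illustration:

```python
from collections import defaultdict

def average_residuals(observations):
    # Average raw (unsquared) residual per player.
    # observations: iterable of (player, actual_score, predicted_score)
    # tuples, one per contest round. Positive averages mean the player
    # outscored the model's prediction.
    totals = defaultdict(lambda: [0.0, 0])
    for player, actual, predicted in observations:
        totals[player][0] += actual - predicted
        totals[player][1] += 1
    return {p: s / n for p, (s, n) in totals.items()}

# Hypothetical rounds:
errs = average_residuals([
    ("Peja", 22, 17.5), ("Peja", 19, 16.0),
    ("Nash", 13, 15.5), ("Nash", 15, 16.5),
])
```

Leaving the residuals unsquared preserves their sign, which is the whole point here: squaring would hide whether a player runs hot or cold against the model.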
The table below includes every player with at least four rounds from 2000 to 2013 and their average model error. In this case, a positive error is good: they're overperforming in the contest relative to their percentages. A negative error means they're shooting worse than you'd expect. Based on a robust sample of 11 rounds, two contests won, two second-place finishes, and outperforming the model by an average of 3.6 points, Peja may possess a little extra magic the stats can't quite capture. Arenas is entirely buoyed by scoring a 23 in the first round and never repeating a similar feat, but that's over only 4 rounds. Billups and Nash, unfortunately, never lived up to their reputations: Nash is currently 9th all-time in three-point percentage, but his average score was 14.
Residual errors (not squared) for players with at least 4 rounds, 2000 to 2013

Player               Total rounds   Average error
Peja Stojakovic      11              3.6
Gilbert Arenas        4              3.0
Voshon Lenard         4              2.7
Daequan Cook          4              2.4
Jason Kapono          6              1.6
James Jones           4              1.4
Kevin Love            4              1.0
Quentin Richardson    4              0.9
Ray Allen            10              0.6
Kyle Korver           4              0.4
Wesley Person         6              0.4
Dirk Nowitzki        10              0.0
Kevin Durant          4              0.0
Paul Pierce           5             -0.4
Rashard Lewis         5             -1.6
Chauncey Billups      4             -1.8
Steve Nash            4             -2.0

Speaking of all-time great shooters who have not done well, I expect many people will bring up Curry, but he hasn't strictly been disappointing. Before the 2014 contest, his average score was 17.3 with an average error of 0.8; he actually did a little better than expected. He has been very consistent, so he's never posted a huge score, but on average he does quite well. Translating his 16-point total from 2014 to the old rules with fewer money balls roughly equates to 14 points, which isn't a disaster. I'd say he hasn't underperformed relative to his shooting skill. The problem, rather, is that people rate number-one-ranked players and teams far too highly compared to the field in most situations, especially in a contest where the results are this noisy.
If the league wants the best shooters possible, it can't ignore previous seasons of three-point accuracy. And I'd suggest letting three players into the finals again, because luck is already too much of a factor.
As a final note, I want to comment on the strategy of where to place the money-ball rack. Players are afraid of using it in the last corner because they fear they won't finish the rack and will waste the extra points. This is not an ideal strategy for a few reasons. Besides the corner line being closer and most players shooting better from that distance, players either finish their rounds or have the clock expire as they reach for and shoot the last ball. It's rare for a player to leave two or more balls unused, Joe Johnson notwithstanding. But no matter where you place the rack, the last ball will always be a money ball, and if you're afraid of time expiring with two or more balls left, you won't have a good chance of advancing anyway. (Although I'd prefer the old rules be reinstated, because we have so few era-neutral basketball measures to judge players with. Adding money balls only increases luck and variability, and this contest already has that in spades.)