Sunday, September 16, 2012

Basketball in the Olympics

Note: I took a summer break from this, but I'll be back with an article or two a week, including more statistical analysis.

Introduction

David Stern suggested that the Olympics and the World Championship should limit NBA players to those who are 23 years old or younger. One of the issues is the risk involved in letting multi-million dollar assets compete during the summer, with the possibility of a long-term injury. Mark Cuban is one of the most vocal opponents of NBA stars playing in these competitions, and from his perspective it's about protecting a valuable commodity rather than letting human beings represent their country in a global celebration of both their sport and all sports. This comes up every summer there's a tournament, and people assume it has a negative effect on NBA players, either from getting injured that summer (rare) or from suffering an injury in the regular season due to the "stress" of having already played too much over the summer. Of course, no one has actually looked through the numbers to see if this is true. In this article I'll look at whether players do indeed suffer a negative effect, in terms of games missed or injuries accrued, from playing in either the Olympics or the World Championship, using every tournament in which NBA players have participated since 1992 and the original Dream Team.

Games Missed Before a Tournament Versus After a Tournament

The most obvious way to answer this question is to simply look at how many games players log in the regular season before the summer tournament and in the one after. Focusing on just those adjacent seasons minimizes the effect of aging players missing more games, along with other similar systematic errors. Using every team from 1992 on (nine total, since 1998 fell during the lockout and 2012 obviously doesn't have an "after" season for comparison), there are 2056 player seasons to work with. The data set is limited to American players to keep it as homogeneous as possible, and the lockout seasons are adjusted to an 82-game equivalent. The span runs from Dominique Wilkins to Stephen Curry. The good thing about looking at star players, who make up the bulk of those selected to these teams, is that they won't miss games due to a coach's decision (i.e. not being good enough to play), which improves the data set by ensuring that most missed games are legitimately due to injury. Obviously with real data nothing is ever perfect, and effects like tanking pollute the results, but it's definitely usable.
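For those curious about the bookkeeping, here's a minimal sketch of the pairing and the 82-game adjustment in Python. The player_seasons and tournament_rosters tables and their column names are hypothetical stand-ins for illustration, not the actual data set.

```python
# Sketch of the before/after pairing, assuming hypothetical tables:
# player_seasons(player, season, games_played) and
# tournament_rosters(player, tournament_year). Column names are illustrative.
import pandas as pd

def games_82_equivalent(row):
    # Pro-rate lockout seasons to an 82-game schedule
    # (1998-99 was 50 games, 2011-12 was 66 games).
    length = 50 if row["season"] == 1999 else 66 if row["season"] == 2012 else 82
    return row["games_played"] * 82 / length

def before_after_pairs(player_seasons, tournament_rosters):
    """One row per (player, tournament) with adjusted games played in the
    regular season before and after the summer competition."""
    df = player_seasons.copy()
    df["games_adj"] = df.apply(games_82_equivalent, axis=1)
    lookup = df.set_index(["player", "season"])["games_adj"]
    rows = []
    for _, r in tournament_rosters.iterrows():
        before = lookup.get((r["player"], r["tournament_year"]))      # season ending that spring
        after = lookup.get((r["player"], r["tournament_year"] + 1))   # season starting that fall
        if before is not None and after is not None:
            rows.append({"player": r["player"], "year": r["tournament_year"],
                         "before": before, "after": after})
    return pd.DataFrame(rows)
```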

Table 1: Comparison of games played before and after summer competition

Regular season before    Regular season after
75.8                     71.0


Averaging nearly 76 games in a regular season is actually quite impressive, but there's a noticeable drop-off following a summer competition -- 4.8 games, specifically. Note that restricting the comparison to the seasons immediately before and after the summer drops the sample size down to 204 seasons. Now if this were Mythbusters or an ESPN article we'd leave it at that and move on, concluding that players average fewer games after the Olympics, but it's not much more work to go a little further and answer the question properly.

Using a two-tailed Student's t-test (the more conservative choice) on the before and after seasons, assuming unequal variances, there's significant evidence that NBA Olympic and World Championship participants play fewer games the following regular season. Specifically, the p-value is 0.31%: if the null hypothesis were true -- that games played are the same before and after a summer competition -- a difference this large would show up by chance only 0.31% of the time. However, it's a small difference, and a single year of aging alone has a considerable impact on how many games a player logs in a season.
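For reference, here's a minimal sketch of that test with SciPy, run on the hypothetical pairs table from the earlier sketch rather than the actual data:

```python
# Two-tailed t-test assuming unequal variances (Welch's test) on games
# played before vs. after a summer competition.
from scipy import stats

t_stat, p_value = stats.ttest_ind(pairs["before"], pairs["after"], equal_var=False)
print(f"t = {t_stat:.2f}, two-tailed p = {p_value:.4f}")  # about 0.0031 in the data described above
```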

Regression of Games Missed, a Summer Competition, and Age

A simple regression was conducted to predict the number of games played given age and whether or not the player was in an Olympic or World Championship tournament the summer before. This includes every season from every professional in the competitions, even Larry Bird, who played the games after his last season in the NBA. The summer competition is a dummy variable, which for the uninitiated is the slanderous name for a variable that is either 0 or 1 (i.e. the mere presence or absence of a summer competition). Below is a table with the regression results.


Table 2: Regression of games played on age and summer competition

                       Coefficient    Standard error    p-value
Intercept              89.9           2.16              2.81E-275
Age                    -0.713         0.0765            2.74E-20
Summer competition     4.14           1.65              0.0124

R^2: 0.04553
Adjusted R^2: 0.04460
Standard error of the regression: 16.3
Degrees of freedom: 2053
p-value of F-test for inclusion of summer variable: 0.0124


The F-test compares the presented model with one that doesn't include the summer competition variable. Its p-value of 1.24% means that, if a summer tournament had no real effect on the number of games a player logs the following season, an improvement in fit this large would occur by chance only about 1.24% of the time; by (mostly) coincidence that's the same p-value as the one for the summer competition coefficient (to reiterate, they are different statistics: one compares two regression models, while the other is based on the standard error of a single coefficient within one model). The traditional threshold for significance is 5%, while 1% or lower is usually called highly significant. In other words, the evidence here that a summer tournament changes how many games a player logs the next season is not strong, but it is evidence.
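Here's a rough sketch of how both numbers could be produced with statsmodels, again using the hypothetical player_seasons table (with illustrative games_adj, age, and 0/1 summer_competition columns) rather than the actual data:

```python
# OLS of games played on age and the summer-competition dummy, plus an
# F-test comparing the full model against one without the dummy.
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

full = smf.ols("games_adj ~ age + summer_competition", data=player_seasons).fit()
reduced = smf.ols("games_adj ~ age", data=player_seasons).fit()

print(full.summary())           # coefficients, standard errors, p-values, R^2
print(anova_lm(reduced, full))  # F-test (and its p-value) for adding the summer dummy
```

The coefficient table above corresponds to the first output, and the model-comparison p-value corresponds to the second.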

One alternative explanation stems from the fact that the data are not randomly selected. If a player is injured, that player is less likely to be used in the summer games. What that means is the data pool is biased: the players in it are healthier than normal in the season before the tournament, since otherwise they wouldn't have played in it. There are few ways to remedy this selection bias. Ideally you'd want random selection, but coaches opt for certain players, or players take themselves out of the games. Andrew Bynum is a good example: a player with an extensive history of injury problems is less likely to attend the Olympics or FIBA play, which distorts how players enter the data set. Using players from other countries who played in the Olympics invites other biases as well, and then you have to dig through other leagues and try to apply the same set of statistical assumptions to everyone.

Conclusion

In the real world, "proving" whether or not certain theories are true is a complicated process loaded with problematic assumptions and funky data. It is bad practice to point at numbers and proclaim, "This is true because this number is big and this other number is small." Given the weak statistical evidence, and given that players are more likely to be healthy in the season before the games than after simply because healthy players are more likely to be chosen, the hypothesis that summer play in the Olympics and the FIBA World Championship causes NBA players to miss significant time has not been substantiated by this method. A longer-term and more exhaustive study is needed to answer the question, but unfortunately there's always the problem that Olympic and World Championship players are distinct from everyone else, making the analysis more elusive. One interesting by-product, however, is that one year of aging subtracts about 0.7 games played per season, which can be used as a rough estimate when needed.

But research never ends. Throwing together the most talented young American players on the same team and having them practice and compete with each other may have other effects. In an upcoming small study I'll try to figure out whether the summer games help the development of NBA players.