Thursday, October 10, 2013

Introducing: 1990's RAPM

Background on what RAPM is: +/- was a revolution for the NBA because it offered a completely new method of evaluating players. You look at how a team scores and defends with a player on the court and without him. When you set players as variables, you can use regression to calculate each player's impact. It's a full-scope view of what matters in a game: outscoring your opponent. However, it's noisy for a number of reasons. One is that some player combinations are rare and some teammates almost always play together, so the model has trouble separating their individual contributions (this is known as collinearity). Another is that the models don't deal well with low-minute players, who don't have a large enough sample for an accurate estimate, so the model will often produce a ludicrous result just to "fit" the data better.
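To make the "players as variables" idea concrete, here's a minimal sketch of plain adjusted +/- on a toy league. Everything in it is an assumption for illustration: the function names (`build_design_matrix`, `adjusted_plus_minus`), the six made-up players, the 2-on-2 stints, and the invented "true" values. It is not the parser or model used for the '97 numbers.

```python
import numpy as np

def build_design_matrix(stints, players):
    """One row per stint: +1 for each home player on the floor, -1 for each away player."""
    index = {name: i for i, name in enumerate(players)}
    X = np.zeros((len(stints), len(players)))
    y = np.zeros(len(stints))
    for row, (home, away, margin) in enumerate(stints):
        for name in home:
            X[row, index[name]] = 1.0
        for name in away:
            X[row, index[name]] = -1.0
        y[row] = margin  # home scoring margin for the stint
    return X, y

def adjusted_plus_minus(X, y):
    """Ordinary least squares: pick the coefficients that minimize squared error."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical toy data: three home players, three away players, 2-on-2 stints.
players = ["A", "B", "C", "D", "E", "F"]
true_value = {"A": 3.0, "B": 1.0, "C": -1.0, "D": 2.0, "E": -2.0, "F": -3.0}

stints = []
for home in (("A", "B"), ("A", "C"), ("B", "C")):
    for away in (("D", "E"), ("D", "F"), ("E", "F")):
        # Noise-free margin: sum of home values minus sum of away values.
        margin = sum(true_value[p] for p in home) - sum(true_value[p] for p in away)
        stints.append((home, away, margin))

X, y = build_design_matrix(stints, players)
ratings = adjusted_plus_minus(X, y)
for name, rating in zip(players, ratings):
    print(f"{name}: {rating:+.2f}")
```

In this toy setup, adding the same constant to every player leaves every margin unchanged, so the ratings are only pinned down relative to each other; `np.linalg.lstsq` resolves the tie by returning the smallest-norm solution. Real data adds noise on top of that, which is where the trouble described above starts.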

In simple terms, RAPM deals with this by introducing a heavy dose of regression to the mean. While traditional adjusted +/- builds a model by minimizing error (the difference between the actual points per possession scored/allowed and the expected), RAPM also penalizes the size of the coefficients in the model, with a lambda term controlling how heavy that penalty is. The coefficients are shrunk toward the "prior," which can be set to zero or to a set of prior values (like the previous season's results). Players with few possessions/minutes will have results close to their priors because their sample size isn't big enough to prove to the model they're more or less valuable.
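The lambda penalty described here is what's usually called ridge regression. A minimal sketch with a zero prior, assuming a design matrix `X` (one row per stint, one column per player) and a target vector `y`; the function name `rapm` and the toy numbers are mine, purely for illustration, and the lambda values are arbitrary rather than tuned:

```python
import numpy as np

def rapm(X, y, lam):
    """Ridge solution: minimize ||y - X b||^2 + lam * ||b||^2 (requires lam > 0)."""
    n_players = X.shape[1]
    A = X.T @ X + lam * np.eye(n_players)
    return np.linalg.solve(A, X.T @ y)

# Hypothetical toy data: 4 stints, 3 players (+1 home, -1 away, 0 on the bench).
X = np.array([
    [1.0, -1.0,  0.0],
    [1.0, -1.0,  0.0],
    [1.0,  0.0, -1.0],
    [0.0,  1.0,  1.0],
])
y = np.array([10.0, 8.0, 30.0, -5.0])  # made-up margins

# A larger lambda pulls every coefficient closer to zero.
for lam in (1.0, 10.0, 100.0):
    print(lam, np.round(rapm(X, y, lam), 2))
```

The penalty adds `lam` to the diagonal of `X.T @ X`, which keeps the system solvable even when players are collinear. It also means a player who shows up in only a handful of rows can't move far from zero.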

After some major work, I'm finally ready to display the '97 RAPM numbers. The data source uses only last names, which makes it tricky to figure out who's on the court for teams with players who share a last name, so it's been a labor-intensive process. There are also a number of random errors that crop up. Regardless, the work has been fruitful, and I'll work on the seasons following 1997 when I can. The results are shown below.

*When you reference the spreadsheet, try to include the version number. This will reduce future discrepancies.

Back in the earlier days of the stat movement, there was something called the laugh test: if Shaq was not the top-ranked player, something was wrong with your metric. Since this is the '90s, it becomes the Jordan test. I was relieved to find Michael Jordan near the top of the leaderboard, along with other stars like MVP Malone in third. But Christian Laettner and Terry Mills? Well, first of all, non-prior-informed RAPM often produces wacky results. For example, the fourth- through sixth-ranked players in the 2002 list (non-prior-informed) from the popular RAPM site stats-for-the-nba.com were Eduardo Najera, Devean George, and Ryan Bowen. Laettner played every game for the Hawks, who were one of the best defensive teams in the league, so perhaps without a missed game there wasn't enough data to separate his "true" defensive value from the team's.

Nevertheless, credit should be given to him and to the forgotten Terry Mills, a beefy three-point shooting power forward. Mills was a bench player on a good team yet had the 12th best net +/- rating, according to stats.NBA.com. A great raw +/- can come by accident if you hitch yourself to better players and only play when they do, but he came off the bench for a playoff team and had a raw rating nearly twice as great as anyone else's on the team. There's something to be said for Terry Mills' value in '97.

Another surprising result was Bo Outlaw's outstanding defensive +/-. It was the best rating, a hair ahead of DPOTY Mutombo (retroactively justified here), with defensive legends Mourning and Ewing not too far behind. Even at 21 years old, Garnett shines in +/-, just as he did 16 years later in the 2013 season. As for offense, Jordan, Pippen, Mookie Blaylock, and of course Terry Mills are near the top, but MVP Malone takes the mantle. As for an underrated player RAPM likes, Hornacek is a close sixth. The sweet-shooting guard had one of the best jump shots in the game and could pass well for an outside shooter. He's also one of the most accepting of advanced stats among former players and coaches.

In the future, I'll work on the 1998 season, as well as on creating a statistical +/- prior to test out for this season, and other fun items. (Since play-by-play data is unavailable before '97, the results using pure +/- are limited. A model that creates priors for every player should provide more reliable results than blindly regressing every player toward 0. Players who only play around 50 minutes will be rated close to 0 in RAPM, even though they're probably a lot worse, and the model shouldn't assume the prior for a guy like Michael Jordan is 0.)
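For the curious, here's a rough sketch of what regressing toward a prior instead of toward 0 could look like. The idea is to subtract what the prior already predicts, shrink the leftover toward zero, and add the prior back. The function name `rapm_with_prior` and the toy numbers are assumptions for illustration only; how the prior itself would be built (a statistical +/-, last season's values, and so on) is left open.

```python
import numpy as np

def rapm_with_prior(X, y, lam, prior):
    """Minimize ||y - X b||^2 + lam * ||b - prior||^2 (requires lam > 0)."""
    prior = np.asarray(prior, dtype=float)
    residual = y - X @ prior                     # what the prior fails to explain
    A = X.T @ X + lam * np.eye(X.shape[1])
    delta = np.linalg.solve(A, X.T @ residual)   # shrunken adjustment to the prior
    return prior + delta

# Hypothetical toy data: player 0 is on the floor for both stints,
# player 1 never plays, so there is no evidence about him at all.
X = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
])
y = np.array([2.0, 4.0])   # made-up margins
prior = [0.0, -3.0]        # made-up prior: player 1 is expected to be well below average

print(rapm_with_prior(X, y, lam=1.0, prior=prior))
```

With no possessions to go on, the second player simply keeps his prior rating instead of being pulled up to 0, which is the behavior the parenthetical above is asking for.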

Click here for the link to the spreadsheet.

Edit: After I updated the possession/lineup parser and added missing games, some of the values here changed.

4 comments:

  1. Very cool. Would be nice to see these results with some sort of statistical plus/minus as the prior.

  2. MOOKIE!!!! I knew it:

    http://www.basketballprospectus.com/unfiltered/?p=975

  3. Tremendous, tremendous work. Looking forward to the dataset with some prior of your choice, and more iterations for future seasons. Thanks for your research!

  4. Great work. When will the RAPM's of 1998 and later seasons come out? Looking forward to that and interested to see if you will use a prior for those RAPM's.
