Wednesday, September 18, 2013

Initial Plus/Minus Numbers for the '97 Season

I've been silent for a few weeks while working on a bigger project, and the fruits of that labor are now appearing. A few months ago, the NBA released play-by-play data going back to the 1996-97 season, opening the door to advanced basketball stats beyond the typical box score counting stats. NBA.com already lists detailed player shooting data (percentages and totals by distance), but the Holy Grail was plus/minus -- it had become the most respected advanced stat out there, beating out other metrics in out-of-sample predictions of future results, especially further into the future, when more guys have moved from team to team.

When no one stepped up to the plate to break down the play-by-play data, I decided to do it myself. The problem was that the data only included last names, making it difficult to discern exactly who was on the court. However, home and away teams have separate columns, so the problem is reduced to teams where two players share a last name. There are six such teams: Portland with Rumeal and Cliff Robinson; New York with Buck and Herb Williams; New Jersey with another set of Williamses, Jayson and Reggie; Phoenix with Mike and Chucky Brown (no, not that Mike Brown); Denver with LaPhonso and Dale Ellis, as well as Brooks and LaSalle Thompson; and Indiana with Dale and Antonio Davis. Fortunately, the Browns never played in the same game, so they were easily separated. The rest of the player pairs were successfully separated with the exception of the two Davises, who are similar players anyway, and the Ellises, who remain the largest problem. For the matchup file, I combined each of those two pairs into one player, so the initial results below should be taken with a grain of salt, though mostly for players on those two teams (Denver and Indiana).
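The first step described above, finding which teams have a last-name collision at all, can be sketched in a few lines. This is only an illustration, not the code actually used for the project: the `rosters` dictionary is a hand-typed stand-in (partial rosters, using names mentioned in this post), and taking the last word of a name as the surname is a simplifying assumption.

```python
from collections import Counter

def ambiguous_surnames(rosters):
    """Map each team to the sorted surnames shared by two or more players.

    Assumes the surname is the last whitespace-separated token of the
    full name, which is a simplification.
    """
    result = {}
    for team, players in rosters.items():
        counts = Counter(name.split()[-1] for name in players)
        shared = sorted(s for s, n in counts.items() if n > 1)
        if shared:
            result[team] = shared
    return result

# Hypothetical, partial rosters for illustration only.
rosters = {
    "Portland": ["Rumeal Robinson", "Cliff Robinson", "Arvydas Sabonis"],
    "Denver": ["LaPhonso Ellis", "Dale Ellis",
               "Brooks Thompson", "LaSalle Thompson"],
    "Chicago": ["Michael Jordan", "Scottie Pippen"],
}

collisions = ambiguous_surnames(rosters)
print(collisions)
```

Only the teams that show up in `collisions` need any manual untangling; everyone else can be matched on last name plus the home/away column.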

Without further ado, here's a table of the top players for 1996-97 by adjusted plus/minus, including the playoffs, with a minutes cutoff of 250:
| Rank | Player | Adj. +/- | St. Err. | Minutes | PER | WP/48 |
|---|---|---|---|---|---|---|
| 1 | Mookie Blaylock | 15.79 | 7.01 | 3056 | 20.4 | 0.197 |
| 2 | Tim Hardaway | 15.64 | 6.79 | 3136 | 20.8 | 0.198 |
| 3 | Michael Jordan | 13.92 | 5.48 | 3106 | 27.8 | 0.283 |
| 4 | Latrell Sprewell | 13.13 | 6.79 | 3353 | 19.7 | 0.115 |
| 5 | Patrick Ewing | 13.08 | 5.78 | 2887 | 21.3 | 0.163 |
| 6 | Terry Mills | 12.92 | 5.82 | 1997 | 16.4 | 0.148 |
| 7 | Hakeem Olajuwon | 12.81 | 6.46 | 2852 | 22.7 | 0.154 |
| 8 | Greg Anthony | 12.81 | 7.75 | 1863 | 16.6 | 0.090 |
| 9 | Kevin Garnett | 12.49 | 5.87 | 2995 | 18.2 | 0.116 |
| 10 | John Stockton | 12.07 | 16.05 | 2896 | 22.1 | 0.226 |
| 11 | Tyrone Hill | 11.79 | 6.82 | 2582 | 17.8 | 0.184 |
| 12 | Mitch Richmond | 10.94 | 6.38 | 3125 | 21.6 | 0.166 |
| 13 | Stanley Roberts | 10.80 | 8.94 | 378 | 14.1 | 0.060 |
| 14 | Melvin Booker | 10.65 | 9.31 | 430 | 8.8 | 0.014 |
| 15 | Shaquille O'Neal | 10.62 | 5.88 | 1941 | 27.1 | 0.197 |
| 16 | Gary Payton | 10.52 | 6.74 | 3213 | 21.8 | 0.193 |
| 17 | Nate McMillan | 10.28 | 6.41 | 798 | 14.1 | 0.158 |
| 18 | Alonzo Mourning | 9.87 | 5.38 | 2320 | 21.6 | 0.174 |
| 19 | Litterial Green | 9.77 | 10.10 | 311 | 13.5 | 0.119 |
| 20 | Jerome Kersey | 9.70 | 4.95 | 1766 | 12.3 | 0.102 |

*Per 200 possessions (roughly a full game)
**This is the entire '97 season plus the playoffs with the exception of 11 missing games.
***Players with under 250 minutes were combined into one variable. That coefficient, by the way, was -5.94 with a st. error of 7.14.
****Playoff possessions are weighted twice as much (i.e., they count twice as much as regular-season possessions).
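To make the footnotes concrete, here is a rough sketch of how a single stint (a stretch of play with the same ten players on the floor) might be turned into one row of the regression. It is a sketch under stated assumptions, not the exact setup behind the tables: the column names `HOME_COURT` and `LOW_MINUTES`, the function `encode_stint`, and the player names and numbers are all hypothetical, lineups are shortened to two players per side, and `possessions` is assumed to count both teams' possessions.

```python
CUTOFF_MINUTES = 250    # minutes cutoff used for the tables
PLAYOFF_WEIGHT = 2.0    # playoff possessions count double
POOLED = "LOW_MINUTES"  # hypothetical name for the pooled low-minute column

def encode_stint(home, away, margin, possessions, playoff, minutes):
    """Turn one stint into (row, target, weight).

    row:    +1 for each home player on the floor, -1 for each away
            player, with under-cutoff players pooled into one column,
            plus a constant column for home-court advantage.
    target: home margin scaled to 200 possessions.
    weight: possessions, doubled for playoff stints.
    """
    row = {"HOME_COURT": 1.0}
    for players, sign in ((home, 1.0), (away, -1.0)):
        for name in players:
            under = minutes.get(name, 0) < CUTOFF_MINUTES
            col = POOLED if under else name
            row[col] = row.get(col, 0.0) + sign
    target = 200.0 * margin / possessions
    weight = possessions * (PLAYOFF_WEIGHT if playoff else 1.0)
    return row, target, weight

# Made-up season minutes and a made-up playoff stint.
minutes = {"Starter A": 3000, "Starter B": 2800,
           "Starter C": 2900, "Reserve D": 120}
row, target, weight = encode_stint(
    home=["Starter A", "Starter B"],
    away=["Starter C", "Reserve D"],
    margin=3, possessions=20, playoff=True, minutes=minutes)
print(row, target, weight)
```

Fitting a weighted least squares regression of `target` on these rows gives one coefficient per column: the player coefficients are the adjusted plus/minus numbers, and the `HOME_COURT` coefficient is the home-court advantage.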

The important thing to note about adjusted plus/minus data is that the estimates are not precise: there are usually huge ranges around them. The standard errors for the guys in the top 20 table are mostly around 5 to 8, meaning Ewing, for example, isn't significantly "better" than Mourning (by adj. +/-). With such high variation, what use are these results? For one, it's more evidence to use in evaluating historical (and even some current) players. One-year plus/minus is a little wacky, but once you're armed with a few years of data and better techniques like ridge regression, you can find patterns and judge which players have consistently high, or mediocre, impact. As for wacky results, you may have noticed three non-entities in the top 20: Stanley Roberts (Shaq's former teammate from LSU), Melvin Booker, and Litterial Green. That's pretty normal in one-year adjusted plus/minus, as low-minute guys are its biggest weakness. As a sanity check, home-court advantage was calculated as +3.29.
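A quick back-of-the-envelope check of the Ewing vs. Mourning comparison, using their rows from the top 20 table. This is a rough sketch: it treats the two estimates as independent, which a real regression does not guarantee (the proper calculation would also use the covariance between the two coefficients), and the function names are mine.

```python
import math

def compare(est_a, se_a, est_b, se_b):
    """Return (difference, standard error of the difference, z-score).

    Assumes the two estimates are independent, so the variances add.
    """
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se_diff, diff / se_diff

def interval(est, se, z=1.96):
    """Approximate 95% interval: estimate plus or minus 1.96 standard errors."""
    return est - z * se, est + z * se

# Ewing: 13.08 (st. err. 5.78); Mourning: 9.87 (st. err. 5.38).
diff, se_diff, z = compare(13.08, 5.78, 9.87, 5.38)
low, high = interval(13.08, 5.78)
print(f"difference {diff:.2f}, st. err. {se_diff:.2f}, z {z:.2f}")
print(f"Ewing 95% interval: {low:.2f} to {high:.2f}")
```

The gap of about 3.2 points is well under half of its own standard error (roughly 7.9), and Ewing's interval alone runs from about +1.8 to +24.4, which is why a one-year ranking like this shouldn't be read too literally.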

If you're wondering how a certain star ranked, I put the most notable guys in the table below:
| Rank | Player | Adj. +/- | St. Err. | Minutes | PER | WP/48 |
|---|---|---|---|---|---|---|
| 22 | Christian Laettner | 9.36 | 6.38 | 3140 | 19.1 | 0.177 |
| 23 | Scottie Pippen | 9.28 | 5.42 | 3095 | 21.3 | 0.203 |
| 25 | Horace Grant | 8.82 | 5.41 | 2496 | 17.3 | 0.148 |
| 27 | Kendall Gill | 7.69 | 5.75 | 3199 | 19.6 | 0.132 |
| 31 | Vlade Divac | 7.44 | 6.68 | 2840 | 17.9 | 0.123 |
| 33 | Chris Webber | 7.12 | 5.77 | 2806 | 21.8 | 0.159 |
| 37 | Hersey Hawkins | 6.68 | 5.95 | 2755 | 17.6 | 0.190 |
| 47 | Detlef Schrempf | 5.56 | 5.43 | 2192 | 18.3 | 0.174 |
| 48 | Clyde Drexler | 5.54 | 5.94 | 2271 | 19.9 | 0.172 |
| 51 | Rasheed Wallace | 5.38 | 5.63 | 1892 | 18.4 | 0.163 |
| 52 | Anfernee Hardaway | 5.15 | 5.54 | 2221 | 21.4 | 0.175 |
| 54 | Jeff Hornacek | 5.06 | 7.58 | 2592 | 18.8 | 0.190 |
| 55 | Sam Cassell | 4.98 | 5.40 | 1714 | 18.4 | 0.108 |
| 56 | Jason Kidd | 4.91 | 4.89 | 1964 | 16.9 | 0.107 |
| 58 | Karl Malone | 4.76 | 8.60 | 2998 | 28.9 | 0.268 |
| 70 | Derrick Coleman | 4.33 | 5.63 | 2102 | 17.3 | 0.076 |
| 77 | Kenny Anderson | 3.94 | 6.73 | 3081 | 19.5 | 0.193 |
| 79 | Toni Kukoc | 3.90 | 4.91 | 1610 | 20.2 | 0.204 |
| 80 | Reggie Miller | 3.76 | 6.89 | 2966 | 20.2 | 0.200 |
| 82 | Dikembe Mutombo | 3.54 | 6.76 | 2973 | 19.0 | 0.183 |
| 94 | Kevin Johnson | 3.22 | 3.22 | 2658 | 22.9 | 0.211 |
| 95 | Eddie Jones | 3.07 | 3.07 | 2998 | 17.3 | 0.154 |
| 100 | Allen Iverson | 2.94 | 2.94 | 3045 | 18.0 | 0.065 |
| 108 | Grant Hill | 2.32 | 2.32 | 3147 | 25.5 | 0.223 |
| 113 | Vin Baker | 2.23 | 2.23 | 3159 | 20.1 | 0.127 |
| 122 | Arvydas Sabonis | 1.97 | 1.97 | 1762 | 21.8 | 0.205 |
| 141 | Kobe Bryant | 0.27 | 5.24 | 1103 | 14.4 | 0.079 |
| 160 | Dennis Rodman | -0.76 | 5.20 | 1947 | 13.9 | 0.148 |
| 173 | Shawn Kemp | -1.48 | 5.64 | 2750 | 20.7 | 0.174 |
| 197 | Anthony Mason | -2.30 | 6.52 | 3143 | 18.9 | 0.173 |
| 214 | Rik Smits | -2.93 | 5.24 | 1518 | 18.3 | 0.105 |
| 222 | Damon Stoudamire | -3.22 | 6.59 | 3311 | 18.1 | 0.110 |
| 252 | Terrell Brandon | -4.84 | 6.58 | 2868 | 21.5 | 0.181 |
| 256 | Chris Mullin | -5.24 | 5.82 | 2733 | 17.6 | 0.124 |
| 273 | Rony Seikaly | -6.49 | 5.66 | 2615 | 18.3 | 0.125 |
| 275 | Tom Gugliotta | -6.52 | 6.37 | 3131 | 19.0 | 0.103 |
| 284 | Dominique Wilkins | -6.96 | 5.43 | 1945 | 19.6 | 0.083 |
| 288 | Rod Strickland | -7.21 | 9.72 | 2997 | 19.7 | 0.141 |
| 324 | Shareef Abdur-Rahim | -10.45 | 5.88 | 2802 | 17.4 | 0.049 |

Non-traditional stars like Hornacek and Pippen rate well by this metric. It's disappointing to see Grant Hill, whose prime was cut short, and Sabonis with just decent numbers. I was hoping they'd have monster impact. It's interesting to see who doesn't fare well, however -- the old star Wilkins is a net negative, Kemp is as well, and a young Shareef Abdur-Rahim, often called a guy who put up good stats on a bad team, brings up the rear. I am surprised, however, by Dennis Rodman's low ranking. This is Rodman the Rebounder on his second title run, although his numbers weren't great in the playoffs. It's shocking to see Malone so low given his MVP, but both he and Stockton have high standard errors, suggesting there were problems untangling the two guys since neither missed a game.

Remember that one-year adjusted plus/minus stats are volatile. It's normal to see a guy in his prime post a terrible +/- one season and then rebound with an excellent one the next. If the roster rotations were rigid, it's impossible for this method to pick up on which guy actually deserves the credit on the court. With another year of data behind it, the plus/minus stats (prior-informed RAPM or two-year adj. +/-) will improve by a large margin. RAPM deals better with low-minute guys, which can change some of the numbers completely, like a really complicated maze of dominoes.

And why did I set a minutes requirement of 250 for the tables? The weakness of plain adjusted plus/minus is that it dumbly produces absurd estimates for guys who have few minutes. For example, without a minutes cutoff you get results like Evric Gray with an earth-shattering +27.2 ... in 42 minutes all season. Gary Grant and Jack Haley also ranked above legends like Olajuwon, and David Robinson came in at +14.6 in only 147 minutes. Since they have so few possessions, these low-minute guys are basically Play-Doh used to fill in any cracks and minimize the squared error. This is where ridge regression (RAPM) excels: a heavy dose of regression to the mean (or to a prior). If you're going to have a high rating, you'd better prove it with many possessions or a high rating the previous year.
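Here's a toy illustration of that regression to the mean. It is a deliberately simplified sketch, not the RAPM setup itself: each made-up row involves only one player (real rows have ten), the penalty `lam` is an arbitrary value I picked, and the solver is a bare-bones Gaussian elimination so the example needs nothing beyond plain Python.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ridge(rows, targets, lam):
    """Ridge estimate: solve (X'X + lam * I) b = X'y.
    With lam = 0 this is ordinary least squares."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve(xtx, xty)

# Made-up data: column 0 is a regular with 100 stints at +5,
# column 1 is a cameo player with 2 stints at +27.
rows = [[1.0, 0.0]] * 100 + [[0.0, 1.0]] * 2
targets = [5.0] * 100 + [27.0] * 2

ols = ridge(rows, targets, lam=0.0)
shrunk = ridge(rows, targets, lam=10.0)
print("plain:", ols)
print("ridge:", shrunk)
```

With these made-up numbers, the plain fit hands the cameo player his raw +27, while the ridge fit pulls him down to +4.5; the regular barely moves, from +5 to about +4.55. The more possessions a player has, the less the penalty matters.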

I'll clean up the data more and tackle RAPM further, but for now some initial results were worth posting.

Edit: I used Rosenbaum's preferred minutes cutoff of 250. Before, I was using something really low.
Second edit: I found some problems with how the names for the player pairs were being assigned, and I eliminated all but two player pairs (hence the label "initial" results). As a result, the numbers are completely revamped.
