Tuesday, May 22, 2012

Best Fans: Most of the Job Is Showing Up


The court is a stage and all the world watches. Without fans the NBA would be nothing but a series of games whose reward was simple bragging rights, and unwatched the league would have little meaning. The fans are treasured, but which ones are the best? Is there one city that towers above the rest in supporting its team? The question is pertinent because every year there's a rumor about a certain organization pulling up its stakes and leaving, and one of the reasons cited is a "small market" where there aren't enough fans to fill the stadium. There are, however, many different factors at play, and by looking at the question from multiple angles one can make a better-educated guess at which fanbase is the best.


The primary determinant for best fanbase is how well the stadium is filled, or total attendance for a year divided by capacity. Obviously, there are various factors that cause attendance to sway, and as a result a nonlinear multivariate regression model was built using team data from the 2000 to 2012 seasons. The basic form is a concave asymptotic function: Attendance/Capacity = 1 - C*exp(alpha1*x1 + alpha2*x2 + ...) where, as shown in the final results, the alpha coefficients are negative. Attendance is taken from basketball-reference.com and capacity from various websites for each stadium. Standing room only was ignored, as well as two seasons of the hybrid New Orleans/Oklahoma City Hornets -- those two weird seasons are the definition of an outlier -- and three from the Charlotte Hornets because its obscenely large stadium, with a capacity of more than 24,000, skewed the results.

The first and most important independent variable is the win-loss record. The incarnation of this that worked the best was a two-year average of winning percentage (wins divided by games played). This makes sense because doing well in the previous season will drive fans to the games while winning during the current season keeps them there. The percentage was also raised to the seventh power. The effect is that a jump from a win percentage of 20 to 30 barely affects attendance while going from 50 to 60 has a large effect.
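To see why the seventh power behaves that way, here's a quick sketch. It looks only at the transformed win term, before any coefficient is applied, and the function names are mine rather than anything from the Minitab model.

```python
def transformed_win_term(win_pct, exponent=7):
    """Raise a two-year winning percentage (0 to 1) to the given power."""
    return win_pct ** exponent


def jump(low, high):
    """Change in the transformed term when win percentage moves from low to high."""
    return transformed_win_term(high) - transformed_win_term(low)


# Same ten-point improvement, very different movement in the model input.
low_jump = jump(0.20, 0.30)   # bad team getting slightly less bad
mid_jump = jump(0.50, 0.60)   # average team becoming a good one

print(f"20% -> 30%: {low_jump:.6f}")
print(f"50% -> 60%: {mid_jump:.6f}")
```

The second jump moves the term nearly 100 times as much as the first, which is the "barely affects" versus "large effect" difference described above.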

Appearance in the playoffs the previous year was a significant dummy variable (dummy meaning it's either a 0 or a 1.) Having a high win percentage will cause a team to hit the playoffs and that's included in the model, but even then simply making the playoffs is significant; that's an important finding. The effect is fairly sizable too, as the dummy variable gives an 18% "boost" to the attendance equation holding everything else equal.

City population, which is actually based on the metropolitan statistical area, was statistically significant as well. The data come from the US and Canadian censuses, and the figures for years between censuses are the government's estimates (2012, though, was my own estimate using the previous growth rate for a simple linear extrapolation.) The population is divided by one million and then squared in the model. For Los Angeles, the population was cut in half and given to each team. New York's metro area includes the New Jersey Nets' home turf; consequently the metro area was broken down into New York-White Plains-Wayne, NY-NJ and Nassau-Suffolk, NY for the Knicks, and Edison-New Brunswick, NJ and Newark-Union, NJ-PA for the Nets.
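For anyone who wants to reproduce the population input, this is roughly what the extrapolation and the transformation look like. The population figures in the example are made up for illustration, and the function names are hypothetical.

```python
def extrapolate_linear(year_a, pop_a, year_b, pop_b, target_year):
    """Extend the straight line through two yearly estimates to a later year."""
    slope = (pop_b - pop_a) / (year_b - year_a)  # people gained per year
    return pop_b + slope * (target_year - year_b)


def population_term(population):
    """Model input: metro population in millions, squared."""
    return (population / 1_000_000) ** 2


# Illustrative numbers only, not census data.
pop_2012 = extrapolate_linear(2010, 1_100_000, 2011, 1_120_000, 2012)
print(pop_2012)                   # estimated 2012 population
print(population_term(pop_2012))  # value fed into the regression
```

Because of the squaring, a metro area twice as large contributes four times as much to this term.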

For the time-rate of growth in the NBA, the number of years since 1999 was used. The years were then transformed by ln(Year - 1999). The effect of the logarithm is a slow growth rate. As the league gains more fans, the stadiums have been filling up more often.
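Here's a small sketch of that time term, assuming the season is labeled by a single year (2000 through 2012) as in the data set. It shows how the year-to-year increments shrink as time goes on; the function name is mine.

```python
import math


def year_term(season_year, base_year=1999):
    """Model input for time: natural log of years elapsed since the base year."""
    return math.log(season_year - base_year)


# The first step is the biggest; later steps add less and less.
early_gain = year_term(2001) - year_term(2000)  # ln(2) - ln(1)
late_gain = year_term(2012) - year_term(2011)   # ln(13) - ln(12)

print(f"2000 -> 2001: {early_gain:.3f}")
print(f"2011 -> 2012: {late_gain:.3f}")
```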

Related to wins is the simple rating system (SRS) from basketball-reference; it's basically the point differential from the current season adjusted for strength of schedule. Even though win percentage and point differentials are highly correlated, both were statistically significant when used together, suggesting that SRS is effective at knowing when a team is dominant or not and that fans do react positively.

The variables that were not significant when used in the regression model include but are not limited to: pace, offensive efficiency (used with win%, it shows whether or not fans prefer offensive teams), stadium capacity (it can adjust for teams with large or small stadiums), previous season champions, a dummy variable for current season MVP, and number of all-star starters.


The final model is shown below with coefficients in the table. The regression was fit as a nonlinear model in Minitab rather than as a simplified, linearized transformation. The standard error was 0.0655 with 383 data points and the standard error/mean of the fits was 0.0758; it's a pretty good fit though of course not perfect. If you're math-averse don't worry because I'm just showing the results in case anyone wants to check or use them.

Attendance/stadium capacity = 1 - C*exp[ alpha1*(2-year Wins/Games)^7 + alpha2*(Playoffs previous season) + alpha3*(City pop./10^6)^2 + alpha4*ln(Year-1999) + alpha5*SRS ]

[Coefficient table: the values did not survive. Remaining labels: Standard error; (2-year Wins/Games)^7; Playoffs previous season; (City pop./10^6)^2]
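If you'd rather poke at the equation in code, here's a minimal sketch of it as a function. The value of C and the alphas in the example are placeholders I made up so the code runs; they are not the fitted Minitab coefficients. The only property carried over from the post is that the alphas are negative.

```python
import math


def predicted_fill_rate(win_pct_2yr, made_playoffs_prev, metro_pop,
                        season_year, srs, C, alphas):
    """Attendance/capacity predicted by the functional form in the post."""
    a1, a2, a3, a4, a5 = alphas
    exponent = (
        a1 * win_pct_2yr ** 7                      # two-year wins/games, 7th power
        + a2 * (1 if made_playoffs_prev else 0)    # playoff dummy
        + a3 * (metro_pop / 1e6) ** 2              # metro population in millions, squared
        + a4 * math.log(season_year - 1999)        # slow league-wide growth
        + a5 * srs                                 # simple rating system
    )
    return 1 - C * math.exp(exponent)


# Placeholder coefficients for illustration only.
C = 0.3
alphas = (-1.0, -0.2, -0.01, -0.1, -0.02)

weak = predicted_fill_rate(0.30, False, 2_000_000, 2010, -5.0, C, alphas)
strong = predicted_fill_rate(0.70, True, 2_000_000, 2010, 5.0, C, alphas)
print(f"weak team:   {weak:.3f}")
print(f"strong team: {strong:.3f}")
```

With negative alphas, every improvement shrinks the exponential term, so the prediction creeps toward 1 (a full building) without ever passing it.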

What you can take from this is what's important: wins, wins and wins. The presence of an MVP or all-star starters on your team was not statistically significant in the final model. In other versions it was, but that's because of how correlated they are with wins. MVPs, famously, are handed out to the "best player on the best team" rather than just the best player, and even all-stars heavily come from high win-total teams. Of course, having those players will drive up your win percentage, so the lesson here is not to rely on stars to fill seats; concentrate on winning. If you do, you'll pick up all-stars on the way to building a good team. However, if you instead pick up big names who don't contribute to wins like, say, late-era Iverson, you'll find yourself with frustrated fans rather than stadiums filled to capacity.

Looking closer at population totals, it's amazing how small some metro areas like Salt Lake City are compared to ones like the New York metropolis. I'd also like to point out that a metropolitan statistical area is not a perfect population number for a team. For instance, some cities are sprawling and a larger portion of the population will be farther away from the stadium than in more compact cities. Stadiums (or stadia if you think using Latin plural forms makes you smarter) also aren't all ideally placed at the exact center of the city. The Warriors are in Oakland, though there's no real "center" to the metro Bay Area, and the infamous Palace in Detroit is placed haphazardly.

With the model built, now the focus can be on who the best fanbase is in the league. The simplest way, and simple methods are usually the most fruitful, is to see which teams out-perform the model results. If, say, Portland truly has great fans, then the model should be consistently underrating its yearly attendance-capacity number. For the uninitiated, the actual result minus the predicted is known as the residual. Also, the standard error is used in the rankings to, well, standardize the results. A score of +/-1 means the residuals are on average a standard error from zero and comfortably different than average. A score of +/-2 is pretty significant, and +/-3 is almost assuredly an error or weird outlier.
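In code, the ranking boils down to something like the sketch below. I'm assuming the score is each team's average residual divided by the model's standard error (0.0655), which is how I read the description above. The team names and numbers are invented, not the real results.

```python
from statistics import mean

STANDARD_ERROR = 0.0655  # standard error of the regression reported in the post


def fan_score(actual, predicted, standard_error=STANDARD_ERROR):
    """Average residual (actual minus predicted fill rate) in standard-error units."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return mean(residuals) / standard_error


def rank_teams(seasons_by_team):
    """Sort teams from most over-performing to most under-performing."""
    scores = {team: fan_score(actual, predicted)
              for team, (actual, predicted) in seasons_by_team.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up data: (actual fill rates, model predictions) per season.
example = {
    "Team A": ([0.95, 0.97], [0.88, 0.91]),
    "Team B": ([0.80, 0.78], [0.85, 0.86]),
}

ranking = rank_teams(example)
for team, score in ranking:
    print(f"{team}: {score:+.2f}")
```

A team that keeps beating its predicted fill rate ends up with a positive score, and one that keeps falling short ends up negative.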

The numbers on the left show that the best fanbase in the NBA according to this method is the Oklahoma City Thunder. Part of this is the expansion factor, where the fans are excited for a new team and will fill the stadium even when the team is bad (and boy were they bad in Durant's first couple years.) They've also only had four seasons as the Thunder in Oklahoma; take it with a grain of salt. In fact, the last couple years with high win totals have come with smaller residuals, though still positive. They are, however, a rabid fanbase with a small metro population.

Speaking of small populations, Salt Lake City is the smallest one with a 2011 estimated population of 1.146 million while New Orleans and Oklahoma City were next at 1.191 and 1.278 million, respectively. Jazz fans are also in my view the "true" best fanbase because they've had strong numbers throughout an entire decade with major personnel changes and fluctuations in wins.

Toronto is surprisingly third, but this is mostly from the first half of the decade when the organization had that new team smell. The last couple years after Bosh they've had negative residuals, but they're still an international city capable of showing Americans how to support a team and in some cases Asian-American point guards.

The rest of the top ten is congruent with conventional wisdom with one exception I'll get to last. Bulls' fans are, well, Bulls' fans. Jordan left that city with a love of basketball. Seattle was known as a great basketball city, and the fans came out to support the team even though stadium financing troubles and an evil owner led to the team moving to another great fanbase. In another Pacific Northwest city -- and the last one remaining in the "northwest" division -- Portland fans are also known for their love of their team, the Blazers. Warriors fans, likewise, are die-hards even with terrible teams and a shaky organization; some will argue they deserve the crown of best fans over the Jazz. Although the love has waned in the past few years, Kings' fans are also great, and they're one of the many teams without a championship because of the Lakers' gluttony. Even with all the success and people, the purple and gold fans are fairly excellent. But the exception? The Clippers, who even without the last couple years of Blake Griffin still have good results. I think this is the explanation: Los Angeles is known for its transplants, and the Lakers are hated in many different cities, meaning NBA fans who moved into the area are forced to become Clippers' fans (the Bill Simmons effect.)

Conversely, the worst fanbase strictly according to the method is the New Jersey Nets, who apparently lost interest when they heard the team was going to be moved -- though other relocated teams fared better. The Nets were also pretty terrible at filling the stadium even when they were having finals appearances. No doubt the weird location of the stadium didn't help, but they're so far from the second worst it's about more than that. Sure, having the Knicks next door didn't help, but it's not like they were perennial contenders and who wants to drive in New York traffic? Also note that I only gave them the population of parts of northern New Jersey instead of the entire metro area.

Hawks fans can rejoice because next year they'll be the worst fanbase unless the 76ers step up their game. The rest of the bottom-feeders aren't surprising -- Pistons, Timberwolves, and the Rockets. The Pacers don't seem to fit. The Malice at the Palace is one source for blame, but the fans weren't showing up a couple years prior to the incident. I'm a little surprised the Spurs and Knicks are ranked in the middle (seems too low), but it's not like the Spurs' fanbase immediately conjures the term "rabid and energetic" in your mind, and the Knicks are still reeling from the Isiah Thomas era as well as having to serve a population that's probably too big for one team (the Brooklyn Nets should fix that.)


Although it's not a perfect model, the regression shows an interesting and mostly logical list of the best fanbases. A regression model isn't perfect because it can only use the information you feed it, and variables are correlated rather than necessarily causative. There are a whole host of reasons why people attend basketball games, but it's safe to say that using a few of the most important ones a decent and objective list can be made. Ranking teams in terms of who fills up their stadiums and adjusting for various factors, Oklahoma City has supported its team well in limited seasons, and Salt Lake City has embraced the Jazz over a longer time-frame. New Jersey won't notice the Nets leaving, as they barely filled up the stadium they had. The best fanbase argument, however, is far from over -- I'm providing some useful information to use -- and cheer as loud and as often as you want, but the bottom line is that fans in the seats are what matter.


The graph below shows the residuals graphed with respect to 2-year win averages. Basically, it's showing that the model fits pretty well because there is no pattern in the residuals; they're (mostly) randomly scattered around the 0 residual line. There's a bit of a problem once you near an 80% win percentage, and as such I wouldn't trust one-year model results. Over a decade, however, you get to see seasons that differ.
