What Are These Stats?


So, gotbuckets.com provides 2-year Adjusted Plus-Minus (2-year APM) and Regularized APM (RAPM) data. Why, you ask? A little intro…

Let’s start by acknowledging that gotbuckets.com lacks groundbreaking new statistics. APM and Regularized Adjusted Plus Minus (RAPM) have been in the public consciousness for over a decade. A number of people played a part in developing the methods we use here, including Wayne Winston, Jeff Sagarin, Dan Rosenbaum, Joe Sill, Aaron Barzilai, Jeremias Engelmann, and Eli Witus. They are all smarter than we are.

The creators of this blog write for other, team-centric basketball blogs, though, and wanted a place to write about the NBA at large. Public APM and RAPM data had somewhat dried up, with Mr. Barzilai’s site no longer being updated and Mr. Engelmann’s refreshing only periodically. Given those two factors, we calculate and provide 2-year APM and RAPM (courtesy of @talkingpractice), and intend to supplement them with NBA writing; sometimes APM / RAPM-centric, other times, not so much.

So what benefits can 2-year APM provide?

Every game ends with one team scoring more points than the other; being that team is the goal of every contest.  A myriad of useful box-score derived stats succinctly measure player performance in wins: Extra Wins Added, Win Shares, Wins Above Replacement Player, Wins Produced, etc.  But ultimately, none originates directly from scoreboard impact.

Plus-minus derivations help in that regard, and inform a well-rounded understanding of contributions to a team’s bottom line. Everyone understands raw Plus/Minus. It simply measures how much a player’s team won or lost by while he played; if a guy played 30 minutes and his team outscored the opponent 62 to 58 during that time, then he finishes +4. Easy stuff, but unfortunately it doesn’t account for a player’s teammates or opponents. It is also prone to all sorts of randomness: bad calls from referees, an opponent blazing through a hot stretch, excellent teammates, etc. can all impact a player’s Plus/Minus.
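To make the arithmetic concrete, here is a minimal Python sketch of raw Plus/Minus. The function name and the idea of splitting a game into on-court stints are our own illustration, not part of any official stat feed; the 62 to 58 totals are the ones from the example above.

```python
from typing import Iterable, Tuple


def raw_plus_minus(stints: Iterable[Tuple[int, int]]) -> int:
    """Sum (team points - opponent points) over every stint a player was on court.

    Each stint is a (team_points, opponent_points) pair. The stint layout is a
    hypothetical input format chosen for this sketch.
    """
    return sum(team - opponent for team, opponent in stints)


# Two made-up stints that add up to the 62-58 example: 30-25, then 32-33.
total = raw_plus_minus([(30, 25), (32, 33)])
print(total)  # 4
```

Note that nothing in the calculation knows who else was on the floor, which is exactly the weakness described above.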

As an evolution, along came Adjusted Plus Minus. Derived from work originally performed by Winston and Sagarin, APM assesses the plus-minus outcome of tens of thousands of annual lineup matchups via a regression model. Every lineup matchup is an equation and each player a variable. Home-court advantage is accounted for, and the regression calculates a per-possession value for every player in the NBA. Every lineup matchup / equation contains two known values and ten variables. The ten variables are the players on the court. The known values are the average per-possession home-court advantage in the NBA and the scoring differential between the two lineups, extrapolated to 100 possessions; if Lineup X bested Lineup Y by three points during their ten possessions on court together, then the margin equals 30. If the home lineup beat the away lineup by an extrapolated 30 points per 100 possessions, the average home-court advantage is three points per 100 possessions, X represents a home player, and Y an away player, then the equation is:

30 = 3 + X1 + X2 + X3 + X4 + X5 – Y1 – Y2 – Y3 – Y4 – Y5

Using tens of thousands of these equations and the statistical computing software R, we run a regression that calculates a value for each variable (player) who surpasses a specified minimum playing-time threshold. The regression weights each equation by its number of possessions, and the value derived is the player’s approximate worth to his team per 100 possessions, compared to an average player.
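For readers who like to see the mechanics, here is a rough Python sketch of a possession-weighted regression on made-up data. It is not the R code used for the site: the 12-player pool, the random lineups, and the noise-free margins are all assumptions chosen so the answer can be checked. Because every equation has five +1s and five -1s, ratings are only pinned down relative to one another, so this sketch takes the solution that averages to zero.

```python
import numpy as np


def fit_apm(X, margin_per_100, possessions, home_court_adv=3.0):
    """Weighted least squares for: margin = home_court_adv + X @ ratings.

    X has one row per lineup matchup, +1 for home players, -1 for away players.
    Rows are weighted by possessions. Hypothetical helper, for illustration only.
    """
    w = np.sqrt(np.asarray(possessions, dtype=float))
    y = np.asarray(margin_per_100, dtype=float) - home_court_adv
    # Scaling each row by sqrt(weight) turns weighted least squares into
    # ordinary least squares; lstsq returns the minimum-norm solution.
    ratings, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=1e-10)
    return ratings


rng = np.random.default_rng(0)
n_players, n_matchups = 12, 200

# Made-up "true" ratings, centered so the average player is 0.
true_ratings = rng.normal(0.0, 3.0, n_players)
true_ratings -= true_ratings.mean()

# Random matchups: 10 of the 12 players on court, 5 home (+1) and 5 away (-1).
X = np.zeros((n_matchups, n_players))
for row in X:
    on_court = rng.choice(n_players, size=10, replace=False)
    row[on_court[:5]] = 1.0
    row[on_court[5:]] = -1.0

possessions = rng.integers(5, 40, n_matchups)
margin = 3.0 + X @ true_ratings  # noise-free margins per 100 possessions

estimated = fit_apm(X, margin, possessions)
print(np.round(estimated - true_ratings, 6))  # differences are numerically zero
```

With real data the margins are noisy, so the recovered values come with the standard deviations discussed further down rather than matching any "true" rating exactly.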

Rosenbaum explains the idea in some detail here; however, gotbuckets.com diverges from his method in some regards:

  • The calculation that we provide is “pure” 2-year APM.  Anything after Section II.1 of the Rosenbaum article doesn’t apply to gotbuckets.  So, no “Statistical Plus-Minus” or “Overall Plus Minus”.
  • All of the APM regressions are two year regressions, but no weighting is applied to make the most recent season more valuable than the prior.
  • No adjustments have been made for “garbage time” or “clutch time”.  Home-court advantage serves as the only number in each equation that isn’t a direct outcome of the actual line-up match-up.
  • Reference Players are those who played fewer than 750 minutes over the two seasons.  This group is lumped together as a single variable, which prevents the extremely erratic results caused by small sample sizes.
  • Playoff results are included, though also not weighted.
  • The calculation is split into offensive and defensive components, using a method described in an article by Eli Witus.
  • In addition to those sub-numbers, plus their total summed 2-year APM, each APM Page includes the number of minutes each player played during those two seasons, as well as a gotbuckets derived wins equivalent, SWAg, and the standard deviation associated with the APM.
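The Reference Player rule in the list above is easy to sketch in Python. The function name, the column-numbering scheme, and the player names are hypothetical; only the 750-minute cutoff comes from the text.

```python
def assign_columns(minutes_by_player, min_minutes=750):
    """Map each player to a regression column.

    Players with at least `min_minutes` get their own column; everyone below
    the cutoff shares one "reference" column. Illustrative sketch only.
    """
    qualified = sorted(p for p, m in minutes_by_player.items() if m >= min_minutes)
    columns = {player: idx for idx, player in enumerate(qualified)}
    reference_idx = len(qualified)  # one shared column for the low-minute group
    for player, minutes in minutes_by_player.items():
        if minutes < min_minutes:
            columns[player] = reference_idx
    return columns, reference_idx


# Made-up two-season minute totals.
minutes = {
    "Starter A": 5200,
    "Starter B": 4100,
    "Bench C": 750,
    "Deep Bench D": 310,
    "Call-up E": 45,
}
columns, reference_idx = assign_columns(minutes)
print(columns)
```

Here the last two players land in the same column, so the regression estimates one value for the whole low-minute group instead of five noisy ones.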

So, that is the “cliff’s notes” version of 2-year APM and some specifics of the data currently at gotbuckets.  The primary allure of 2-year APM is that it provides an unbiased look at player performance, and the intent of the process described above is to limit any biases.  Scoreboard result in; APM out…no coefficients correlating the value of assists, rebounds, steals, etc to a number of wins.

Weaknesses include a lack of consistency from season to season; some players vary wildly over a few years, which makes APM a poor predictive model.  The stat is “noisy”, caused to some extent by collinearity.  Some players see almost all their on-court time with another player, and the model cannot differentiate who deserves credit for the team’s performance.  The small sample of possessions available when those players do not share the court dominates their results.  As is typical, gotbuckets provides standard deviations with the APM.  With relatively large standard deviations, several players overlap within one standard deviation.  Particularly in small sample sizes, this makes differentiating similarly ranked players somewhat foolhardy.  Gotbuckets attempts to reduce this noise by providing two-year regressions.

Long story short: APM is imperfect, but proves informative as a counter-balance to the box-score based stats, particularly for players making their impact at the defensive end.  For additional insights into APM’s usefulness, check out this handy analysis.

One statistical method known to improve on raw APM is regularization, particularly ridge regression and/or Elastic Net regression.  Standard Elastic Net regressions penalize any coefficients that are far away from 0.  An alternative is to use a Bayesian interpretation of this method, by providing the model an “a priori” (prior) estimate of the value for each player, and letting the model penalize coefficients that are far away from that prior.  It is this Bayesian interpretation that is the basis for the RAPM model presented here, courtesy of @talkingpractice.  The benefit of this methodology is that it reduces the error in the estimates, leads to narrower confidence intervals, and removes the problem of regressing low-minute players to 0 (i.e. to league average).  Players with lots of on-court time move more freely away from their assigned prior, while low-minute guys stay nearer the starting point.  The combination of these effects makes RAPM substantially more predictive out of sample than standard APM.
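As a rough illustration of "penalize coefficients that are far away from the prior", here is a small Python sketch of ridge regression shrunk toward a prior vector instead of toward 0. This is a generic textbook form, not @talkingpractice's actual model: the penalty strength `lam`, the two-player toy data, and the function name are all assumptions, and it uses a plain ridge penalty rather than Elastic Net.

```python
import numpy as np


def ridge_with_prior(X, y, weights, prior, lam):
    """Minimize sum(w * (y - X b)^2) + lam * ||b - prior||^2 in closed form.

    Substituting d = b - prior gives an ordinary ridge problem in d, so
    b = prior + (X' W X + lam I)^-1 X' W (y - X prior).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    prior = np.asarray(prior, dtype=float)
    XtW = X.T * weights  # X' W with W = diag(weights)
    A = XtW @ X + lam * np.eye(X.shape[1])
    b = XtW @ (y - X @ prior)
    return prior + np.linalg.solve(A, b)


# Toy data: two players, three "matchups", margins generated without noise.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])
true_values = np.array([4.0, -2.0])
y = X @ true_values
prior = np.array([1.0, 0.0])

few = ridge_with_prior(X, y, weights=[2, 2, 2], prior=prior, lam=50.0)
many = ridge_with_prior(X, y, weights=[2000, 2000, 2000], prior=prior, lam=50.0)
print("few possessions: ", np.round(few, 2))   # stays close to the prior
print("many possessions:", np.round(many, 2))  # moves close to the data
```

The same penalty is applied in both runs; only the amount of evidence changes, which is why low-minute players stay near their starting point while high-minute players move away from it.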

The RAPM values currently shown at gotbuckets.com (provided by @talkingpractice) are created as follows:

  • The base “a priori” value is the previous season’s RAPM value for each player.
  • To do the Bayesian “progression” between years, an initial seed was created using data from the 2005-2006 NBA season.  The seed value used for each player is his Individual Player Value (IPV) from that season.  IPV is an SPM-informed RAPM model created by @talkingpractice, using a random forest regression (FORPM) and gradient boosting to improve out-of-sample predictive accuracy.  This choice was made so that the initial seasons of the progression are tight and not overly dependent on NPI (non-prior-informed) RAPM.
  • Reversion to the mean of 15% is used between seasons.  Mean reversion is a universal phenomenon, and absolutely improves predictive accuracy in future seasons.
  • From season to season, an aging curve is applied to each player, calculated as: (27 – Age) * 0.06.  This aging factor is added to the prior RAPM between seasons, so it is positive for players younger than 27 and negative for players older than 27.
  • To improve the use of priors for players without previous RAPM values, the list of rookies is split into two types.  Traditional rookies with no professional basketball experience (“ROOKS”) receive a prior of -1.2 on offense and -1.2 on defense.  Rookies with substantial international experience (“XROOKS”) receive a prior of -0.6 on offense and -0.6 on defense.
  • For veteran NBA players who missed the entire prior season due to injury (Derrick Rose, Andrew Bynum and Channing Frye for 2012 – 2013), the prior for their return season is set at zero on both sides of the ball.
  • Data quality is as important as, if not more important than, methodology.  The proprietary datasets created by @talkingpractice are absolutely state of the art, parsing data directly from the play-by-play (pbp) files at our source, and then cross-checking with 2 independent sources.  Any discrepancies are handled via video review.
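The prior rules in the list above can be sketched in a few lines of Python. Treat this as one plausible reading rather than the actual implementation: the text does not say in which order mean reversion and the aging factor are applied, which age is used, or whether the aging factor applies to each side of the ball, so this sketch assumes reversion toward 0 (league average) first, then the aging factor, on a single RAPM value. The function and dictionary names are hypothetical.

```python
def next_season_prior(prev_rapm, age, reversion=0.15, peak_age=27, aging_rate=0.06):
    """Assumed ordering: regress last season's RAPM 15% toward 0, then add aging.

    The 15%, 27, and 0.06 values come from the text; the ordering and the
    mean of 0 are assumptions made for this sketch.
    """
    reverted = (1.0 - reversion) * prev_rapm
    aging_factor = (peak_age - age) * aging_rate
    return reverted + aging_factor


# (offense, defense) priors for players with no usable previous RAPM.
NO_HISTORY_PRIORS = {
    "ROOKS": (-1.2, -1.2),         # no professional experience
    "XROOKS": (-0.6, -0.6),        # substantial international experience
    "MISSED_SEASON": (0.0, 0.0),   # veterans who missed the entire prior season
}

young = next_season_prior(prev_rapm=2.0, age=24)  # 0.85 * 2.0 + 3 * 0.06, about 1.88
older = next_season_prior(prev_rapm=2.0, age=32)  # 0.85 * 2.0 - 5 * 0.06, about 1.40
print(round(young, 2), round(older, 2))
```

Two players with identical +2.0 seasons therefore start the next year with different priors purely because of age.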

RAPM-informed RAPM provides a significant improvement in out-of-sample predictive power compared to standard 2-year APM (link to come), but is obviously not perfect.  Issues include the effect of forcing low-minute players towards their pre-assumed prior value (which in some cases may result in high-minute players on bad teams looking much worse than their teammates), and that players who experience a somewhat discontinuous leap in their ability may get ‘stuck’ near their prior more than would be optimal.

Also, gotbuckets.com created a wins equivalent based on 2-year APM and RAPM.  In an effort to address one primary flaw with APM, namely that small samples can produce questionable results, gotBuckets.com offers this new number derived from a player’s 2-year APM and his minutes.  We call it SWAg, short for Summed Wins de-Aggregated.  SWAg takes every victory within a 2-year timeframe, including playoffs, and apportions those between the players.  Weighting the APM by playing time to create SWAg helps push potentially fluky low-minute players further down the rankings, and provides a list that better represents player value over the two-season timeframe.  Set relative to the value of the sub-750 minute guys that don’t qualify for 2-year APM, SWAg provides positive and negative numbers; if a guy has negative SWAg, his team may be better off replacing him with someone from the unnamed, sub-750 minute masses.

To calculate SWAg, gotBuckets starts with the total number of wins in two NBA seasons; typically 2580, not accounting for strike-shortened years.  From there, a subset of those wins is attributed to the sub-750 minute players that do not qualify for 2-year APM.  Per 100 possessions, this group rates as -6.34 points, or approximately 12 wins in 82 games.  In a two-year period, as an example, they may tally 30 wins, leaving 2550 wins for the guys that qualified for APM.  Each player that qualifies for APM (i.e. played at least 750 minutes in two seasons) is assessed a minute-weighted APM relative to the typical sub-750 minute player.  For example, someone who played 1000 minutes with a +3.66 APM would have a minute-weighted value of 10,000, via a calculation of (3.66 – (-6.34)) * 1000.  This value is summed for every qualified player.  The total number of wins attributed to the qualified set is divided by that summed number, to determine wins per minute-weighted APM.  The resulting number is then multiplied by each individual minute-weighted APM, to determine the number of wins credited to each player during the two years.  The sum of individual wins equals the actual number of games played during those two NBA seasons.  Here is an example to illustrate:

Over two seasons, Victor Winnington plays 5000 minutes with an APM of +4.  His minute-weighted APM relative to the sub-750 minute player equals:

(4 – (-6.34)) × 5000 = 10.34 × 5000 = 51,700

In this hypothetical example, summing this calculation for every qualified player in the league adds up to 7,750,000.

During those two years, there are 2580 games.  For the sake of simplifying the example, let’s skip the calculation and say 30 wins are thanks to the sub-750 minute players.  That leaves 2550 victories for the guys qualified for APM.  Dividing the 2550 wins by 7,750,000 provides a value of about 0.000329 wins per minute-weighted APM.

Bringing everything full circle, Victor Winnington gets credit for about 17 SWAg during those two seasons, or 51,700 multiplied by 0.000329.
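The Victor Winnington arithmetic can be reproduced with a short Python sketch. The function names and the three-player toy league are hypothetical and exist only to show that the credited wins add back up to the total; the -6.34 reference value, the 2550 wins, and the 7,750,000 league sum are the figures from the example above.

```python
REFERENCE_APM = -6.34  # sub-750-minute group, per 100 possessions (from the text)


def minute_weighted_apm(apm, minutes, reference_apm=REFERENCE_APM):
    """APM relative to the reference group, multiplied by minutes played."""
    return (apm - reference_apm) * minutes


def swag_table(players, wins_for_qualified, reference_apm=REFERENCE_APM):
    """Split `wins_for_qualified` among players in proportion to minute-weighted APM.

    `players` maps name -> (apm, minutes). Illustrative helper, not site code.
    """
    weighted = {
        name: minute_weighted_apm(apm, minutes, reference_apm)
        for name, (apm, minutes) in players.items()
    }
    wins_per_unit = wins_for_qualified / sum(weighted.values())
    return {name: value * wins_per_unit for name, value in weighted.items()}


# Victor's SWAg using the league-wide totals quoted in the example.
victor_swag = minute_weighted_apm(4.0, 5000) * (2550 / 7_750_000)
print(round(victor_swag, 1))  # 17.0

# A made-up three-player "league" sharing 100 wins, to show the bookkeeping.
toy_league = {
    "Victor Winnington": (4.0, 5000),
    "Average Andy": (0.0, 4000),
    "Struggling Stu": (-7.0, 1000),
}
toy = swag_table(toy_league, wins_for_qualified=100)
print({name: round(wins, 1) for name, wins in toy.items()})
```

In the toy league, the player rated below the -6.34 reference level ends up with negative SWAg, matching the interpretation given earlier.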

We at gotBuckets think that SWAg is pretty nifty, providing a more reasonable ranking of player usefulness over the two seasons.  We like it so much that, using a similar method, gotbuckets created an RAPM wins equivalent, too.  We call that SWAgR (Summed Wins de-Aggregated – Regularized version).

So, there is a brief description of what we’re doing.  As part of a balanced player evaluation process, we look at box-score derived stats and scoreboard derived stats.  Come back frequently for writing that is analytics based, or not, and focused on APM / RAPM, or not.

Hopefully this site forms a useful new component of your basketball fandom. If you like what we’ve started, need a hobby, and are interested in contributing with number-crunching / programming or writing, please contact me at hetrick46@gmail.com.

