After using DVOA for a few months, we came across a strange phenomenon: well-regarded players, particularly those known for their durability, had DVOA ratings that came out around average. The reason is that DVOA, by virtue of being a percentage or rate statistic, doesn’t take into account the cumulative value of having a player producing at a league-average level over the course of an above-average number of plays. By definition, an average level of performance is better than that provided by half of the league, and the ability to maintain that level of performance while carrying a heavy workload is very valuable indeed. In addition, a player who is involved in a high number of plays can draw the defense’s attention away from other parts of the offense, and, if that player is a running back, he can take time off the clock with repeated runs.
Let’s say you have a running back who carries the ball 300 times in a season. What would happen if you were to remove this player from his team’s offense? What would happen to those 300 plays? Those plays don’t disappear with the player, though some might be lost to the defense because of the associated loss of first downs. Rather, those plays would have to be distributed among the remaining players in the offense, with the bulk of them being given to a replacement running back. This is where we arrive at the concept of replacement level, borrowed from our partners at Baseball Prospectus. When a player is removed from an offense, he is usually not replaced by a player of similar ability. Nearly every starting player in the NFL is a starter because he is better than the alternative. Those 300 plays will typically be given to a significantly worse player, someone who is the backup because he doesn’t have as much experience and/or talent. A player’s true value can then be measured by the level of performance he provides above that replacement-level baseline, totaled over all of his run or pass attempts.
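The replacement-level idea reduces to simple arithmetic: take the per-play edge a player holds over a replacement baseline and multiply it by his number of plays. The sketch below assumes a generic per-play value scale on which 0.0 is league average, plus a made-up replacement baseline; `Player`, `value_over_replacement`, and every number here are hypothetical illustrations, not Football Outsiders' actual formulas.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    plays: int
    value_per_play: float  # hypothetical per-play value; 0.0 = league average


def value_over_replacement(player: Player, replacement_per_play: float) -> float:
    """Cumulative value above a replacement baseline, totaled over all plays.

    A rate stat only looks at value_per_play; multiplying the per-play edge
    over replacement by the number of plays makes workload count.
    """
    return (player.value_per_play - replacement_per_play) * player.plays


if __name__ == "__main__":
    REPLACEMENT = -0.10  # hypothetical replacement-level baseline per play
    workhorse = Player("workhorse", plays=300, value_per_play=0.0)
    part_timer = Player("part-timer", plays=100, value_per_play=0.0)
    for p in (workhorse, part_timer):
        print(p.name, value_over_replacement(p, REPLACEMENT))
```

Both backs have the same league-average rate, so a rate stat rates them identically, but the 300-carry back accumulates three times the value over replacement. That gap is the workload effect described above.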
Of course, the real replacement player is different for each team in the NFL. In 2014, the second-string running back in Minnesota (Jerick McKinnon) had a higher DVOA than the alleged starter (Matt Asiata). Sometimes a player like Ryan Grant or Danny Woodhead will be cut by one team and turn into a star for another. On other teams, the drop from the starter to the backup can be even greater than the general drop to replacement level. (The Indianapolis Colts of 2011, the dark year between the Manning and Luck eras, will be the hallmark example of this until the end of time.) The choice to start an inferior player or to employ a sub-replacement-level backup, however, falls to the team, not to the starter being evaluated. Thus, we generalize replacement level for the league as a whole, since the ultimate goal is to evaluate players independent of the quality of their teammates.
Our estimates of replacement level were re-done during the 2008 season and are computed differently for each position. For quarterbacks, we analyzed situations where two or more quarterbacks had played meaningful snaps for a team in the same season, then compared the overall DVOA of the original starters to the overall DVOA of the replacements. We did not include situations where the backup was actually a top prospect waiting his turn on the bench, since a first-round pick is by no means a "replacement-level" player.
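The quarterback comparison can be sketched as averaging the DVOA drop from original starter to replacement across qualifying team-seasons, after excluding backups who were top prospects. The text does not say how individual team-seasons were weighted, so weighting each gap by the replacement's plays is an assumption here, and `QBSwitch` and `qb_replacement_gap` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class QBSwitch:
    """One team-season where two or more quarterbacks played meaningful snaps."""
    team_season: str
    starter_dvoa: float      # overall DVOA of the original starter (0.05 = 5%)
    replacement_dvoa: float  # overall DVOA of the replacement(s)
    replacement_plays: int   # replacement's plays, used here as a weight (assumption)
    backup_is_top_prospect: bool = False  # e.g. a first-round pick waiting his turn


def qb_replacement_gap(switches: list[QBSwitch]) -> float:
    """Play-weighted average DVOA drop from starter to replacement.

    Top prospects are excluded, since they are not replacement-level players.
    """
    kept = [s for s in switches if not s.backup_is_top_prospect]
    total_plays = sum(s.replacement_plays for s in kept)
    if total_plays == 0:
        raise ValueError("no usable starter/replacement pairs")
    weighted = sum(
        (s.starter_dvoa - s.replacement_dvoa) * s.replacement_plays for s in kept
    )
    return weighted / total_plays
```

A positive result means replacements performed worse than the starters they filled in for; the size of that gap is what sets the replacement baseline.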
At other positions, there is no easy way to separate players into "starters" and "replacements," since unlike at quarterback, being the starter doesn't make you the only guy who gets in the game. Instead, we used a simpler method, ranking players at each position in each season by attempts. The players who made up the final 10 percent of passes or runs were split out as "replacement players" and then compared to the players making up the other 90 percent of plays at that position. This took care of the fact that not every non-starter at running back or wide receiver is a freely available talent. (Think of Jonathan Stewart or Randall Cobb, for example.)
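The attempt-based split for non-quarterbacks can be sketched as follows. The 10 percent figure comes from the text; how to treat the one player whose attempts straddle the 90 percent line is not specified there, so the rule in `split_replacement` (keep adding players to the regulars until they reach 90 percent of attempts) is an assumption, as are the function names and the attempt-weighted `group_dvoa` comparison.

```python
def split_replacement(
    attempts: dict[str, int], share: float = 0.10
) -> tuple[list[str], list[str]]:
    """Rank players by attempts; those making up the final `share` of plays
    are treated as replacement players.

    Boundary rule (an assumption): players are added from the top of the
    ranking until the regulars reach (1 - share) of all attempts; everyone
    after that is a replacement player.
    """
    ranked = sorted(attempts, key=lambda name: attempts[name], reverse=True)
    cutoff = (1 - share) * sum(attempts.values())
    regulars: list[str] = []
    replacements: list[str] = []
    running = 0
    for name in ranked:
        if running < cutoff:
            regulars.append(name)
            running += attempts[name]
        else:
            replacements.append(name)
    return regulars, replacements


def group_dvoa(
    players: list[str], attempts: dict[str, int], dvoa: dict[str, float]
) -> float:
    """Attempt-weighted average DVOA for a group of players."""
    n = sum(attempts[p] for p in players)
    return sum(dvoa[p] * attempts[p] for p in players) / n
```

Because the split is driven by share of plays rather than depth-chart role, a heavily used non-starter lands with the regulars instead of being treated as freely available talent.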
As noted earlier, the challenge of any new stat is to present it on a scale that’s meaningful to those attempting to use it. Saying that Tony Romo's passes were worth 40 success value points over replacement in 2014 means very little without context to tell us whether 40 is a good total or a bad one. Therefore, we translate these success values into a number called "Defense-adjusted Yards Above Replacement," or DYAR. By this measure, Romo was fifth among quarterbacks in 2014 with 1,187 passing DYAR. It is our estimate that a generic replacement-level quarterback, throwing in the same situations as Romo, would have been worth 1,187 fewer yards. Note that this doesn’t mean the replacement-level quarterback would have gained exactly 1,187 fewer yards. First downs, touchdowns, and turnovers all have an estimated yardage value in this system, so what we are saying is that a generic replacement-level quarterback would have fewer yards and touchdowns (and more turnovers) that would total up to be equivalent to the value of 1,187 yards.
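The yardage translation can be illustrated by valuing every play in yards, with first downs, touchdowns, and turnovers converted into yardage equivalents, and then summing the player's edge over what a replacement-level player would be expected to produce in the same situations. This is a minimal sketch, not Football Outsiders' implementation: the bonus and penalty constants are hypothetical placeholders, the per-situation replacement expectations are taken as given inputs, and the defense adjustment is left out entirely.

```python
from dataclasses import dataclass

# HYPOTHETICAL yardage equivalents, placeholders for illustration only;
# these are not the values used in the actual DYAR system.
FIRST_DOWN_BONUS = 10.0
TOUCHDOWN_BONUS = 20.0
TURNOVER_PENALTY = 50.0


@dataclass
class PassPlay:
    yards: int
    first_down: bool = False
    touchdown: bool = False
    turnover: bool = False


def yardage_value(play: PassPlay) -> float:
    """Express one play as a yardage equivalent, folding in first downs,
    touchdowns, and turnovers via the placeholder constants above."""
    value = float(play.yards)
    if play.first_down:
        value += FIRST_DOWN_BONUS
    if play.touchdown:
        value += TOUCHDOWN_BONUS
    if play.turnover:
        value -= TURNOVER_PENALTY
    return value


def yards_above_replacement(
    plays: list[PassPlay], replacement_values: list[float]
) -> float:
    """Sum over the same situations of the player's yardage value minus the
    yardage value a replacement-level player is expected to produce there.
    No defense adjustment is applied in this sketch."""
    if len(plays) != len(replacement_values):
        raise ValueError("need one replacement expectation per play")
    return sum(yardage_value(p) - r for p, r in zip(plays, replacement_values))
```

In a full system the replacement expectation for each situation (down, distance, field position) would come from a baseline model; passing it in as a list keeps the sketch focused on the "same situations" comparison the text describes.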
(Note: Prior to the 2008 season, DYAR was translated in terms of points rather than yardage, and old articles will refer to these stats as "DPAR" instead.)