eWORL vs. WAR
Wins over Replacement Level vs. Baseball-Reference's WAR
In a separate article, I undertook a systematic comparison of Player won-lost records to WAR - as measured by either Baseball-Reference.com or Fangraphs.com - and showed why Player won-lost records are superior measures of player value, based on an analysis of how closely these value measures track actual wins at the team level. This article shares its name with an earlier article of mine, but has been revised to build on that comparison. It uses a handful of player comparisons to highlight some of the differences between Player won-lost records and Baseball-Reference's version of WAR (although the analysis here generally applies to Fangraphs' WAR as well).
Using Baseball-Reference's Play Index tool, I calculated career WAR values for players amassed over the seasons from 1949 through 2013 (these were all of the seasons for which I had estimated player won-lost records for every game when I originally performed this analysis in early-to-mid 2014).*
*Baseball-Reference's Play Index treats pitcher WAR and non-pitcher WAR separately. Because of this, the WAR values for pitchers presented by the Play Index exclude pitcher hitting. I have tried to add back in these values for pitchers who are mentioned by name in this article.
Putting Player won-lost records on the same scale as bWAR
The first thing to do in order to make a comparison with BB-Ref's WAR (bWAR hereafter) is to figure out what specifically to compare to bWAR.
I calculate two sets of player won-lost records: pWins and eWins. The former of these, pWins, are context-dependent, and are constructed such that the sum of pWins by all players on a team is exactly equal to the number of games played by the team plus the number of team wins. The latter, eWins, on the other hand, are calculated adjusting for context, assuming a player played on an average team with average teammates. The Contextual Factors relating pWins and eWins are described in detail in a separate article.
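The pWins identity can be checked with a few lines of Python. This is only a sketch: it assumes the rule, stated later in this article, that each team win hands out 2 pWins and 1 pLoss (and each team loss the reverse). The function name and the 90-72 team are made up for illustration.

```python
def team_pwins_plosses(team_wins, team_losses):
    """Total pWins and pLosses for a team, assuming 3 player decisions per game:
    2 pWins + 1 pLoss per team win, 1 pWin + 2 pLosses per team loss."""
    p_wins = 2 * team_wins + team_losses
    p_losses = team_wins + 2 * team_losses
    return p_wins, p_losses


# Hypothetical 90-72 team (not taken from the Player won-lost database)
wins, losses = 90, 72
games = wins + losses
p_wins, p_losses = team_pwins_plosses(wins, losses)

print(p_wins, games + wins)   # pWins equal games played plus team wins
print(round(p_wins / (p_wins + p_losses), 3))  # player-level winning percentage
```

Under this rule a team's player-level winning percentage is compressed toward .500 relative to its team-level winning percentage, which is why player-level and team-level replacement levels differ.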
Baseball-Reference's bWAR measure is generally calculated independent of context (with one exception, which is discussed below). It therefore is more comparable to eWins than to pWins. Because of this, my comparison statistic is constructed based on eWins.
To facilitate these player comparisons, it helps to place Player won-lost records on the same scale as bWAR (and fWAR).
I calculate a measure of Wins over Replacement Level (WORL) based on both pWins and eWins: pWORL and eWORL, respectively. One might, therefore, think that a logical comparison would be to compare bWAR to eWORL. In fact, however, bWAR and eWORL are on somewhat different scales.
As discussed in my article referenced earlier, eWins over positional average (eWOPA) does not relate to team wins over .500 on a one-to-one basis (as bWAA and fWAA do), but on something closer to a two-to-one basis, i.e.,
(Wins over .500) ~ bWAA ~ 2*eWOPA
But the difference between WAA (Wins above Average) and WAR (Wins above Replacement) and the difference between WOPA (Wins over Positional Average) and WORL (Wins over Replacement Level) are on the same scale, i.e.,
(bWAR - bWAA) ~ (eWORL - eWOPA)
Hence, one can calculate the pWin- or eWin-based equivalent of WAR by adding WOPA and WORL.
Except for one detail: bWAR and fWAR are both calculated setting team replacement level at .294 (approximately 48-114 over a 162-game season). In contrast, my player-level replacement level is approximately 0.455. As I explain elsewhere, a player-level replacement level of 0.455 works out to a team-level replacement level of 0.366. Converting eWORL from a team-level replacement level of 0.366 to .294 (to match bWAR and fWAR) can be done as follows:
If player replacement level works out to 0.455 and positional average works out to 0.500 (on average), then we can make two general statements:
(1) WORL - WOPA = 0.045*(Player Decisions)
(2) Wins - WORL = 0.455*(Player Decisions)
Given 2 pWins and 1 pLoss for every team win (and the reverse for every team loss), a team-level replacement level of .294 would work out to a player replacement level of .431. To set player-level replacement level at .431, you want to lower the replacement level by 0.024, which adds 0.024*(Player Decisions) to WORL, or, from (2) above, adds (0.024/0.455)*(Wins - WORL), i.e.,
Wins over 0.431 = WORL + 0.053*(Wins - WORL) = 0.053*Wins + 0.947*WORL
Combining this with the earlier result that WAR ~ WOPA + WORL, we can get the Player won-lost version of WAR - eWAR - from the following formula:
"eWAR" = 0.053*eWins + eWOPA + 0.947*eWORL
The player comparisons below compare eWAR, calculated as above, to bWAR.
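The rescaling above can be sketched in Python. The constants come from the text; the function names and the 55-45 player are hypothetical, and the sketch assumes a positional average of exactly .500 and a replacement level of exactly .455 for that player, which the text says holds only on average.

```python
PLAYER_REPL = 0.455          # the author's player-level replacement level
TARGET_PLAYER_REPL = 0.431   # player-level equivalent of a .294 team


def player_level(team_win_pct):
    """Player-level winning pct implied by a team winning pct,
    given 2 pWins + 1 pLoss per team win and the reverse per team loss."""
    return (1.0 + team_win_pct) / 3.0


def wins_over_431(wins, worl):
    """Re-base WORL from a .455 to a .431 player-level replacement level."""
    factor = (PLAYER_REPL - TARGET_PLAYER_REPL) / PLAYER_REPL   # about 0.053
    return worl + factor * (wins - worl)


def e_war(e_wins, e_wopa, e_worl):
    """The article's formula: 0.053*eWins + eWOPA + 0.947*eWORL."""
    return 0.053 * e_wins + e_wopa + 0.947 * e_worl


# Hypothetical player with 55 eWins and 45 eLosses (100 player decisions)
e_wins, decisions = 55.0, 100.0
e_wopa = e_wins - 0.500 * decisions        # wins over positional average
e_worl = e_wins - PLAYER_REPL * decisions  # wins over replacement level

print(round(player_level(0.294), 3))
print(round(wins_over_431(e_wins, e_worl), 2))
print(round(e_war(e_wins, e_wopa, e_worl), 2))
```

Because the formula uses the rounded coefficients 0.053 and 0.947, the rebased component inside `e_war` differs from `wins_over_431` by a small rounding amount.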
[Just to clarify: I'm not suggesting that eWAR is a better measure than either eWOPA or eWORL; it just makes for a more informative comparison to put eWins on the same scale as bWAR and fWAR, and it's easier for me to do this by constructing "eWAR" than by trying to construct the bWAR (or fWAR) equivalent of eWOPA or eWORL.]
I have created a page on my website whereby people can construct a customized "uber-statistic" weighting various factors - including Wins, WOPA, and WORL - however they see fit. The version of this page which creates the numbers that are compared to bWAR through the rest of this article can be found here.
The top 10 players in bWAR are compared to the top 10 in this adjusted version of Player won-lost records (which I will call "eWAR" for the remainder of this article) over the time period from 1949 - 2013 in the table below.
Eight players are in the top 10 in both eWAR and bWAR over the time period being considered here. The two players in the top 10 in bWAR but not eWAR are Rickey Henderson and Tom Seaver, who ranked 13th and 16th in eWAR over this time period, respectively. The two players in the top 10 in eWAR but not bWAR are Mike Schmidt and Joe Morgan, who ranked 11th and 12th in bWAR over this time period, respectively.
At a sufficiently high level, eWAR and bWAR are very similar; one might even say that they're more similar than different.
bWAR versus eWAR
To allow for a more detailed analysis, I also compared lists of the top 1,000 players by eWAR and bWAR from 1949 to 2013. I won't bore you by actually listing all of these players (although the top 1,000 players in eWAR can be found here), but will simply summarize the results. Overall, 841 players made the top 1,000 in both eWAR and bWAR over this time period. This means that there were 159 players who were in the top 1,000 in eWAR but not bWAR, and another 159 players who were in the top 1,000 in bWAR, but not eWAR.
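The bookkeeping behind these counts is ordinary set arithmetic. The sketch below uses stand-in integer IDs chosen only to reproduce the counts in the text; it does not use real player data, and the function name is hypothetical.

```python
def list_overlap(list_a, list_b):
    """Count members shared by, and unique to, two ranked lists."""
    a, b = set(list_a), set(list_b)
    return {
        "both": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "either": len(a | b),
    }


# Stand-in IDs: two lists of 1,000 that share 841 members
ewar_top_1000 = range(0, 1000)
bwar_top_1000 = range(159, 1159)

counts = list_overlap(ewar_top_1000, bwar_top_1000)
print(counts)
```

The "either" count, 1,000 + 1,000 - 841, is the number of distinct players who appear on at least one of the two lists.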
For each of the 1,159 players who made one or both of these two top 1,000 lists, I assigned them to a single player position. This was done based on the position where the player played a plurality of his games over the relevant time period. In some cases, the position where a player played the most games is not the position at which the player accumulated most of his value (e.g., Ernie Banks). Many players also accumulated significant value across multiple positions (e.g., Pete Rose). But as a general categorization of players by position, this should work fine.
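A plurality rule of this kind can be sketched as follows. The games-by-position totals are invented for illustration, and the tie-breaking behavior (first position listed wins) is an assumption of this sketch rather than something the article specifies.

```python
def primary_position(games_by_position):
    """Return the position with the most games played (a plurality,
    not necessarily a majority). Ties go to the first position listed."""
    return max(games_by_position, key=games_by_position.get)


# Hypothetical utility infielder: no position accounts for half his games
games = {"SS": 600, "3B": 450, "2B": 300}

position = primary_position(games)
share = games[position] / sum(games.values())
print(position, round(share, 2))
```

Here the player is classified as a shortstop even though he played fewer than half of his games there, which is the kind of case the Pete Rose example points to.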
The next table breaks down the positions played by the top 1,000 players in eWAR and top 1,000 players in bWAR.
The detailed breakdown in the above table perhaps buries some more general similarities. Counting catchers as infielders, the eWAR top 1,000 includes 359 infielders, 249 outfielders, and 376 pitchers. For bWAR, the parallel numbers are 360, 251, and 375, respectively.
As noted above, the total number of pitchers in these two lists differs by one. That similarity obscures the two largest differences between the two systems, however. The top 1,000 players measured by bWAR includes 37 more relief pitchers than the top 1,000 players in eWAR. But the top 1,000 players measured by eWAR includes 38 more starting pitchers than the top 1,000 players in bWAR.
The primary reason for this is a difference in the treatment of context in calculating eWins vis-a-vis bWAR. In general, bWAR is a context-neutral measure. This is why the measure of Player won-lost records to which I am comparing it here is based on eWins, which are also context-neutral. There is one exception, however: bWAR takes into account leverage in valuing relief pitchers. That is, bWAR gives closers (and other relief pitchers) credit for pitching in higher-context situations.
A perfect example of this is Billy Wagner. Baseball-Reference credits him with 28.1 bWAR, which puts him in (approximately*) 489th place among major-league baseball players over the time period being considered here. In his career, Wagner allowed 262 runs in 903 innings, good for a career ERA+ of 187. That is an outstanding ERA+, but the number of innings is extremely low. Wagner's bWAR is boosted, however, by his having pitched at an average leverage of 1.7 during his career.
Relief Pitchers: example, Billy Wagner
*Wagner's pitching WAR alone places him in 504th place in bWAR. Adding his batting & baserunning WAR (0.4) pushes his total up to a level that would be 489th in bWAR; of course, other pitchers might have also moved up or down the list around this level.
In contrast, my context-neutral calculations recognize Billy Wagner's excellence, showing him with a career eWinning Percentage of 0.581. But eWins (and eLosses) do not take actual context into account (in fact, they use expected context for relief pitchers, which is actually well below average), so Wagner's career eWAR total of 17.2 ends up just outside the top 1,000 (specifically, Wagner comes in at #1,002).
I do, however, also calculate player won-lost records with context taken into account via pWins and pLosses. Incorporating actual context into Billy Wagner's record increases his player decisions by 66%. That alone would boost Wagner's "eWAR" from 17.2 to 28.5, quite close to his bWAR number (28.1). But, in addition to just multiplying a player's context-neutral record by a context multiplier, pWins also adjust for the timing of a player's performance. In his career, Billy Wagner performed better in higher-context situations than in lower-context situations, so that his pWin winning percentage was 0.614. Calculating a variation of eWAR based on Billy Wagner's career pWins, pWOPA, and pWORL produces a "pWAR" value of 37.2, which actually makes Wagner's career quite a bit more valuable than Baseball-Reference rates it.
While relief pitchers are more heavily represented in the bWAR top 1,000, starting pitchers are more heavily represented in the eWAR top 1,000.
This, too, stems from the treatment of context in the two measures. Technically, eWins are not calculated with no context at all. Rather, they are calculated incorporating expected context. For non-pitchers, expected context has a very minor effect. For pitchers, however, expected context actually serves to increase the context for starting pitchers and decrease the context for relief pitchers. This higher expected context for starting pitchers serves to increase the level of eWins for starting pitchers.
For example, Dan Petry just sneaks onto the bottom of the top 1,000 in eWAR with 18.6, while just missing the top 1,000 in bWAR with 17.1. What pushes him into the top 1,000 is that his starting pitching gave him an expected context of 1.05 which boosted his value by 5%. Removing that expected context would push Petry's eWAR down to 17.7, just out of the top 1,000 (and just below his bWAR). Fully incorporating the actual context in which Dan Petry performed in his career, a pWin-based version of WAR would give Petry 18.6 "pWAR".
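The Wagner and Petry adjustments described above both treat a player's value as scaling in proportion to a context multiplier. A minimal sketch of that arithmetic, using the rounded figures quoted in the text (the function names are hypothetical, and the article's own numbers come from unrounded inputs):

```python
def apply_context(value, context):
    """Scale a context-neutral value up (or down) by a context multiplier."""
    return value * context


def remove_context(value, context):
    """Back a context multiplier out of a value that already includes it."""
    return value / context


# Billy Wagner: actual context raises his player decisions by 66%
wagner_with_context = apply_context(17.2, 1.66)

# Dan Petry: expected context of 1.05 is already built into his 18.6 eWAR
petry_without_context = remove_context(18.6, 1.05)

print(round(wagner_with_context, 2))    # close to the 28.5 cited in the text
print(round(petry_without_context, 1))  # matches the 17.7 cited in the text
```

This captures only the multiplier step. The further move from 28.5 to Wagner's 37.2 "pWAR" reflects the timing of his performance, which a single multiplier cannot reproduce.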
Counting catchers, the top 1,000 players in eWAR includes 359 infielders; the top 1,000 players in bWAR includes 360 infielders. The only notable differences in the number of players by position are at catcher - where the bWAR list includes 6 more players - and second base - where the eWAR list includes 6 more players.
Catchers are the least-represented fielding position in both lists. This is because catchers tend to play fewer games than players at other positions, both within single seasons and also over their careers. It is also true that the parts of catcher fielding which are measured by both Player won-lost records and WAR measures are actually fairly minor. One possible reason why fewer catchers might show up in my list than in the bWAR list could be that I divide credit for stolen base attempts and wild pitches and passed balls between pitchers and catchers, whereas I believe catchers' fielding ratings in bWAR give 100% credit for these things to catchers.
As to second basemen, I would point out in the above table that my top 1,000 list produces the same number of second and third basemen (72 apiece), while the top 1,000 bWAR includes 8 more third basemen than second basemen (74-66).
The top 1,000 players in eWAR includes 249 players whose primary position was one of the three outfield positions. The top 1,000 players in bWAR includes 251 such players. The difference is that the bWAR list includes 11 more center fielders (and 9 fewer corner outfielders). As with second basemen, I would note that the split of players across the three outfield positions is more evenly distributed in the eWAR top 1,000 (82-81-86) than in the bWAR top 1,000 (79-92-80).
Player-Specific Differences in Valuation
While there are some differences in the positional mix of the top 1,000 players in bWAR vis-a-vis eWAR as discussed above, the overwhelming majority of players on only one of the two lists are there simply because of differences in how these players' performances were evaluated by bWAR versus Player won-lost records.
For example, the top 1,000 players as measured by bWAR includes 82 first basemen, as does the top 1,000 players as measured by eWAR. But the two lists actually only share 76 first basemen. Each list contains 6 players whose primary position is identified as 1B who are not on the other list. In fact, the only position for which all of the players on one list are also on the other list is designated hitters. All 14 DH's among the top 1,000 in bWAR are also among the top 1,000 in eWAR. The latter list then also includes two additional designated hitters: Cliff Johnson and Ken Phelps.
The top 1,000 players in eWAR even includes 7 players classified as relief pitchers who are not among the top 1,000 players in bWAR, although a closer examination of these seven players finds that all of them spent at least some time as starting pitchers. Even so, at least two players who appear to have accumulated the majority of their career value as relief pitchers are in the top 1,000 in eWAR, but not bWAR. Their career values as measured by bWAR and pWin- and eWin-based "WAR" measurements are compared below.
In both cases, incorporating actual context (pWAR) produces results quite close to Baseball-Reference's bWAR numbers.
The rest of this article looks at two player comparisons. The first comparison is of the highest-ranked player in bWAR outside the top 1,000 in eWAR versus the highest-ranked player in eWAR outside the top 1,000 in bWAR. The second comparison compares two shortstops, one in the top 1,000 in bWAR but not eWAR and one in the top 1,000 in eWAR but not bWAR.
Adam Dunn vs. Mickey Rivers
The highest-rated player as measured by eWAR who is not among the top 1,000 players in bWAR is OF/1B/DH Adam Dunn. Dunn's 45.0 eWAR ranks 252nd over the time period evaluated here while his bWAR total of 16.7 falls just short of the top 1,000 (17.2 was needed to make the top 1,000).
The decomposition of Adam Dunn's values as measured by Baseball-Reference and Player won-lost records are compared in the next table.
Player won-lost records like two aspects of Adam Dunn's game more than bWAR: batting and fielding.
For batting, the issue is how Adam Dunn generated most of his batting value: home runs. One of the results that I discovered in looking at the net win values of various offensive events is that home runs are more valuable than run-based systems, such as linear weights, rate them. This is because the win value of an event depends not only on the average number of runs it generates but also on the certainty of those runs scoring, and home runs, of course, are guaranteed to produce runs.
As for fielding, the difference here is one of scale. I agree with Baseball-Reference that Adam Dunn was well below average as a fielder. In fact, he rates among the 20 worst fielders for whom I have calculated Player won-lost records, as measured by net wins, at both first base and left field. But defensive player won-lost records are shared between pitchers and fielders, even on balls in play. The result is that my fielding records tend to be less extreme than those that underlie bWAR (and fWAR).
Put it together and I think that Adam Dunn was much more valuable in his career than Baseball-Reference rates him. But, one could argue, that's just one man's opinion. Is there an objective way to evaluate who is right in this case?
I wrote another article which sought to show why my weighting of fielding is superior to the weighting underlying bWAR (and fWAR). One of the key tools of this evaluation which I introduced there was to evaluate how closely bWAA, fWAA, and eWOPA tied to actual team wins over .500 at a team level by looking at standard errors associated with each of these measures. Weighting the teams on which Adam Dunn played in his career by the number of games he played for them, the standard error for bWAA (Baseball-Reference's measure of Wins above Average) was 6.5 wins. For eWOPA (including teammate adjustments), the standard error for Adam Dunn's teams was 5.2 wins. In other words, on average, Player won-lost records tended to be about 1.3 wins (almost 20%) closer to actual team wins than Baseball-Reference for Adam Dunn's teams through his career.
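The article does not spell out the formula behind these standard errors. One plausible reading, offered here purely as an assumption, is a root-mean-square error between predicted and actual team wins over .500, weighted by the games the player appeared in for each team. The team-season data below are invented.

```python
import math


def weighted_standard_error(team_seasons):
    """Games-weighted root-mean-square error.

    team_seasons: list of (predicted_wins_over_500, actual_wins_over_500,
    games_played_by_player) tuples, one per team-season.
    """
    total_weight = sum(games for _, _, games in team_seasons)
    weighted_sq = sum(
        games * (predicted - actual) ** 2
        for predicted, actual, games in team_seasons
    )
    return math.sqrt(weighted_sq / total_weight)


# Two hypothetical team-seasons (not real data)
seasons = [(10.0, 4.0, 150), (-5.0, -8.0, 100)]
print(round(weighted_standard_error(seasons), 2))

# Relative improvement implied by the figures quoted for Dunn's teams
improvement = (6.5 - 5.2) / 6.5
print(round(improvement, 2))
```

Weighting by games played keeps a team for which the player made a brief appearance from counting as much as one where he was an everyday player.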
The highest-rated player as measured by bWAR who is not among the top 1,000 players in eWAR is former major-league centerfielder Mickey Rivers. Rivers's 32.5 bWAR ranks 391st over the time period evaluated here while his eWAR total of 22.5 falls short of the top 1,000 (17.3 was needed to make the top 1,000).
The decomposition of Mickey Rivers's values as measured by Baseball-Reference and Player won-lost records are compared in the next table.
Baseball-Reference and I basically agree about Mickey Rivers's baserunning - excellent - and his fielding - average. But Baseball-Reference rates Mickey Rivers as an above-average batter over his career, while I rate him as below average.
For his career, Mickey Rivers batted .295/.327/.397. According to Baseball-Reference, that was good for an OPS+ of 106 - which is consistent with their bWAR breakdown.
The 1975 season was a fairly typical season in Mickey Rivers's career: he batted .284/.331/.359, which Baseball-Reference rated as good for an OPS+ of 103 and 4 net batting runs (Rbat) plus 4 additional runs for avoiding double plays (Rivers grounded into 6 double plays in 1975). All told, Baseball-Reference rates Mickey Rivers's 1975 season as above-average, being worth approximately 0.8 wins above average.
Mickey Rivers is the flip side of Adam Dunn. Rivers's game was all about speed and putting the ball in play. For example, one of his biggest offensive weapons was the infield single (including the bunt hit). In 1975, Retrosheet play-by-play data identifies the first fielder to touch approximately three-quarters of all singles. Of those, approximately 13% were infield singles. In 1975, we know the first fielder to touch 97% of Mickey Rivers's 144 singles. Of these 140 singles, 43 of them - 31% - were infield singles. As far as I know, Baseball-Reference treats all singles as equal in determining player value. But infield singles are less valuable than outfield singles as they are much less likely to advance baserunners more than one base. Overall, I calculate that outfield singles are about 25% more valuable than infield singles, so that a system that treats all singles as equal will overrate hitters like Rivers who get more infield hits.
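A back-of-the-envelope calculation shows how much the infield-single effect could matter. The sketch assumes only the figures in the paragraph above: outfield singles worth 25% more than infield singles, 43 of Rivers's 140 identified singles being infield singles, and a league-wide infield share of about 13%. The function name and the choice to measure value in units of one infield single are illustrative.

```python
def average_single_value(infield_share, outfield_premium=1.25):
    """Average value of a single, in units of one infield single."""
    return infield_share * 1.0 + (1.0 - infield_share) * outfield_premium


rivers_infield_share = 43 / 140   # about 31%
league_infield_share = 0.13       # about 13%

rivers_avg = average_single_value(rivers_infield_share)
league_avg = average_single_value(league_infield_share)
ratio = rivers_avg / league_avg

print(round(rivers_infield_share, 2), round(140 / 144, 2))
print(round(ratio, 3))
```

Under these assumptions, Rivers's average single is worth a few percent less than a league-average single, so a system that treats all singles alike would overstate the value of his 144 singles by roughly that margin.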
In contrast, Player won-lost records give Mickey Rivers a (context-neutral, teammate-adjusted) batting won-lost record of 10.9 - 11.9, -1.0 net batting wins.
In addition to over-valuing Mickey Rivers's singles, run-based offensive estimators (as Baseball-Reference and Fangraphs both use) undervalue home runs and, offsettingly, overvalue other hits.
For example, according to this article, the relative linear weights for doubles, triples, and home runs (denominated in runs) for 1975 are the following.
These numbers suggest that one home run is equal in value to 1.9 doubles or 1.4 triples.
In 1975, Mickey Rivers hit 17 doubles, 13 triples, and 1 home run. Based on the above weights, that is the equivalent of 19.6 home runs.
In contrast, these are the net win values for doubles, triples, and home runs for the 1975 American League.
These numbers suggest that one home run is equal in value to 2.4 doubles or 1.8 triples.
Using these numbers, then, Mickey Rivers's 17 doubles, 13 triples, and 1 home run are equivalent in value to 15.2 home runs.
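The two home-run-equivalent figures can be approximated from the rounded exchange rates quoted above. This sketch uses only those rounded ratios (1.9 and 1.4 under linear weights, 2.4 and 1.8 under net win values), so its results differ slightly from the 19.6 and 15.2 in the text, which presumably rest on unrounded weights. The function name is hypothetical.

```python
def home_run_equivalents(doubles, triples, home_runs,
                         doubles_per_hr, triples_per_hr):
    """Express extra-base hits as an equivalent number of home runs."""
    return (doubles / doubles_per_hr
            + triples / triples_per_hr
            + home_runs)


# Mickey Rivers, 1975: 17 doubles, 13 triples, 1 home run
linear_weights_hr = home_run_equivalents(17, 13, 1, 1.9, 1.4)
net_win_hr = home_run_equivalents(17, 13, 1, 2.4, 1.8)

print(round(linear_weights_hr, 1))   # text reports 19.6 from unrounded weights
print(round(net_win_hr, 1))          # text reports 15.2 from unrounded weights
print(round(net_win_hr / linear_weights_hr, 2))
```

Either way, the net-win-value weights credit Rivers's extra-base hits with roughly a fifth less home-run-equivalent value than linear weights do.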
That explains why bWAR and eWAR are different. But which is correct? Again, I would direct you to this article, where I make the case for Player won-lost records. As with Adam Dunn, I looked at the teams which Mickey Rivers played for in his career. The average standard error for bWAA (weighted by the games played by Rivers) was 4.96 wins per season. Using eWOPA (plus teammate adjustments) produces a standard error of 4.30 wins per season.
Rey Sanchez vs. Jose Offerman
As I said above, the vast majority of the differences in the bWAR top 1,000 and eWAR top 1,000 are not differences based on position, but differences in the valuation of players. The Rivers and Dunn examples highlight some of these differences. My final example is perhaps the clearest example of this difference because it is a direct comparison of two players at the same position.
Rey Sanchez was a brilliant defensive shortstop who lasted 15 seasons in the major leagues with over 5,000 career plate appearances despite a career batting line of .272/.308/.334. Sanchez played through the heart of the "sillyball" era of the late 1990's and early 2000's, so that translated into an OPS+ of 69. In an era where seemingly every middle infielder in baseball could hit 15-20 home runs per season, Sanchez hit 15 in his career. But, despite (surprisingly to me) never winning a Gold Glove, Sanchez's defense was brilliant enough that he appeared in 95 or more games in 11 consecutive seasons from 1993 through 2003.
Jose Offerman was an exact contemporary of Sanchez. Offerman played 29 games in 1990, but otherwise his career perfectly overlapped with Sanchez's (Offerman missed the 2003 season, so that he and Sanchez ended up playing an identical 15 seasons). Like Sanchez, Offerman arrived in the major leagues as a shortstop. Unlike Sanchez, Offerman was not a "brilliant defensive shortstop" and Offerman ended up playing his last game at shortstop at age 27 in 1996. But Offerman's bat was solid enough - career batting line of .273/.360/.373 (OPS+ of 94) - that teams were willing to move Offerman to second base and, eventually, first base, to keep him in the lineup. Offerman even played 100 games as a designated hitter.
Rey Sanchez, brilliant defensive middle infielder (in addition to shortstop, he also played 480 games at second base and was brilliant there defensively as well) played in 1,490 games in his career and amassed 5,246 plate appearances. According to his Baseball-Reference page, Sanchez won no major awards (including, as I said, no Gold Gloves), made no All-Star teams, and earned $13.5 million in his major-league career.
Jose Offerman, defensively-challenged middle infielder, played in 1,651 games in his career and had 6,582 plate appearances. According to his Baseball-Reference page, Offerman made two All-Star teams (in 1995 and 1999) and earned $32.7 million in his major-league career.
The next table compares the careers of Rey Sanchez and Jose Offerman, as measured by Baseball-Reference's bWAR calculation and Player won-lost records.
Player won-lost records value both Sanchez's and Offerman's offense less than Baseball Reference. This is for the same reasons as were outlined earlier in this article with respect to Adam Dunn - Sanchez and Offerman hit a combined 72 career home runs - and Mickey Rivers - both Sanchez and Offerman were largely singles hitters for whom a significant portion of their offensive value came from infield hits and bunts.
But the difference in offensive value between the two players is basically the same, measured either by Baseball Reference or by Player won-lost records.
The big difference, then, between Sanchez's and Offerman's career values in bWAR vis-a-vis eWAR is because of the difference in the valuation of their fielding. But it is important to understand that the difference here is one of valuation not evaluation. Player won-lost records rate Rey Sanchez as one of the top 15 defensive shortstops and Jose Offerman as one of the 20 worst defensive shortstops for whom I have calculated Player won-lost records. Neither played enough second base to rank that highly at that position, but their relative defensive performances as second basemen were comparable to their relative performances as shortstops.
The difference in the fielding value shown here between Sanchez and Offerman, then, is not because of a difference in the evaluation of their specific fielding performance, but is due to a difference in the overall value which Baseball Reference and I place on fielding. As I explained above when discussing Dunn, the relative value of fielding vis-a-vis batting, baserunning, and pitching in Player won-lost records was not assigned by me, but was, instead, an output of my work. The relative value of offense to pitching to fielding which I end up with (roughly 3 to 2 to 1) is what the data tell us.
So which is the correct weighting of fielding? Obviously, I think that my numbers are correct.
As with Dunn and Rivers, I looked at the teams for which Offerman and Sanchez played (including the 2002 Boston Red Sox for whom they both played) and calculated how closely Baseball-Reference and Player won-lost records came to their actual records. The standard errors for bWAA for Offerman's and Sanchez's teams were 5.48 and 5.56, respectively. For eWOPA (including teammate adjustments), the standard errors for their teams were 5.18 and 4.79, respectively.
Let me also throw out one other piece of information. There are clear and obvious issues with relating player compensation to player value, most directly because of the timing of player compensation based on Major-League Baseball's arbitration and free-agent systems. And it has certainly been the case historically that many front offices have made serious mistakes with regard to player evaluation and player compensation. But overall, for his career, Baseball Reference estimates that Rey Sanchez earned $0.7 million per bWAR while Jose Offerman earned $1.9 million per bWAR. In contrast, I estimate that Rey Sanchez was paid $1.4 million per eWAR versus $1.8 million per eWAR for Jose Offerman. And, in case you were curious, the numbers for Adam Dunn (through 2013) were $5.9 million per bWAR vs. $2.2 million per eWAR.
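The dollars-per-WAR figures are simply career earnings divided by career value, so the two Adam Dunn rates should imply roughly the same earnings total. A quick consistency check, using the Dunn bWAR (16.7) and eWAR (45.0) cited earlier in this article and assuming both cover the same seasons through 2013 (the function name is hypothetical):

```python
def implied_earnings(millions_per_war, war):
    """Career earnings (in $ millions) implied by a dollars-per-WAR rate."""
    return millions_per_war * war


dunn_via_bwar = implied_earnings(5.9, 16.7)
dunn_via_ewar = implied_earnings(2.2, 45.0)

print(round(dunn_via_bwar, 1), round(dunn_via_ewar, 1))
print(round(abs(dunn_via_bwar - dunn_via_ewar), 1))
```

The two routes land within about half a million dollars of each other, which is what one would expect from rates rounded to one decimal place.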
The relative valuation of batting, baserunning, pitching, and fielding as determined by Player won-lost records are generally consistent with the relative valuation of batting, baserunning, pitching, and fielding as paid for by Major-League front offices. In my opinion, it is clearly the case that Major-League front offices can, and frequently do, overpay for specific players based on specific mis-evaluations of player value (e.g., Adam Dunn and the Chicago White Sox), but I find it far less likely that Major-League Baseball, as a group, would consistently mis-evaluate the relative importance of specific components of the game across all major-league players as Baseball-Reference's fielding numbers (and most other sabermetric fielding measures) implicitly claim.
note: This article was revised significantly on February 5, 2016.
All articles are written so that they pull data directly from the most recent version of the Player won-lost database. Hence, any numbers cited within these articles should automatically incorporate the most recent update to Player won-lost records. In some cases, however, the accompanying text may have been written based on previous versions of Player won-lost records. I apologize if this results in nonsensical text in any cases.