Baseball Player Won-Lost Records: The Ultimate Baseball Statistic
The purpose of this article is to explain why I believe that Player won-lost records are the best possible measure of player value: in essence, why Baseball Player won-lost records are the ultimate baseball statistic. The heart of this explanation is a comparison of my results to Wins above Replacement (WAR) as measured by Baseball-Reference.com and Fangraphs.com.
As I explain in detail elsewhere, I calculate Player wins and losses two ways. I begin by calculating pWins, which are tied directly to team wins by construction. I then construct eWins, which are neutralized for context. Statistics derived from eWins, such as eWins over Positional Average (eWOPA) and eWins over Replacement Level (eWORL), are conceptually comparable to other sabermetric "uber-statistics", including the various constructions of WAR (Wins above Replacement).
The relationship between team wins and pWins is perfect: [Actual Wins minus Actual Losses] equals [pWins minus pLosses] for any given team by construction. As such, there's not much "analysis" to be done there. But what about context-neutral wins (eWins)?
The object of analysis throughout this report will be net wins (Wins minus Losses) and/or wins above average (WOPA, in my vernacular; WAA, in the vernacular of WAR). I focus on wins relative to average for three reasons.
 First, wins above or below average are a "real" thing that can be empirically measured, whereas "replacement level" is more of a theoretical concept (although, once "replacement level" is set at, say, .294, as is the case for Baseball-Reference and Fangraphs, it essentially becomes as empirically valid a measuring stick as .500).
 Second, for both Player won-lost records and WAR, values are built up initially relative to average; comparisons to replacement level simply derive from a final step that shifts the comparison point from .500 to something else.
 Third, net wins, WOPA, and WAA are all centered around zero by construction. This simplifies the mathematics of the statistical analyses that I undertake here by eliminating the need for constant terms in any of my equations.
Player Won-Lost Records: eWins versus Team Wins
Having laid that out, we begin, then, with a basic equation that looks at the relationship between Net Wins (actual team wins minus actual team losses) and Net eWins (total eWins for the players on a team minus total eLosses for the players on a team):
Net Wins = a*(Net eWins)
This equation was fit using Ordinary Least Squares with the following results.
Seasons      a        Standard Error    R^{2}
2003-2015    2.294    0.054             0.821
I chose the time period investigated here, 2003-2015, because Retrosheet data have been generally consistent since 2003 at identifying the hit type (e.g., ground ball, fly ball, line drive) for all balls in play. I also investigated some longer time periods to see how consistent the results were. Generally speaking, the results presented here were fairly stable across earlier seasons as well.
The estimated coefficient, a, has a value of approximately two. This is, perhaps, twice what one might expect, namely that the coefficient in the above equation should be approximately equal to one. I have discussed this relationship in some of my earlier writings, where I noted that the difference in Player winning percentage between the winning and losing team within a game tends to be fairly narrow. Specifically, the pWinning percentage of players on a winning team will be .667 by construction (2 pWins vs. 1 pLoss). But the eWinning percentage of players on a winning team has tended to be closer to .576 (1.9 "wins" vs. 1.4 "losses" per game before normalization). In other words, 0.076 net eWins (0.576 - 0.500) translate into 0.167 net pWins (0.667 - 0.500), a ratio of about 2.2, which is very similar to the coefficient in the above table.
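The arithmetic behind that ratio can be checked in a few lines. This is only a sketch of the per-game figures quoted above; the variable names are mine and are not part of any published calculation.

```python
# Per-game figures quoted in the text.
p_win_pct = 2 / (2 + 1)          # 2 pWins vs. 1 pLoss for the winning team
e_win_pct = 1.9 / (1.9 + 1.4)    # 1.9 "wins" vs. 1.4 "losses" before normalization

net_p = p_win_pct - 0.5          # about 0.167
net_e = e_win_pct - 0.5          # about 0.076
ratio = net_p / net_e

print(f"eWin pct = {e_win_pct:.3f}, ratio = {ratio:.2f}")
```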
The Standard Error of the coefficient, a, measures the reliability of the coefficient estimate. Given certain assumptions, we would expect the true coefficient to fall within one standard error of the point estimate approximately two-thirds of the time and we would expect the true coefficient to fall within two standard errors of the point estimate approximately 95% of the time.
The value, R^{2}, measures the percentage of total variation in the "dependent" variable (Net Wins) that is explained by the equation, i.e., by the "explanatory" variable(s) in the equation (Net eWins, in this case). Overall, somewhat more than 80% of the variation in team wins can be explained by differences in eWins. The remaining differences can presumably be attributed to differences in the context in which player performance took place.
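For readers who want to try this kind of fit themselves, here is a minimal sketch of a least-squares regression with no constant term. The function name and the five data points are made up for illustration (they are not my team data), and the sketch assumes common no-intercept conventions: n - 1 degrees of freedom for the standard error, and an R^{2} measured around zero rather than around the sample mean.

```python
import math

def fit_through_origin(x, y):
    """Fit y = a*x (no constant term) by ordinary least squares.

    Returns (a, standard_error, r_squared). Assumes the no-intercept
    conventions: n - 1 degrees of freedom, R^2 measured around zero.
    """
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    a = sum_xy / sum_xx

    # Sum of squared residuals around the fitted line.
    ssr = sum((yi - a * xi) ** 2 for xi, yi in zip(x, y))
    residual_variance = ssr / (len(x) - 1)
    standard_error = math.sqrt(residual_variance / sum_xx)

    sum_yy = sum(yi * yi for yi in y)
    r_squared = 1.0 - ssr / sum_yy
    return a, standard_error, r_squared

# Hypothetical team-seasons: net eWins (x) and net wins (y).
net_ewins = [-10.0, -4.0, 0.0, 6.0, 8.0]
net_wins = [-22.0, -6.0, 2.0, 10.0, 16.0]

a, se, r2 = fit_through_origin(net_ewins, net_wins)
print(f"a = {a:.3f}, std. error = {se:.3f}, R^2 = {r2:.3f}")
```

Because net wins and net eWins are both centered around zero by construction, measuring R^{2} around zero or around the mean should make little practical difference here.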
Teammate Interaction
For several components of Player won-lost records, responsibility is shared between players: either between batters and baserunners or between pitchers and fielders. The values of eWins are calculated controlling for the ability of one's teammates. For shared components, however, the team-level winning percentage is affected not only by the context-neutral winning percentages for the two sets of players sharing the components (e.g., pitchers and fielders), but also by the interaction of these two variables. I refer to this latter term as a "Teammate Adjustment". If both pitchers and fielders on a team are above average at something, the team, as a whole, will be better than either its pitchers or its fielders.
To account for this interaction, then, the next equation which I investigated added teammate adjustments to the previous equation as follows.
Net Wins = a_{0}*(Net eWins) + a_{1}*(Teammate Adj.)
This equation was fit using Ordinary Least Squares with the following results.
Seasons      a_{0}    Standard Error    a_{1}    Standard Error    R^{2}
2003-2015    2.239    0.054             3.899    0.854             0.830
The coefficient on Teammate Adjustments is approximately twice as large as the coefficient on net eWins. This is because of a difference in the nature of the two variables. Net eWins are equal to wins minus losses. So, for a record of, say, 90-72, net wins would be +18. Teammate adjustments are reported relative to .500, where 90 wins (out of 162) is only 9 games over .500 (81 out of 162 games). If the coefficient on Teammate Adjustments were constrained to be exactly equal to twice the coefficient on net eWins in the equation above, the coefficient on net eWins would be 2.227 (standard error of 0.051) and the R^{2} of the equation would be 0.830.
Impact of Batting v. Baserunning v. Pitching v. Fielding on Team Wins
Having set up a basic equation to relate eWins to Team Wins, this equation can be extended to evaluate whether the four basic factors are weighted appropriately within Player won-lost records. That is, the basic equation laid out above:
Net Wins = a_{0}*(Net eWins) + a_{1}*(Teammate Adj.)
can be replaced with the following equation:
Net Wins = a_{b}*(Net Batting eWins) + a_{r}*(Net Baserunning eWins) + a_{p}*(Net Pitching eWins) + a_{f}*(Net Fielding eWins) + a_{1}*(Teammate Adj.)
One might think, then, that if batting, baserunning, pitching, and fielding are weighted correctly, then the coefficients on these factors (a_{b}, a_{r}, a_{p}, and a_{f}) should be equal to each other (and should all be equal to the coefficient on net eWins from the earlier equation(s), a_{0}).
I rearranged some terms in the basic equation outlined above to make the interpretation and analysis of the results somewhat more intuitive. Specifically, I fit the following equation (using Ordinary Least Squares):
Net Wins = a_{0}*[(Net Batting eWins) + (1 + a_{r0})*(Net Baserunning eWins) + (1 + a_{f0})*(Net Fielding eWins) + a_{2}*(Teammate Adj.)]
+ a_{0}*(1 + a_{p0})*(Net Pitching eWins)
This equation is mathematically identical to the previous equation, but some terms have been rearranged and the coefficients have been re-expressed to facilitate analysis.
 In this equation, a_{0} is the same as in the equation relating (Net Wins) to (Net eWins) and we would expect this coefficient to be similar in magnitude across both equations.
 The coefficient on Teammate Adjustments in the earlier equation, a_{1}, is equal to a_{0}*a_{2} in this equation. As explained above, the coefficient here, a_{2}, has an expected value of 2.
 The coefficients, a_{r0} and a_{f0}, measure the difference in the weight on these two factors (a_{r} and a_{f}) relative to the weight on the factor, batting (a_{b}). The expected coefficients on a_{r0} and a_{f0} are both zero.
 I have separated pitching from the other three factors for reasons that will become more obvious later in this article.
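To make the mapping between the two forms of the equation concrete, the sketch below converts the rearranged coefficients back into one weight per factor (a_{b}, a_{r}, a_{f}, a_{p}, and a_{1}). The function name is hypothetical, and the inputs are round, made-up numbers chosen for readability; they are not the estimates reported in this article.

```python
def implied_weights(a0, ar0, af0, ap0, a2):
    """Map the rearranged coefficients back to one weight per factor."""
    return {
        "batting": a0,
        "baserunning": a0 * (1 + ar0),
        "fielding": a0 * (1 + af0),
        "pitching": a0 * (1 + ap0),
        "teammate_adj": a0 * a2,
    }

# Round, made-up inputs (not the estimates reported in this article).
weights = implied_weights(a0=2.0, ar0=-0.25, af0=0.0, ap0=0.2, a2=2.0)
for factor, weight in weights.items():
    print(f"{factor:>12}: {weight:.2f}")
```

With a_{r0}, a_{f0}, and a_{p0} all equal to zero, every factor would carry the same weight, a_{0}, which is the benchmark against which the estimates are judged.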
The final results of this equation are presented in the next table.

                2003-2015
a_{0}           2.059
(Std. Error)    0.078
a_{2}           1.881
(Std. Error)    0.546
a_{r0}          -0.278
(Std. Error)    0.272
a_{f0}          0.042
(Std. Error)    0.164
a_{p0}          0.216
(Std. Error)    0.063
R^{2}           0.838
A few comments.
1) The general coefficient, a_{0}, is similar to earlier estimates, around 2.0.
2) Batting, baserunning, and fielding seem to generally be weighted correctly. The one possible exception is baserunning, with the coefficient a_{r0} being about one standard error below zero (which implies that baserunning is, perhaps, somewhat overweighted in Player won-lost records), although a difference of one standard error is generally not viewed as statistically significant.
3) The coefficient on teammate adjustments, a_{2}, is not significantly different from 2.
4) Including the four factors separately improves the R^{2} value of the equation only slightly, from 83.0% to 83.8%. This is reassuring to me, suggesting that the relative weighting of the four factors in my Player won-lost records is correct.
5) The coefficient on pitching, a_{p0}, is significantly (3.5 standard errors) greater than zero. Over the most recent sample period, the coefficient on pitching here, 0.22, suggests that pitching is underweighted in Player won-lost records by approximately 22%.
Obviously, comment 5) warrants some further discussion and analysis.
One of the key findings of my work is that player wins are not additive. In fact, they are something closer to multiplicative. This is mostly because of the result noted above that the players on a winning baseball team have an average context-neutral (eWin) winning percentage of .576, which translates into a pWin winning percentage of .667. As mentioned above, this is the reason why a_{0} has a value of about two in the equations presented so far. This multiplicative effect changes the expected impact of players who are somewhat above (or below) average: a player being slightly above average will translate into a more-than-proportional impact on team wins. This effect is not accounted for in the net factor wins analyzed above. And it is this effect that explains the significant positive coefficient on a_{p0}.
The multiplicative effect of player performance on team wins is incorporated into my calculation of eWins through an expected team win adjustment. This increases the expected player winning percentage based on the expected impact of the player's performance on the team's winning percentage. Expected team win adjustments are stronger for pitchers than for non-pitchers, because pitchers concentrate their performance into fewer games, so that the per-game impact of pitchers tends to be greater than the per-game impact of individual non-pitchers.
From 2003-2015, pitching (including pitcher fielding) accounted for 33.2% of unadjusted player decisions. But pitchers accounted for 43.9% of pWins over replacement level (excluding pitcher offense). In other words, the impact of pitchers on team wins is 32.2% greater than the impact implied by simple, unadjusted pitching decisions (43.9% / 33.2% - 1). Hence, the expected value of a_{p0} is not zero but 0.322, which is not significantly different from the estimated value of a_{p0} shown above.
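The expected value of a_{p0}, and its distance from the estimate, can be reproduced directly from the numbers quoted above. This is a sketch only: the variable names are mine, and "not significantly different" is judged here by the two-standard-error rule of thumb described earlier, which is an assumption about the test being applied.

```python
pitching_share_of_decisions = 0.332   # unadjusted player decisions
pitching_share_of_pworl = 0.439       # pWins over replacement level

expected_ap0 = pitching_share_of_pworl / pitching_share_of_decisions - 1

estimated_ap0 = 0.216
std_error = 0.063
gap_in_std_errors = (expected_ap0 - estimated_ap0) / std_error

print(f"expected a_p0 = {expected_ap0:.3f}")
print(f"gap = {gap_in_std_errors:.2f} standard errors")
```

The gap comes out below two standard errors, which is why the estimate and the expected value are treated as consistent with each other.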
In other words, my analysis here strongly suggests (to me) that the relative value of batting, baserunning, fielding, and pitching implied by Player won-lost records accurately reflects the relative value of these four factors on actual team wins.
Summary of Results
To summarize, then, I fit the following equation to relate actual team wins to (context-neutral, teammate-adjusted) eWins:
Net Wins = a_{0}*[(Net Batting eWins) + (1 + a_{r0})*(Net Baserunning eWins) + (1 + a_{f0})*(Net Fielding eWins) + a_{2}*(Teammate Adj.)]
+ a_{0}*(1 + a_{p0})*(Net Pitching eWins)
The next table repeats the results above for my final equation and contrasts the estimated coefficients with the expected coefficients, as they were derived above (except for a_{0}, for which the "expected" value is really an empirical question, i.e., the "right" coefficient is whatever comes out of the equation). That is, the second column takes everything except a_{0} as given and only estimates a coefficient for a_{0}.

                Statistical Estimates    Expected Values
a_{0}           2.059                    1.963
(Std. Error)    0.078                    0.044
a_{2}           1.882                    2.000
(Std. Error)    0.546                    --
a_{r0}          -0.278                   0.000
(Std. Error)    0.272                    --
a_{f0}          0.042                    0.000
(Std. Error)    0.164                    --
a_{p0}          0.216                    0.322
(Std. Error)    0.063                    --
R^{2}           0.838                    0.837
None of the results in the first column are significantly different from the expected results in the right-hand column.
Taking all of this a step further, then, team wins over .500 can be related to eWins over positional average by the following equation:
Team Wins over .500 = a_{0}*(eWOPA + (Teammate Adjustments))
If eWOPA (and, by extension, eWORL) is calculated correctly, we would expect the coefficient in this equation, a_{0}, to match the coefficient of the same name in the previous equation, and we would expect the R^{2} here to match the R^{2} from that equation as well. The results are as follows.
Seasons      a_{0}    Standard Error    R^{2}
2003-2015    1.837    0.045             0.814
The value of a_{0} perhaps changed a bit more than expected and the value of R^{2} is a bit lower, but, overall, the results are reasonably similar.
Wins above Replacement (WAR) vs. Actual Team Wins
Having looked at how the factors underlying Player won-lost records (batting, baserunning, pitching, and fielding) relate to team wins, and whether, based on this analysis, these factors are correctly weighted in the calculation of Player won-lost records, specifically wins over positional average (eWOPA, pWOPA) and replacement level (eWORL, pWORL), I will now perform a similar analysis for WAR (Wins above Replacement) as calculated and presented by Baseball-Reference (bWAR) and by Fangraphs (fWAR).
For both bWAR and fWAR, the basic calculation framework is the same. For non-pitchers (as well as for the offensive contributions of pitchers), a player's contributions are expressed in terms of runs above average (runs below average being expressed as negative numbers) for the three non-pitching factors: batting, baserunning, and fielding. A fourth factor is then added into the mix, a positional adjustment, also expressed in runs above average (RAA). The positional adjustments are positive for "defense-first" positions (C, SS, 2B) and negative for "offense-first" positions (1B, LF, RF; CF and 3B tend to have positional adjustments near zero). These four factors are added up to produce an aggregate RAA for the player. A final value, called Rrep by Baseball-Reference and based on playing time, is added to convert from runs above average (RAA) to runs above replacement level (RAR). RAA and RAR are then converted from runs to wins, based on the run-scoring environment in which the player played. In theory, one could apply the run-to-win converter to the individual components to create, in effect, separate values of WAA for batting, baserunning, and fielding (WAA_{b}, WAA_{r}, WAA_{f}).
Pitcher WAR is somewhat more complicated but is similar in concept: a pitcher's runs allowed are compared against average and converted into wins above average (WAA_{p}) and replacement (WAR_{p}). Baseball-Reference begins with RA9 (runs allowed per nine innings) and adjusts for the team's fielding RAA; Fangraphs uses FIP (expected runs allowed per nine innings, based on strikeouts, walks, and home runs allowed). Both Baseball-Reference and Fangraphs adjust relief pitcher WAR to account for leverage. Baseball-Reference also calculates a unique run-to-win converter for each pitcher to reflect the impact of the pitcher on the run-scoring environment (I'm not entirely sure what Fangraphs does in this regard).
Team WAR (or WAA) is then simply equal to the sum of the WAR (WAA) of the individual players on the team. In theory, I would expect the positional adjustments to balance out (every team has exactly one of every position in every inning of every game), so that, at the team level, I would expect a team's total WAA to equal the sum of WAA_{b}, WAA_{r}, WAA_{f}, and WAA_{p}.
To test, then, whether batting, baserunning, fielding, and pitching are weighted appropriately within WAR, I fit the following equation:
Team Wins over .500 = a_{b}*WAA_{b} + a_{r}*WAA_{r} + a_{f}*WAA_{f} + a_{p}*WAA_{p}
For analysis purposes, I rearranged the terms in the above equation, as I did in my analysis of Player wonlost records earlier in this article.
Team Wins over .500 = (1 + a_{0})*[WAA_{b} + (1 + a_{r0})*WAA_{r} + (1 + a_{f0})*WAA_{f}]
+ (1 + a_{0})*(1 + a_{p0})*WAA_{p}
The next two sections present and discuss my results for both bWAR and fWAR.
Baseball-Reference: bWAR
Baseball-Reference has two pages on its website for every season which summarize position player and pitcher WAR for every team within the season.
For position players, Baseball-Reference provides data on Rbat (RAA for batting), Rbaser, Rdp (runs above average for batters at avoiding grounding into double plays; for this analysis, I combined Rdp and Rbat), Rfield, and Rpos (positional adjustments), along with total RAA (the sum of all of the aforementioned columns) and WAA, Rrep (replacement runs), RAR (RAA + Rrep), and WAR.
As I said above, in theory, I would have expected Rpos to be approximately zero at the team level. In fact, however, for the 2015 season, total Rpos across all 30 teams summed to +742 runs (+25 runs per team on average). Offsetting this, the combined total for Rbat was -700 runs. This is typical of the seasons which I examined (back to 1969). I am reasonably sure that the reason for this is that the average number of runs against which Rbat is measured excludes pitcher batting, whereas the sum of Rbat (and Rpos) for teams includes pitcher batting. For the 2015 NL, total Rpos was +847 vs. Rbat of -630; for the AL, total Rpos was -105 vs. Rbat of -70.
My intended analysis required that total WAA be limited to batting, baserunning, pitching, and fielding, and that total WAA be equal to zero at the seasonal level, by construction. To do this, I distributed Rpos to Rbat such that the sum of Rbat across the league was exactly equal to zero. In 2015, for example, since Rbat summed to -700, I adjusted that number up by +700, in proportion to the +742 Rpos; that is, I added 94.3% (700/742) of each team's Rpos to its Rbat. For Rrun, Rdp, and Rfield, I shifted the numbers by the same amount for every team such that the sum for the season was equal to zero. In 2015, for example, Rfield totaled +37; I therefore subtracted 1.2 runs (37/30) from each team's Rfield value. Rrun and Rdp both summed to zero across the league in 2015, so no adjustments were necessary to these numbers.
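The re-centering just described can be sketched as follows. The function name and the three-team league are hypothetical, used only to show that both adjusted series sum to zero; the last two lines reproduce the 2015 share from the league totals quoted above.

```python
def recenter_team_runs(rbat, rpos, rfield):
    """Return (adjusted Rbat, adjusted Rfield), each summing to zero."""
    # Fraction of Rpos needed to offset the league-wide Rbat total.
    share = -sum(rbat) / sum(rpos)
    rbat_adj = [b + share * p for b, p in zip(rbat, rpos)]

    # Subtract the per-team average so that Rfield sums to zero.
    mean_field = sum(rfield) / len(rfield)
    rfield_adj = [f - mean_field for f in rfield]
    return rbat_adj, rfield_adj

# Hypothetical three-team league, for illustration only.
rbat = [-40.0, -10.0, -20.0]
rpos = [30.0, 20.0, 50.0]
rfield = [12.0, -3.0, 0.0]

rbat_adj, rfield_adj = recenter_team_runs(rbat, rpos, rfield)
print(rbat_adj, rfield_adj)

# League totals quoted in the text for 2015.
share_2015 = 700 / 742
print(f"share of Rpos moved to Rbat in 2015: {share_2015:.1%}")
```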
On Baseball-Reference's pitcher WAR page, they provided data for WAA, WAAadj, and WAR. The last of these was, of course, total pitcher WAR. The first two of these summed to zero at the league level in every season. I, therefore, set pitcher WAA equal to the sum of WAA and WAAadj. Based on Baseball-Reference's explanation of its WAR for pitchers, WAAadj is an adjustment made to account for reliever leverage. As I understand it, then, at the league/team level, WAAadj ends up essentially being rounding error to re-center WAA to zero.
Having set all of that up, I fit the above equation using Baseball-Reference data from 2003-2015. The equation being solved is repeated here for reference.
Team Wins over .500 = (1 + a_{0})*[WAA_{b} + (1 + a_{r0})*WAA_{r} + (1 + a_{f0})*WAA_{f}]
+ (1 + a_{0})*(1 + a_{p0})*WAA_{p}
The results in the first column of the table were estimated using Ordinary Least Squares. The results in the last column are what we would expect if the four factors (batting, baserunning, fielding, and pitching) were appropriately weighted in the calculation of bWAR.

2003-2015
                Statistical Estimates    Expected Values
a_{0}           0.080                    0
(Std. Error)    0.043                    --
a_{r0}          0.085                    0
(Std. Error)    0.328                    --
a_{f0}          -0.118                   0
(Std. Error)    0.074                    --
a_{p0}          0.013                    0
(Std. Error)    0.035                    --
R^{2}           0.817                    0.815
None of the coefficients are significantly different from their expected value (zero) at a 95% confidence level. The value for a_{0} is nearly so, however (p=.064, meaning a_{0} differs from zero at about a 93.6% confidence level (1 - p)). The value for a_{f0} (p=.114) is also at least suggestive, if perhaps not quite "significant".
A positive value of a_{0} suggests that the impact of position player WAA (i.e., batting, baserunning, and fielding) on team WAA is greater than one-to-one. In this case, a coefficient of 0.080 suggests that team wins over .500 are, on average, 8% greater than implied by team-level position-player WAA. So, for example, a team with players with a combined (position-player) WAA of +12 (and 0 pitching WAA) would be expected to finish 13 games over .500 (this is the difference between a 93-win and a 94-win team in a 162-game schedule).
A negative value of a_{f0} suggests that the impact of player fielding on team wins is less than the impact of batting or baserunning. In this case, a coefficient of -0.118 suggests that fielding WAA are, on average, 12% less valuable than batting or baserunning WAA in translating into team wins.
The top fielding team in MLB in 2015, according to Baseball-Reference, was the Arizona Diamondbacks at +68 Rfield. I translated that into a WAA_{f} of 6.5. Reducing that by the 12% implied by the estimated value of a_{f0} would lower that to approximately 5.7 WAA, a reduction of just under one team win (0.8). Overall, Baseball-Reference calculated a total of 6.2 WAA for the 2015 D-Backs. Reducing that by 0.8 would lower it to 5.4 WAA. The 2015 D-Backs actually finished 79-83, 2 wins below .500.
The worst fielding team in MLB in 2015, according to Baseball-Reference, was the Seattle Mariners at -68 Rfield. I translated that into a WAA_{f} of -6.7. Reducing that by the 12% implied by the estimated value of a_{f0} would lower that (in absolute value) to -5.9, a reduction of 0.8 wins. Overall, Baseball-Reference calculated a total of -7.7 WAA for the 2015 Mariners. Adjusting that by 0.8 would raise it to -6.9 WAA. The 2015 Mariners actually finished 76-86, 5 wins below .500.
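The adjustments in the last few paragraphs amount to simple scaling, sketched below. The constant and variable names are mine; the +12 WAA team is treated as pure batting WAA so that only a_{0} applies, which is a simplifying assumption for the illustration.

```python
A0 = 0.080     # estimated a_0 for bWAR
AF0 = -0.118   # estimated a_f0 for bWAR

# A team with +12 position-player WAA (treated here as batting WAA,
# so that only a_0 applies) and 0 pitching WAA.
expected_over_500 = (1 + A0) * 12
print(f"{expected_over_500:.2f}")

# Fielding WAA discounted by the estimated a_f0.
fielding_waa = {"2015 Diamondbacks": 6.5, "2015 Mariners": -6.7}
discounted = {team: (1 + AF0) * waa for team, waa in fielding_waa.items()}
for team, waa in discounted.items():
    print(f"{team}: {fielding_waa[team]:+.1f} -> {waa:+.1f}")
```

Run as written, this should reproduce the figures in the text: roughly 13 wins over .500 for the +12 WAA team, and fielding values of about 5.7 and -5.9 for the two clubs.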
Correlation between Pitching and Fielding
Baseball-Reference's treatment of pitching vis-à-vis fielding makes it difficult to evaluate the accuracy of bWAR as compared to fWAR or eWOPA. This is not a criticism of Baseball-Reference's treatment of pitching and fielding, merely a statement of fact. From the perspective of a team, Baseball-Reference begins with actual runs allowed, calculates an independent estimate of fielding runs above or below average, and attributes the difference between the two (i.e., total runs allowed minus (net) runs allowed by the team's fielders) to the team's pitchers. Baseball-Reference does not calculate WAR directly at the team level (WAR is constructed at the player level), and there are differences in the conversion from runs to wins for position players (where I understand the adjustment to be constant, or at least nearly constant, across all players within a league) and pitchers (where the adjustment is calculated uniquely for each pitcher to reflect the impact of the pitcher on his own run-scoring environment). Because of these differences, it is not literally true that fielding WAA and pitching WAA can be traded off exactly one-for-one. But it is the case that, essentially, team-level pitching WAA and team-level fielding WAA will very nearly add up to a team-level defensive WAA based on actual runs allowed at the team level.
In other words, any "errors" in Baseball-Reference's calculation of fielding WAA will produce nearly exactly offsetting errors in Baseball-Reference's calculation of pitching WAA, and vice versa. The mathematical term for this issue is multicollinearity, and this issue may affect the interpretation of the results in the above table (especially a_{f0} and a_{p0}). Specifically, from the Wikipedia article on multicollinearity, "One of the features of multicollinearity is that the standard errors of the affected coefficients tend to be large. In that case, the test of the hypothesis that the coefficient is equal to zero may lead to a failure to reject a false null hypothesis of no effect of the explanator, a type II error." In layman's terms, the standard errors associated with a_{f0} and a_{p0} are artificially large, because of the way in which Baseball-Reference calculates bWAR.
Because of the way in which Baseball-Reference calculates fielding and pitching WAA, total WAA (or WAR), as calculated by Baseball-Reference, will have virtually no "errors" on the defensive side, relative to actual runs allowed. Actual runs allowed may not track perfectly with team wins because of differences in timing (e.g., "clutch performance", "pitching to the score"), but these differences should generally be beyond the scope of fWAA and eWOPA, as well (but not pWins and pWOPA, which explicitly measure such factors, of course). This should make bWAR a more accurate measure of actual team performance than either eWOPA or fWAR, neither of which tie their defensive measures directly to actual runs allowed at the team level.
This makes it very difficult to evaluate Baseball-Reference's treatment of fielding and pitching at the player level by looking at the team-level accuracy of bWAR (or bWAA). Difficult, but not entirely impossible.
One thing worth looking at is the team-level correlation between pitching (WAA) and fielding (WAA). If there were systematic errors in Baseball-Reference's calculation of fielding WAA, this would lead to offsetting errors in Baseball-Reference's pitching WAA, which would push these two measures toward being negatively correlated. Hence, a negative correlation between fielding WAA and pitching WAA, at the team level, could be indicative of problems in the split between fielding and pitching.
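The mechanism can be illustrated with a toy example. Everything below is hypothetical (five invented teams, invented "true" values, and invented errors); it is meant only to show how an error in the fielding estimate, absorbed with the opposite sign by pitching, can turn a mildly positive correlation into a negative one. It is not a model of how any of the three systems actually behaves.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical "true" team-level WAA for five teams.
true_pitching = [2.0, -1.0, 0.0, -2.0, 1.0]
true_fielding = [1.0, 1.0, -2.0, 0.0, 0.0]

# Hypothetical errors in the fielding estimate.
fielding_error = [3.0, -3.0, 0.0, 3.0, -3.0]

# Fielding is measured with error; pitching is whatever is left over,
# so it absorbs the same error with the opposite sign.
measured_fielding = [f + e for f, e in zip(true_fielding, fielding_error)]
measured_pitching = [p - e for p, e in zip(true_pitching, fielding_error)]

corr_true = pearson(true_pitching, true_fielding)
corr_measured = pearson(measured_pitching, measured_fielding)
print(f"true: {corr_true:+.3f}, measured: {corr_measured:+.3f}")
```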
Player won-lost records also calculate fielding and pitching measures controlling for each other. As with Baseball-Reference, a negative correlation between these two measures could indicate problems with this split.
One challenge, however, in evaluating correlations between pitching and fielding is to figure out what correlation we should expect. At one level, we might expect a correlation of zero: pitching and fielding are performed by entirely different players (outside of pitcher fielding, but (a) pitchers tend to have relatively few fielding opportunities compared to other positions, and (b) pitcher fielding is necessarily subsumed within "pitching" by Baseball-Reference, because of its decision to tie to actual runs allowed). On the other hand, good teams tend to be good at everything and bad teams, especially very bad teams, tend to be bad at everything. So, it might be reasonable to expect pitching and fielding to be positively correlated at the team level.
Fortunately for our analysis, one of the three systems being analyzed here, Fangraphs, estimates pitching and fielding independently, based on entirely independent statistics. Specifically, pitchers are evaluated based entirely on strikeouts, walks, and home runs (via FIP), while fielders are evaluated based entirely on balls in play (via UZR). The correlation between pitching WAA and fielding WAA, as measured by Fangraphs, should reflect the "true" correlation between these factors at the team level.
The next table calculates the correlation between pitching and fielding WAA for the three systems from 1969 through 2015.
Fangraphs    Baseball-Reference    Player W-L Records
6.67%        -13.07%               11.05%
As measured by Fangraphs, the correlation between pitching and fielding is fairly small, but is slightly (and somewhat significantly) positive, as one might expect for the reasons suggested above. As measured by Baseball-Reference, however, the correlation between pitching and fielding is negative: not hugely, but significantly so. This suggests to me that Baseball-Reference may be systematically misallocating credit for runs allowed between pitchers and fielders.
And what of Player won-lost records? The correlation between fielding and pitching as measured by Player won-lost records is 11.05%. This is somewhat higher than the correlation identified by Fangraphs, which would seem to suggest that I am not misallocating credit for runs allowed between pitchers and fielders.
bWAR vs. Actual Wins above Replacement
Both Baseball-Reference and Fangraphs use a replacement level of .294. As a final analysis, I compared bWAR to team WAR, where the latter was set equal to actual team wins minus the number of wins a .294 team would have won over that team's total games (47.6 per 162). For this experiment, I fit the following equation:
Team Wins over .294 = a_{0} + (1 + a_{pos})*WAR_{pos} + (1 + a_{p})*WAR_{p}
As with the previous table, the results in the first column of the table were estimated using Ordinary Least Squares. The results in the last column are what would be expected.

2003-2015
                Statistical Estimates    Expected Values
a_{0}           -2.152                   0
(Std. Error)    0.847                    --
a_{pos}         -0.092                   0
(Std. Error)    0.033                    --
a_{p}           0.039                    0
(Std. Error)    0.036                    --
R^{2}           0.798                    0.793
The coefficients, a_{0} and a_{pos}, are both significant at a 95% confidence level (in fact, both are significant at more than a 98% confidence level).
The value of a_{0}, -2.15, indicates that a team that amassed an actual .294 winning percentage would be expected to earn about 2 WAR rather than the 0 WAR implied by a replacement level of .294.
The only sub-replacement team over the time period analyzed here was the 2003 Detroit Tigers, who went 43-119 for a .265 winning percentage, which works out to -4.6 wins over .294. Baseball-Reference shows them with +4.3 WAR.
The next two worst teams over this time period were the 2004 Arizona Diamondbacks and the 2013 Houston Astros, who both finished 51-111 (.315), 3.4 wins over .294. According to Baseball-Reference, the players on the 2004 Diamondbacks accumulated 5.7 WAR and the players on the 2013 Astros had 8.4 WAR.
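The "wins over .294" figures used in this comparison follow directly from the definition given earlier: actual wins minus the wins a .294 team would earn over the same number of games. A minimal sketch, with a function name of my own choosing:

```python
REPLACEMENT_LEVEL = 0.294  # replacement-level winning percentage

def wins_over_replacement(wins, losses, level=REPLACEMENT_LEVEL):
    """Actual wins minus the wins a `level` team would earn in the same games."""
    games = wins + losses
    return wins - level * games

replacement_wins_per_162 = REPLACEMENT_LEVEL * 162
print(f"{replacement_wins_per_162:.1f}")          # replacement wins per 162 games
print(f"{wins_over_replacement(51, 111):.1f}")    # a 51-111 team
```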
The value of a_{pos}, -0.092, indicates that position-player WAR translate into about 9% fewer team WAR, i.e., 11 player WAR translate into only 10 team WAR. This is broadly consistent, in the sign of the coefficient if nothing else, with the earlier result suggesting that fielding WAA may be overstated by 12% or so.
The value of R^{2} indicates that just under 80% of the variance in team wins (over .294) can be explained by player WAR as presented at Baseball-Reference.com.
Fangraphs: fWAR
Fangraphs has two pages on its website for every season which summarize position player and pitcher WAR for every team within the season.
For position players, Fangraphs provides data on Batting, Base Running, and Fielding, as well as Positional values, expressed as runs above average. Fangraphs also has a column titled "League" which appears to reflect differences between the two leagues in a particular season (e.g., in 2015, AL teams are credited with around 22 runs; NL teams are credited with around 11 runs here). Finally, Fangraphs has a column "Replacement", which converts the previous columns (including League) from runs above average (RAA) to runs above replacement (RAR). Fangraphs then shows RAR (which is the sum of the preceding columns) and WAR.
For a season as a whole, the sum of Fangraphs' values for Batting, Baserunning, Fielding, Positional, and League adds up to zero (or something exceptionally close to zero, most likely due to minor rounding issues). As was the case with Baseball-Reference, however, total Batting runs above average tend to be negative while Positional and League adjustments tend to be positive, on average, across all teams. To create WAA measures for Batting, Baserunning, and Fielding that were each centered at zero, therefore, I distributed Positional and League adjustments by team across Batting, Baserunning, and Fielding, such that the total number of Batting, Baserunning, and Fielding Runs (relative to average) were all exactly equal to zero for every season. I then converted these runs above average (RAA) measures into wins above average (WAA) measures using the ratio of WAR to RAR reported by Fangraphs.
Fangraphs' pitcher WAR page provided team values for RA9-WAR (WAR based on actual runs allowed) and WAR (their preferred measure, based on FIP, i.e., based only on strikeouts, walks, and home runs allowed). Fangraphs did not provide any measures of either runs or wins relative to average (RAA or WAA). I converted Fangraphs' WAR estimates (using WAR, not RA9-WAR) to WAA by simply subtracting the same number of WAR from each team such that the sum equaled zero. So, for example, in 2015, total pitcher WAR, as reported by Fangraphs, was 429.8. Dividing 429.8 by the 30 MLB teams, the "replacement" portion of WAR worked out to 14.3 "wins" per team. Subtracting 14.3 from each team's WAR produced a set of WAA measures which summed to zero across the 30 major league teams in 2015.
Having set all of that up, I fit the same equation as used earlier for eWins and bWAR, using Fangraphs data from 2003-2015. The equation being solved is repeated here for reference.
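The WAR-to-WAA shift described above can be sketched in a few lines of Python. This is only an illustration: the function name `war_to_waa` and the three-team league are hypothetical, and only the 2015 total (429.8 pitcher WAR across 30 teams) comes from the text.

```python
def war_to_waa(team_war):
    """Subtract the league-average WAR from every team so the results sum to zero."""
    replacement_share = sum(team_war.values()) / len(team_war)
    return {team: war - replacement_share for team, war in team_war.items()}

# Hypothetical three-team league (made-up values, for illustration only).
example_war = {"Team A": 20.0, "Team B": 14.0, "Team C": 8.0}
example_waa = war_to_waa(example_war)

# The 2015 figure quoted in the text: 429.8 total pitcher WAR over 30 teams.
per_team_2015 = 429.8 / 30

print(example_waa)
print(round(per_team_2015, 1))
```

Because the same constant is subtracted from every team, the ordering of teams and the gaps between them are unchanged; only the zero point moves.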
Team Wins over .500 = (1 + a_{0})*[WAA_{b} + (1 + a_{r0})*WAA_{r} + (1 + a_{f0})*WAA_{f}]
+ (1 + a_{0})*(1 + a_{p0})*WAA_{p}
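One way to estimate an equation of this form with Ordinary Least Squares is to regress team wins over .500 on the four WAA components without an intercept and then back out the a-coefficients from the fitted slopes. The sketch below assumes that approach (the article does not spell out the exact procedure) and uses synthetic, noise-free data; the function name and all input values are hypothetical.

```python
import numpy as np

def fit_factor_weights(waa_b, waa_r, waa_f, waa_p, wins_over_500):
    """OLS on the four components, then map slopes back to a0, ar0, af0, ap0.

    Slopes: b_b = 1 + a0, b_r = (1 + a0)(1 + ar0),
            b_f = (1 + a0)(1 + af0), b_p = (1 + a0)(1 + ap0).
    """
    X = np.column_stack([waa_b, waa_r, waa_f, waa_p])
    y = np.asarray(wins_over_500, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    b_b, b_r, b_f, b_p = beta
    return {
        "a0": b_b - 1.0,
        "ar0": b_r / b_b - 1.0,
        "af0": b_f / b_b - 1.0,
        "ap0": b_p / b_b - 1.0,
    }

# Synthetic data: 390 team-seasons (30 teams x 13 seasons), made-up WAA values.
rng = np.random.default_rng(0)
waa = rng.normal(0.0, 3.0, size=(390, 4))
true = {"a0": -0.04, "ar0": 0.10, "af0": -0.20, "ap0": 0.19}
wins = (1 + true["a0"]) * (
    waa[:, 0] + (1 + true["ar0"]) * waa[:, 1] + (1 + true["af0"]) * waa[:, 2]
) + (1 + true["a0"]) * (1 + true["ap0"]) * waa[:, 3]

estimates = fit_factor_weights(waa[:, 0], waa[:, 1], waa[:, 2], waa[:, 3], wins)
print(estimates)
```

With real data the fit would not be exact, and standard errors for the ratio-based coefficients would need extra work (for example, the delta method), which this sketch leaves out.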
The results in the first column of the table were estimated using Ordinary Least Squares. The results in the last column are what we would expect if the four factors (batting, baserunning, fielding, and pitching) were appropriately weighted in the calculation of fWAR.

| 2003-2015 | Statistical Estimates | Expected Values |
|---|---|---|
| a_{0} | -0.043 | 0 |
| (Std. Error) | 0.043 | |
| a_{r0} | 0.122 | 0 |
| (Std. Error) | 0.263 | |
| a_{f0} | -0.200 | 0 |
| (Std. Error) | 0.085 | |
| a_{p0} | 0.190 | 0 |
| (Std. Error) | 0.051 | |
| R^{2} | 0.802 | 0.790 |
The coefficients on fielding, a_{f0}, and pitching, a_{p0}, are both significantly different from their expected value (zero) at more than a 95% confidence level.
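As a rough check on that statement, each estimate can be divided by its standard error and compared with 1.96, the usual two-sided 95% cutoff under a large-sample normal approximation. That approximation is an assumption of this sketch rather than something stated in the article, and the helper name `t_ratio` is hypothetical.

```python
def t_ratio(estimate, std_error, expected=0.0):
    """Distance between an estimate and its expected value, in standard errors."""
    return (estimate - expected) / std_error

CRITICAL_95 = 1.96  # two-sided, large-sample normal approximation (assumed)

t_fielding = t_ratio(-0.200, 0.085)  # a_{f0} and its standard error
t_pitching = t_ratio(0.190, 0.051)   # a_{p0} and its standard error

print(round(t_fielding, 2), abs(t_fielding) > CRITICAL_95)
print(round(t_pitching, 2), abs(t_pitching) > CRITICAL_95)
```

Both ratios exceed 1.96 in absolute value, in line with the claim that the two coefficients differ significantly from zero.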
A negative value of a_{f0} suggests that the impact of player fielding on team wins is less than the impact of batting or baserunning. In this case, a coefficient of -0.200 suggests that fielding WAA are, on average, 20% less valuable than batting or baserunning WAA in translating into team wins.
The top fielding team in MLB in 2003, according to Fangraphs, was the Seattle Mariners at +78.1 Fielding Runs (above average). I translated that into a WAA_{f} of 7.7. Reducing that by the 20% implied by the estimated value of a_{f0} would lower that to approximately 6.1 WAA, a reduction of 1.6 wins. Overall, Fangraphs calculated a total of 47.2 WAR for the 2003 Mariners. Reducing that by 1.6 would lower it to 45.6 WAR. The 2003 Mariners actually finished 93-69, which is 45.4 wins above the .294 replacement level used by Fangraphs (and Baseball-Reference).
The worst fielding team in MLB in 2003, according to Fangraphs, was the Toronto Blue Jays at -73.5 Fielding Runs. I translated that into a WAA_{f} of -7.2. Reducing that by the 20% implied by the estimated value of a_{f0} would lower that (in absolute value) to -5.8, a reduction of 1.4 wins. Overall, Fangraphs calculated a total of 33.6 WAR for the 2003 Blue Jays. Increasing that by 1.4 would raise it to 35.0 WAR. The 2003 Blue Jays actually finished 86-76, 38.4 wins above the .294 replacement level used by Fangraphs.
A positive value of a_{p0} suggests that the impact of pitching WAR on team wins is greater than the impact of position-player WAR on team wins. In this case, a coefficient of 0.190 suggests that pitching WAA are, on average, 19% more valuable than position-player WAA in translating into team wins.
The top pitching team in MLB in 2003, according to Fangraphs, was the New York Yankees with 28.6 WAR. I translated that into a WAA_{p} of 14.3. Increasing that by the 19% implied by the estimated value of a_{p0} would raise that to approximately 17.0 WAA and 31.3 WAR. Overall, Fangraphs calculated a total of 55.1 WAR for the 2003 Yankees. Increasing that by the additional 2.7 pitcher WAR derived above would raise it to 57.8 WAR. The 2003 Yankees actually finished 101-61, which is 53.6 wins above the .294 replacement level used by Fangraphs.
The worst pitching team in MLB in 2003, according to Fangraphs, was the Detroit Tigers with 2.9 WAR. I translated that into a WAA_{p} of -11.4. Increasing that (in absolute value) by the 19% implied by the estimated value of a_{p0} would raise that (in absolute value) to -13.6 WAA and 0.7 WAR. Overall, Fangraphs calculated a total of 1.7 WAR for the 2003 Tigers. Decreasing that by the additional negative pitcher WAA derived above (2.2) would lower it to -0.5 WAR. The 2003 Tigers actually finished 43-119, which is 4.6 wins below the .294 replacement level used by Fangraphs (i.e., an actual WAR of -4.6).
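The arithmetic in these four examples can be reproduced with a short sketch. The function names are hypothetical, a 162-game schedule is assumed, and the inputs are the rounded figures quoted in the text, so results can differ from the article's numbers by about a tenth of a win.

```python
REPLACEMENT_PCT = 0.294  # replacement level quoted in the text

def wins_over_replacement(wins, games=162, replacement_pct=REPLACEMENT_PCT):
    """Actual wins minus the wins a replacement-level team would earn."""
    return wins - replacement_pct * games

def reweighted_war(total_war, component_waa, coefficient):
    """Rescale one component's WAA by (1 + coefficient) and adjust total WAR."""
    return total_war + coefficient * component_waa

# 2003 Mariners: 47.2 fWAR, fielding WAA of 7.7, a_{f0} = -0.200
mariners_adjusted = reweighted_war(47.2, 7.7, -0.200)
mariners_actual = wins_over_replacement(93)

# 2003 Tigers: 1.7 fWAR, pitching WAA of -11.4, a_{p0} = 0.190
tigers_adjusted = reweighted_war(1.7, -11.4, 0.190)
tigers_actual = wins_over_replacement(43)

print(round(mariners_adjusted, 1), round(mariners_actual, 1))
print(round(tigers_adjusted, 1), round(tigers_actual, 1))
```

The helper simply adds `coefficient * component_waa` to the published total, which is equivalent to multiplying that one component by (1 + coefficient) and leaving the others alone.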
fWAR vs. Actual Wins above Replacement
Both Baseball-Reference and Fangraphs use a replacement level of .294. As a final analysis, I compared fWAR to team WAR, where the latter was set equal to actual team wins minus the number of wins a .294 team would have won over that team's total games (47.6 per 162). For this experiment, I fit the following equation:
Team Wins over .294 = a_{0} + (1 + a_{pos})*WAR_{pos} + (1 + a_{p})*WAR_{p}
As with the previous table, the results in the first column of the table were estimated using Ordinary Least Squares. The results in the last column are what would be expected.

| 2003-2015 | Statistical Estimates | Expected Values |
|---|---|---|
| a_{0} | 0.605 | 0 |
| (Std. Error) | 0.906 | |
| a_{pos} | -0.116 | 0 |
| (Std. Error) | 0.035 | |
| a_{p} | 0.199 | 0 |
| (Std. Error) | 0.052 | |
| R^{2} | 0.799 | 0.788 |
The coefficients a_{pos} and a_{p} are both significant at a 99% confidence level.
The value of a_{pos}, -0.116, indicates that position-player WAR translate into about 12% fewer team WAR, i.e., 9 position-player WAR translate into only 8 team WAR. This is broadly consistent with the earlier result suggesting that fielding WAA is overstated by 20%.
The value of a_{p} in this equation, 0.199, is virtually identical to the value of a_{p0} in the previous equation. Both coefficients suggest that pitcher WAR translates into 20% more team WAR  i.e., 5 pitcher WAR translate into 6 team WAR.
The value of R^{2} indicates that just under 80% of the variance in team wins (over .294) can be explained by player WAR as presented at Fangraphs.com.
Comparison: eWOPA vs. bWAR vs. fWAR
Measuring the Accuracy of bWAA, fWAA, and eWOPA
At the team level, one would expect bWAA, fWAA, and eWOPA to correlate at least reasonably strongly with actual team wins over .500. The correlation will not be perfect (as it is for pWOPA and pWORL, by construction), of course. On offense, none of bWAA, fWAA, or eWOPA ties to actual runs scored. And even if they did, differences in the distribution of runs scored lead to a less-than-perfect correlation between runs scored (and runs allowed) and team wins. On the other hand, there is no particular reason to expect any of bWAA, fWAA, or eWOPA to do a notably better job of incorporating these differences, since none of the three is designed to capture such differences.
There are some expected differences across the three systems.
 As noted above, bWAA for pitching and fielding are constructed to tie to actual runs allowed at the team level. This might lead one to expect bWAA to correlate somewhat more strongly to actual team wins than either fWAA or eWOPA.
 Both bWAA and fWAA for relief pitchers incorporate the leverage in which relief pitchers pitched. To the extent that better relief pitchers pitch in more important situations, this should lead to a better correlation with team wins for bWAA and fWAA than for eWOPA, which does not adjust for actual pitcher leverage.
 While eWOPA are calculated based on "context-neutral" win probabilities, there are some plays (stolen bases, bunts, and intentional walks) which I do not "neutralize" for context. To the extent that these plays are incorporated within eWOPA based on their actual context, this may lead eWOPA to correlate somewhat better with actual wins than bWAA or fWAA.
But, overall, the best (only?) way to evaluate how "accurate" bWAA, fWAA, and eWOPA are, relative to one another, is to evaluate how close they come to actual wins over .500 at the team level.
The first table below repeats results presented earlier in this article that relate actual team wins to my eWOPA (eWins over positional average) and to WAR (Wins above Replacement), as calculated by Baseball-Reference (bWAR) and Fangraphs (fWAR). (I evaluated WAR rather than WAA because the WAA values investigated here were at least partially constructed by me, as explained earlier in the article.)
For eWOPA, I fit the following equation:
Team Wins over .500 = a_{0}*(eWOPA + (Teammate Adj.))
For bWAR and fWAR, I fit the following equation:
Team Wins over .294 = c + (1 + a_{pos})*WAR_{pos} + (1 + a_{p})*WAR_{p}
The equations were all fit over team data from 2003 through 2015.

| | eWOPA | bWAR | fWAR |
|---|---|---|---|
| a_{0} | 1.837 | | |
| (Std. Error) | 0.045 | | |
| c | | 2.152 | 0.605 |
| (Std. Error) | | 0.847 | 0.906 |
| a_{pos} | | -0.092 | -0.116 |
| (Std. Error) | | 0.033 | 0.035 |
| a_{p} | | 0.039 | 0.199 |
| (Std. Error) | | 0.036 | 0.052 |
| R^{2} | 0.814 | 0.798 | 0.799 |
In comparing the results, I would point out that the equation for eWOPA presumes that the various factors are weighted optimally (as, indeed, I showed that they are earlier in this article). For bWAR and fWAR, however, the equation corrects for any misweighting between position players and pitchers. As such, to the extent the results here may be biased toward one or the other, they would be biased toward the WARs.
In spite of this possible bias, the highest R^{2} (which measures the percentage of variance in actual team wins explained by the various equations) is for eWOPA.
There are several alternate ways to measure how "close" these measures come to actual team wins beyond the above table. The next table presents two such measures over two alternate time periods.

| | bWAA | fWAA | eWOPA (Raw) | eWOPA (incl. Teammate Adj.) |
|---|---|---|---|---|
| Correlation, 1969-2015 | 89.7% | 88.4% | 90.5% | 91.0% |
| Correlation, 2003-2015 | 89.3% | 88.8% | 90.2% | 90.8% |
| Standard error, 1969-2015 | 4.931 | 5.213 | 4.793 | 4.684 |
| Standard error, 2003-2015 | 5.066 | 5.118 | 4.847 | 4.732 |
The first two rows present the simple correlation between team wins over .500 and the measures being evaluated here (bWAA, fWAA, eWOPA). Correlation is a measure that ranges from -1 to 1. Numbers greater than zero indicate that teams with higher values of bWAA (for example) tend to also have more actual wins over .500 (and vice versa). A correlation of 1 (or 100%) would mean that actual wins and the measure of interest move perfectly in sync: 5% more bWAA would translate into exactly 5% more wins over .500.
Statisticians often refer to correlation by the letter, r. The relationship between the "r" here and the R^{2} in several of my earlier tables is not coincidental. In fact, for a univariate equation (i.e., y is a simple function of one variable, x), R^{2} is the square of the correlation coefficient, r. Not surprisingly, then, the correlation results here tell the same basic story as the R^{2} results told earlier: the relationships between team wins over .500 and bWAA, fWAA, and eWOPA are fairly similar, with eWOPA correlating somewhat better than bWAA and fWAA.
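The link between r and R^{2} for a one-variable fit is easy to confirm numerically. The sketch below uses made-up data (not team data) and NumPy; it fits a straight line with an intercept and compares the squared correlation with the share of variance explained.

```python
import numpy as np

# Synthetic data for illustration only: y depends linearly on x plus noise.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 8.0, 200)
y = 1.0 * x + rng.normal(0.0, 4.0, 200)

# Simple correlation coefficient, r.
r = np.corrcoef(x, y)[0, 1]

# R^2 from a univariate least-squares line with an intercept.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
r_squared = 1.0 - residuals.var() / y.var()

print(round(r ** 2, 6), round(r_squared, 6))
```

The identity holds only for a single explanatory variable with an intercept; the multi-variable equations fit earlier in this article report R^{2} for a different specification.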
The last two rows calculate standard errors for bWAA, fWAA, and eWOPA. These are calculated as follows. For every team-season, the difference between team wins over .500 and the number of wins over .500 predicted by the relevant measure is calculated. For bWAA and fWAA, the "number of wins over .500 predicted" is simply equal to bWAA and fWAA, respectively. As discussed earlier, the relationship between net eWins and net team wins (and, by extension, between eWOPA and team wins over .500) is not one-to-one, but is closer to two-to-one. Hence, for this set of calculations, "the number of wins over .500 predicted by" eWOPA is equal to 2 times eWOPA. These differences are squared and then summed. Squaring the errors has two effects. First, the square of any number is non-negative, so squaring has the effect of valuing -2 the same as 2, so that positive and negative errors do not simply cancel out. Second, squaring (as opposed to simply taking the absolute value) weights larger errors more strongly than smaller errors. For example, errors of 1 and 4 produce a sum of squared errors of 17 (1^{2} + 4^{2}), while errors of 2 and 3 (which have the same simple sum: 5) produce a sum of squared errors of only 13 (2^{2} + 3^{2}): being off by 4 half of the time is worse than always being off by 2 or 3. The sum of squared errors is then divided by the total number of observations (1,288 team-seasons from 1969-2015) and the square root is taken. The results, then, are root-mean-square errors: essentially average absolute errors, weighted against large errors, so lower numbers are better.
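The standard-error calculation just described is a root-mean-square error, which can be sketched as follows. The function name and the three team-seasons are hypothetical; the doubling of eWOPA follows the two-to-one relationship noted in the text.

```python
import math

def root_mean_squared_error(actual, predicted):
    """Square each error, average the squares, then take the square root."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# The example from the text: same simple sum of errors, different squared sums.
sse_1_and_4 = 1 ** 2 + 4 ** 2
sse_2_and_3 = 2 ** 2 + 3 ** 2

# Hypothetical team-seasons (made-up numbers): wins over .500 and eWOPA.
actual_wins_over_500 = [10.0, -6.0, 2.0]
ewopa = [4.5, -2.0, 1.5]
predicted = [2 * value for value in ewopa]  # eWOPA is doubled, per the text

rmse = root_mean_squared_error(actual_wins_over_500, predicted)
print(sse_1_and_4, sse_2_and_3, round(rmse, 3))
```

For bWAA or fWAA the same function would be called with the measure itself as the prediction, without the factor of 2.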
The conclusion from the standard errors is pretty much the same as the conclusion from the correlations: eWOPA is best. Over the most recent time period (2003-2015), the standard error associated with eWOPA (including teammate adjustments) is nearly 7% better than bWAA and 8% better than fWAA.
Comparing bWAA and fWAA, the results seem to clearly favor Baseball-Reference. This is as we would expect, I think, given that defensive bWAA are constructed based on actual runs allowed. Given that, the fact that eWOPA is even more accurate than bWAA strikes me as truly impressive (although I'm obviously not the most objective observer of these results).
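The percentage gaps quoted here follow directly from the 2003-2015 standard errors in the table. A minimal check, with a hypothetical helper name:

```python
def pct_improvement(baseline, candidate):
    """Percentage by which the candidate's standard error is below the baseline's."""
    return 100.0 * (baseline - candidate) / baseline

# 2003-2015 standard errors from the table above.
ewopa_with_teammate_adj = 4.732
vs_bwaa = pct_improvement(5.066, ewopa_with_teammate_adj)
vs_fwaa = pct_improvement(5.118, ewopa_with_teammate_adj)

print(round(vs_bwaa, 1), round(vs_fwaa, 1))
```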
Comparison of Factors: Batting, Baserunning, Fielding, Pitching
Proper Factor Weighting: Batting vs. Baserunning vs. Pitching vs. Fielding
Earlier in this article, I spent a great deal of time looking at the individual factors of player value (Batting, Baserunning, Fielding, and Pitching) and assessing whether these factors were properly weighted within eWOPA, bWAA, and fWAA. Those results are repeated below.
To review, I fit the following equation for eWins, bWAA, and fWAA by factor.
Net Wins = a_{0}*[(Net Batting eWins) + (1 + a_{r0})*(Net Baserunning eWins) + (1 + a_{f0})*(Net Fielding eWins) + a_{2}*(Teammate Adj.)]
+ a_{0}*(1 + a_{p0})*(Net Pitching eWins)
The table below presents statistical results (estimated using Ordinary Least Squares) as well as expected coefficients. All three equations were estimated over data from 2003-2015.

| | Player Won-Lost Records: Statistical Estimates | Player Won-Lost Records: Expected Values | WAR: Baseball-Ref | WAR: Fangraphs | WAR: Expected Values |
|---|---|---|---|---|---|
| a_{0} | 2.059 | 1.963 | 1.080 | 0.957 | 1.000 |
| (Std. Error) | 0.078 | 0.044 | 0.043 | 0.043 | |
| a_{2} | 1.882 | 2.000 | | | |
| (Std. Error) | 0.546 | | | | |
| a_{r0} | 0.278 | 0.000 | 0.085 | 0.122 | 0.000 |
| (Std. Error) | 0.272 | | 0.328 | 0.263 | |
| a_{f0} | 0.042 | 0.000 | -0.118 | -0.200 | 0.000 |
| (Std. Error) | 0.256 | | 0.074 | 0.085 | |
| a_{p0} | 0.216 | 0.322 | 0.062 | 0.243 | 0.000 |
| (Std. Error) | 0.063 | | 0.050 | 0.086 | |
| R^{2} | 0.838 | 0.837 | 0.817 | 0.802 | 0.793 / 0.788 |
To review some key points from my earlier analysis, first, with respect to Player won-lost records:
 None of the coefficients in the equation for Player won-lost records are significantly different from their expected values.
 The impact of pitching on Player won-lost records is stronger (by 20-30 percent) than expected based on raw Player won-lost records. But this is accounted for in player eWins through adjustments for expected context and the "team win adjustment".
 The relationship between net eWins and net Team wins is approximately 2-to-1.
 The relationship between eWins and team wins is strengthened by taking explicit account of teammate adjustments, to reflect the interactive relationship between pitchers and fielders (and, to a lesser extent, between batters and baserunners).
 Overall, approximately 84% of the variance in team wins is captured within eWins.
As for the two WAR measures:
 Fielding is significantly overweighted and pitching is significantly underweighted within Fangraphs' fWAR framework.
 Because of the structure of its calculations, which tie to actual runs allowed at the team level, it is difficult to evaluate the appropriateness of Baseball-Reference's weighting of fielding and pitching. The evidence that exists, however, suggests that fielding is overweighted by Baseball-Reference.
 Despite certain factors that should give the two WAR measures certain structural advantages vis-a-vis context-neutral eWins (relief pitcher leverage, Baseball-Reference's use of actual runs allowed), both WAR measures explain less of the actual variance in team wins than eWins, even when optimizing the weighting of batting, baserunning, pitching, and fielding.
 As presented by Baseball-Reference and Fangraphs, less than 80% of the variance in team wins is captured within either bWAR or fWAR.
I extend my analysis at the team level here to the level of individual players, comparing Player Won-Lost records vs. bWAR at the player level in a separate article.
Why are Player Won-Lost Records Superior?
The math seems very compelling to me. Player won-lost records are a better measure of actual team value (and, hence, by extension, a better measure of player value) than WAR. Of course, I'm not the most objective observer here, but hopefully I have made a sufficiently compelling case that you agree with me.
Moving beyond the math, why are Player won-lost records superior to WAR?
The answer, I believe, is that I start from actual wins. I actually begin by calculating pWins, which tie to team wins by construction. I then pull out the context from pWins to create eWins. But starting from actual wins ensures that eWins still tie directly to team wins because eWins are still derived from actual team wins, albeit indirectly.
For example, starting from actual wins, I discovered that home runs are more valuable, relative to other hits, than conventional sabermetric wisdom believed. I discuss this in the companion piece to this article that compares the values of individual players as measured by Player won-lost records to values as measured by Baseball-Reference's WAR.
Starting from actual wins, my other big discovery is that the translation from player value to team value is not linear, but is, instead, largely multiplicative. Being a little bit better than average will translate into a lot of wins. By starting from actual team wins, I was able to incorporate this finding even into my "contextneutral" wins through what I call an "expected team win adjustment". This recognizes that a player who is somewhat above (or below) average will have a nonlinear, multiplicative, impact on his team's wins above (or below) average. The extent to which this is true will depend on how concentrated a player's performance is within his team's games. Because pitchers concentrate their performance more heavily than position players, this leads to pitchers having stronger expected (and actual) team win adjustments. This leads me to (correctly) weight pitcher performance more heavily than may be suggested by a simple linear analysis.
Probably the most significant difference between my eWOPA and eWORL measures versus bWAR and fWAR is in the impact of fielding on team wins. As I showed and discussed above, both WAR measures overstate the impact of fielding on team wins, by perhaps as much as 25%. In contrast, the evidence strongly suggests that my weighting of fielding is entirely appropriate. As with batting and pitching, I believe that I have gotten this weighting right because I determined the appropriate split between pitching and fielding through an objective analysis that began from a framework tied to actual team wins.
Ultimately, if you want to understand what leads to wins in Major League Baseball, you have to look at what leads to actual wins in Major League Baseball. That is what I have done in constructing Player won-lost records. And that is why Player won-lost records produce the best possible estimate of player value, either in or out of context.
All articles are written so that they pull data directly from the most recent version of the Player won-lost database. Hence, any numbers cited within these articles should automatically incorporate the most recent update to Player won-lost records. In some cases, however, the accompanying text may have been written based on previous versions of Player won-lost records. I apologize if this results in nonsensical text in any cases.