As explained elsewhere, my starting point for comparing players across positions is Positional Averages.

Even having calculated Positional Averages to make comparisons possible across positions, a problem still exists in attempting to compare Won-Lost records of different players. This problem is best illustrated by example.

Which player is more valuable: a player who earns a Player Won-Lost record of 5-2 (0.714 winning percentage) or a player who puts up a Won-Lost record of 9-5 (0.643)*? The first player (call him A) has a better winning percentage, but the second player (call him B) has more wins. In this case, the answer is fairly simple: Player B is almost certainly more valuable. His value is basically the same as Player A (5-2) plus the value of another player who went 4-3.

*For simplicity, assume that all of these players played the same position(s) and had the same Positional Average.

Let’s try another one. Which player is more valuable: a player who puts up a Won-Lost record of 9-5 (Player B) or a player who puts up a Won-Lost record of 9-8 (Player C)? Again, this one is fairly simple. Clearly, the player who won 9 games with fewer losses is more valuable. Breaking value down, Player C is basically the same as Player B (9-5) plus the value of a player who went 0-3. Notice the logical inference from those last two sentences. If Player C has *less* value than Player B, then the value of a player who goes 0-3 isn’t zero (otherwise Player C would have the same value as Player B); it’s negative.

Finally, let’s make the problem a little harder. Who’s more valuable, Player A (5-2) or Player C (9-8)? Now the problem gets trickier. Player C has the same value as Player A plus a second player with a Won-Lost record of 4-6 (0.400). Well, how valuable is a 0.400 winning percentage from a player?
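Each comparison above amounts to subtracting one Won-Lost record from another and asking what the leftover record is worth. A minimal sketch in Python (the helper names `subtract` and `win_pct` are hypothetical, not part of any existing library):

```python
def subtract(rec1, rec2):
    """Return the Won-Lost record left over when rec2 is taken out of rec1."""
    return (rec1[0] - rec2[0], rec1[1] - rec2[1])


def win_pct(rec):
    """Winning percentage of a (wins, losses) record."""
    wins, losses = rec
    return wins / (wins + losses)


# Player records from the examples above, as (wins, losses)
A, B, C = (5, 2), (9, 5), (9, 8)

for label, bigger, smaller in (("B - A", B, A), ("C - B", C, B), ("C - A", C, A)):
    extra = subtract(bigger, smaller)
    print(label, extra, round(win_pct(extra), 3))
```

The leftover 4-3 record clearly adds value and the 0-3 record clearly subtracts it, but whether the 4-6 (.400) remainder helps or hurts depends entirely on the baseline it is measured against.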

The question really is, “how valuable compared to what?” And the answer to that question is, “Compared to what the team’s other alternatives would have been,” which leads nicely to the theory of Replacement Level.

**Replacement Level** is the level of performance that a team should be able to get from a player it can find easily on short notice, such as a minor-league call-up or a veteran waiver-wire pickup. The theory here is that major-league baseball players only have value to a team above and beyond what the team could get from basically pulling players off the street. That is, there’s no real marginal value to having a third baseman make routine plays that anybody capable of playing third base at the high school or college level could make: if a major-league team were to lose its starting third baseman, it would fill the position with somebody, and that somebody would, in fact, make at least those routine plays. This is similar to the economic concept of Opportunity Cost.

*So, how does one go about determining Replacement Level?* The first step, it seems to me, is to define precisely what is meant by Replacement Level. The most obvious definition, or, perhaps more precisely, the definition that leads most obviously to a way of measuring it, is the average winning percentage of marginal major-league baseball players.

Over the entire Retrosheet Era, there have been a total of 1,802 team-seasons and 71,859 player-seasons* in which the player accumulated a non-zero number of Player decisions.

*A player-season is defined as a unique player-season-team combination, so, for example, Aramis Ramirez’s 2003 performance with the Pittsburgh Pirates and Ramirez’s 2003 performance with the Chicago Cubs are treated as two distinct player-seasons.

I sorted these 71,859 player-seasons by total basic player games (wins plus losses). The totals ranged from a high of 47.8 for Mickey Lolich with the 1971 Detroit Tigers to a low of 0.00008 for Walt McKeel with the 1996 Boston Red Sox.*

*Actually, 15 players appeared in a game but amassed exactly zero Player decisions during a player-season in the Retrosheet Era. The most recent was Charlton Jimerson, who in 2005 played one inning in CF for the Houston Astros. Of course, his performance, such as it was, is wholly irrelevant to the results presented here.

As noted above, there have been 1,802 team-seasons over the Retrosheet Era. That works out to a total of 45,050 major-league roster spots available over this time period (1,802 teams times 25 roster spots per team). So, one could view the top 45,050 player-seasons over this time period as being “roster-level” player-seasons and the remaining 26,809 player-seasons (15 per team) as being “replacement-level” player-seasons.
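The roster-spot arithmetic above is easy to check. A quick sketch using only the counts quoted in this article; the same division also reproduces the average of about 39.9 players per team cited below.

```python
# Counts quoted in the text for the Retrosheet Era
TEAM_SEASONS = 1_802
PLAYER_SEASONS = 71_859   # player-seasons with non-zero Player decisions
ROSTER_SIZE = 25

roster_spots = TEAM_SEASONS * ROSTER_SIZE             # "roster-level" player-seasons
replacement_seasons = PLAYER_SEASONS - roster_spots   # the remainder
extra_per_team = replacement_seasons / TEAM_SEASONS   # replacement-level seasons per team
players_per_team = PLAYER_SEASONS / TEAM_SEASONS      # average players used per team-season

print(f"{roster_spots:,} roster-level, {replacement_seasons:,} replacement-level")
print(f"{extra_per_team:.1f} extra per team, {players_per_team:.1f} players per team")
```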

Sorting by total Player games (wins plus losses), the aggregate adjusted winning percentage* for “roster-level” players was 0.504, and the aggregate winning percentage for “replacement-level” player-seasons was 0.452, with “roster-level” players accounting for 94.5% of all Player decisions when measured in this way. This would imply a Replacement Level approximately 5.1% below Positional Average.

*As I explain elsewhere, a .500 winning percentage by a first baseman isn’t strictly comparable to a .500 winning percentage by a shortstop. To account for that, I adjusted player winning percentages here: adjusted winning percentage is equal to the player’s winning percentage plus (0.500 - the player’s positional average).
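The procedure in the last few paragraphs (rank player-seasons by Player decisions, treat the top 25 per team as roster-level, and compare aggregate adjusted winning percentages) can be sketched as follows. This is a minimal illustration under stated assumptions: `PlayerSeason` and the helper functions are hypothetical, "aggregate" is taken to mean total adjusted wins divided by total decisions, and the three toy player-seasons are invented purely to exercise the code.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    wins: float      # Player wins
    losses: float    # Player losses
    pos_avg: float   # the player's Positional Average

    @property
    def decisions(self) -> float:
        return self.wins + self.losses

    @property
    def adjusted_wins(self) -> float:
        # Adjusted winning pct = wpct + (0.500 - pos_avg); spread over all of
        # the player's decisions, that shifts his wins by decisions * (0.500 - pos_avg).
        return self.wins + self.decisions * (0.500 - self.pos_avg)


def split_by_roster(seasons, team_seasons, roster_size=25):
    """Rank by decisions (most first): the top team_seasons * roster_size are
    "roster-level", everything else is "replacement-level"."""
    ranked = sorted(seasons, key=lambda s: s.decisions, reverse=True)
    cut = team_seasons * roster_size
    return ranked[:cut], ranked[cut:]


def aggregate_adjusted_pct(seasons):
    """Decision-weighted aggregate: total adjusted wins / total decisions."""
    return sum(s.adjusted_wins for s in seasons) / sum(s.decisions for s in seasons)


# Toy data: one "team" with a two-man "roster" (numbers are made up)
toy = [PlayerSeason(1, 2, 0.500), PlayerSeason(10, 8, 0.510), PlayerSeason(6, 6, 0.490)]
roster, replacement = split_by_roster(toy, team_seasons=1, roster_size=2)
print(round(aggregate_adjusted_pct(roster), 3), round(aggregate_adjusted_pct(replacement), 3))
```

Weighting by decisions, rather than averaging the players' percentages, keeps a handful of cameo appearances from carrying as much weight as a full-time regular.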

*Is this really the best way to measure Replacement Level?* Conceptually, I think that it is. The question, however, is where to draw the line between “roster-level” and “replacement-level”. Drawing the line at 25 players per team makes some obvious sense, as (before September 1st) there are 25 roster spots per team. Of course, no team uses only 25 players in a single season. For the seasons for which I have calculated Player won-lost records, the average major-league team used 39.9 players per season (quite a bit higher than I would have expected). Given that, how much difference is there between, say, the 23rd player on a typical team and the 28th player on a typical team?

Replacement Level

For my work, I define Replacement Level as a winning percentage one weighted standard deviation below Positional Average, with separate standard deviations calculated for pitchers and non-pitchers. These standard deviations are calculated separately for each season and then applied to each individual player's unique Positional Average. Overall, this works out to an average Replacement Level of about 0.448 (0.454 for non-pitchers and 0.437 for pitchers). A team of 0.448 players would have an expected winning percentage of 0.343 (56 - 106 over a 162-game season). The derivation of my choice of Replacement Level is described next.
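As a rough sketch of that definition: compute, for each season and for pitchers and non-pitchers separately, the standard deviation of players' adjusted winning percentages weighted by Player decisions, then subtract it from each player's own Positional Average. The assumptions here are mine, not necessarily the exact procedure used for these records: the weights are Player decisions, deviations are taken around the group's weighted mean adjusted winning percentage, every player has non-zero decisions, and the field names in the input dictionaries are hypothetical.

```python
import math
from collections import defaultdict


def weighted_sd(values, weights):
    """Standard deviation of `values` around their weighted mean."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return math.sqrt(variance)


def replacement_levels(players):
    """players: list of dicts with hypothetical keys 'season', 'pitcher' (bool),
    'wins', 'losses', 'pos_avg' (all with non-zero decisions).
    Returns one Replacement Level per player, in input order."""
    groups = defaultdict(lambda: ([], []))   # (season, pitcher) -> (values, weights)
    for p in players:
        decisions = p["wins"] + p["losses"]
        adjusted = p["wins"] / decisions + (0.500 - p["pos_avg"])
        values, weights = groups[(p["season"], p["pitcher"])]
        values.append(adjusted)
        weights.append(decisions)
    sd = {key: weighted_sd(v, w) for key, (v, w) in groups.items()}
    # One standard deviation below each player's own Positional Average
    return [p["pos_avg"] - sd[(p["season"], p["pitcher"])] for p in players]


# Made-up example: two non-pitchers and one pitcher in the same season
sample = [
    {"season": 2000, "pitcher": False, "wins": 6, "losses": 4, "pos_avg": 0.500},
    {"season": 2000, "pitcher": False, "wins": 4, "losses": 6, "pos_avg": 0.500},
    {"season": 2000, "pitcher": True, "wins": 3, "losses": 3, "pos_avg": 0.500},
]
print([round(x, 3) for x in replacement_levels(sample)])
```

Keying the groups on `(season, pitcher)` is what gives each season its own pair of standard deviations, one for pitchers and one for non-pitchers.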

Derivation of Replacement Level

**Hitting versus Fielding**

Some analysts distinguish between replacement-level hitting (the level of hitting that could be found from freely-available talent) and replacement-level fielding (the level of fielding that could be found amongst freely-available talent). The problem with this is that, except for designated hitters, a team can’t actually replace a player’s hitting and a player’s fielding independently of one another. In fact, it’s easy to think of situations where a player’s replacement is better than the player he is replacing at either hitting or fielding, but is nevertheless a worse overall player. Instead, a team must make a tradeoff and settle for the replacement player who provides the best combination of hitting and fielding. Hence, in my opinion, it only makes sense to talk about replacement level at an overall level, taking into account all aspects of a player’s game: batting, baserunning, fielding, and, if appropriate, pitching.

**Replacement Level by Position**

Some analysts also argue that replacement level differs by position – that is, one should calculate the replacement level for first basemen differently from the replacement level for second basemen. This seems to me to be a more reasonable position and is certainly worth investigating. On the other hand, the pool of replacement third basemen is likely to overlap considerably with the pool of replacement shortstops, for example, and any possible replacement starting pitcher is likely to also be a replacement relief pitcher. Certainly, however, at a minimum, the pool of replacement non-pitchers will be distinct from the pool of replacement pitchers.

I will begin by investigating all players to get a sense of where a general Replacement Level might be. From there, I will investigate Replacement Level by position.

Replacement Player Winning Percentages

The table below looks at how much difference there is between players at different roster spots. The winning percentages shown are the aggregate winning percentage for all players who ranked at a given roster spot, as well as the aggregate winning percentage for all players who ranked below that spot. For example, the top 16,218 players in terms of Player Games would constitute roster spots 1-9 (1,802 teams times 9 roster spots = 16,218). These players posted an aggregate winning percentage of 0.513, while players who occupied roster spots 10-40 posted an aggregate winning percentage of 0.484. The table also shows the cumulative percentage of total games accounted for by players in roster spots 1 through the spot shown.

Roster Spot | Winning % at Roster Spot | Winning % below Roster Spot | Cumulative % of Total Games |
---|---|---|---|
1-9 | 0.513 | 0.484 | 58.0% |
10 | 0.499 | 0.482 | 62.3% |
11 | 0.494 | 0.481 | 66.2% |
12 | 0.491 | 0.479 | 69.7% |
13 | 0.491 | 0.478 | 73.0% |
14 | 0.489 | 0.477 | 75.9% |
15 | 0.487 | 0.475 | 78.5% |
16 | 0.486 | 0.474 | 80.8% |
17 | 0.486 | 0.473 | 83.0% |
18 | 0.486 | 0.471 | 84.9% |
19 | 0.483 | 0.469 | 86.7% |
20 | 0.483 | 0.467 | 88.3% |
21 | 0.483 | 0.465 | 89.8% |
22 | 0.484 | 0.462 | 91.2% |
23 | 0.481 | 0.459 | 92.4% |
24 | 0.478 | 0.456 | 93.5% |
25 | 0.474 | 0.452 | 94.5% |
26 | 0.469 | 0.449 | 95.4% |
27 | 0.464 | 0.446 | 96.2% |
28 | 0.465 | 0.442 | 96.9% |
29 | 0.459 | 0.438 | 97.5% |
30 | 0.454 | 0.433 | 98.0% |
31 | 0.447 | 0.429 | 98.5% |
32 | 0.443 | 0.425 | 98.8% |
33 | 0.439 | 0.420 | 99.1% |
34 | 0.434 | 0.414 | 99.4% |
35 | 0.428 | 0.407 | 99.6% |
36 | 0.418 | 0.401 | 99.8% |
37 | 0.413 | 0.391 | 99.9% |
38 | 0.399 | 0.379 | 99.9% |
39 | 0.383 | 0.367 | 100.0% |
40 | 0.367 | n/a | 100.0% |
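A table like this can be rebuilt from player-season data along the following lines. This is a sketch under assumptions: each player-season is reduced to a hypothetical `(decisions, adjusted_wins)` pair, roster spot k is taken to be league-wide ranks (k - 1) × T + 1 through k × T when player-seasons are ranked by decisions (T being the number of team-seasons, as in the 1,802 × 9 example above), and spots are reported individually rather than grouping spots 1-9 as the table does.

```python
def aggregate_pct(group):
    """Total adjusted wins / total decisions, or None for an empty group."""
    if not group:
        return None
    return sum(w for _, w in group) / sum(d for d, _ in group)


def roster_spot_table(seasons, team_seasons, max_spot=40):
    """seasons: list of (decisions, adjusted_wins) tuples, one per player-season.
    Returns rows of (spot, pct_at_spot, pct_below_spot, cumulative_share)."""
    ranked = sorted(seasons, key=lambda s: s[0], reverse=True)
    total_decisions = sum(d for d, _ in ranked)
    rows, cumulative = [], 0.0
    for spot in range(1, max_spot + 1):
        at_spot = ranked[(spot - 1) * team_seasons : spot * team_seasons]
        if not at_spot:
            break  # ran out of player-seasons
        below = ranked[spot * team_seasons :]
        cumulative += sum(d for d, _ in at_spot)
        rows.append((spot, aggregate_pct(at_spot), aggregate_pct(below),
                     cumulative / total_decisions))
    return rows


# Made-up data: two team-seasons, five player-seasons
toy_seasons = [(10, 6), (8, 4), (6, 3), (4, 1), (2, 1)]
for row in roster_spot_table(toy_seasons, team_seasons=2):
    print(row)
```

The group below the last occupied spot is empty, which is why the final row has no below-spot value.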

So what exactly do all of these numbers mean, and how do they help us calculate Replacement Level? The first thing to notice, which shouldn’t be too surprising, is that, in general, aggregate winning percentages decline as one works deeper into the roster. In fact, the column showing the aggregate winning percentage for players below a given roster spot declines at every step through the entire table. In terms of winning percentage at each roster spot, the trend is less perfect but still fairly clear, particularly for the second half of the table.

Looking at the above table, is there any obvious break-point where the data seem to indicate that below a certain roster spot players are “replacement-level”? To me, the answer is “Sort of.”

Below the first 12 or so roster spots, roster spots 13-23 hover just below 0.500, in a relatively narrow range between 0.481 and 0.491. Changes in winning percentage by roster spot are somewhat erratic in this area of the roster, suggesting that differences in player decisions at this level reflect differences in the number of decisions earned at different positions more than differences in the quality of the players occupying the various spots (e.g., outfielders tend to earn more decisions than infielders, even though there’s no reason to think that outfielders are better players than infielders on average). This might suggest that setting “replacement level” as the level just below roster spot 23, which ends up being very close to just off the 25-man roster, may have some merit. If players at roster spots below 23 are viewed as “replacement-level”, this would put replacement level at approximately 0.459. Another possibility would be three spots lower, as roster spots 25 and 26 appear fairly interchangeable, with a more pronounced downward trend starting at roster spot 27. Setting replacement level below roster spot 26 would put replacement level at approximately 0.449.

This ends up being fairly close to the result of setting replacement level one standard deviation below Positional Average: 0.448.

Winning Percentages by Position

I noted above that some analysts like to calculate unique Replacement Levels by position. This idea is worth at least examining.

To do so, I looked at what a one-standard-deviation standard would imply regarding unique replacement levels by position. Standard deviations for winning percentage by position are shown below, calculated in two ways. The first column (Raw Wins) is based on basic context-neutral, teammate-adjusted records. The second column (Adjusted Wins) also incorporates Expected Team Win Adjustments.

Standard Deviations for Positional Winning Percentages

Position | Raw Wins | Adjusted Wins |
---|---|---|
C | 4.5% | 4.7% |
1B | 4.5% | 4.7% |
2B | 3.7% | 3.9% |
3B | 4.2% | 4.4% |
SS | 3.6% | 3.8% |
LF | 4.3% | 4.5% |
CF | 3.9% | 4.1% |
RF | 4.2% | 4.3% |
DH | 7.1% | 7.2% |
PH | 15.5% | 15.5% |
PR | 26.5% | 26.5% |
Pitcher (Offense) | 10.9% | 11.1% |
Starting Pitcher | 4.3% | 5.2% |
Relief Pitcher | 6.6% | 7.0% |

A few comments about this table. First, the positions of DH, PH, PR, and pitcher offense give somewhat odd results that don’t necessarily make a lot of sense and are likely plagued to some extent by small-sample problems, even over the 60+ year time period considered here.

The other problem with PH, PR, and pitcher offense, I think, is that the correlation between winning percentage and total games is likely weaker for these positions, especially pitcher offense, than for other positions. That is, better catchers will catch more games, which will serve to reduce the weighted standard deviation of catcher winning percentage. Pitchers, on the other hand, are chosen almost exclusively for their pitching ability, not their hitting ability. As a result, there is likely to be very little correlation between the number of batting decisions earned by a pitcher and his hitting ability. The same is probably true, albeit to a lesser extent, for pinch hitting and pinch running. In many cases, a team's best pinch-hitting option on a particular day will be the best-hitting regular who has the day off, but any given regular will be in that position only rarely; if he had too many days off, he'd no longer be a regular.

Excluding these positions, the results are actually quite stable across positions. For non-pitchers at fielding positions, the standard deviation for basic wins averages out to 3.8%, with a fairly narrow range across positions (3.6% - 4.5%). For pitchers, the standard deviation for basic wins averages out to 4.5%.

The gap between the standard deviations for position players and pitchers is even greater when expected team win adjustments are taken into consideration. As I explain elsewhere, expected team win adjustments account for the fact that above-average player performance tends to have an exaggerated impact on team wins, and the reverse is true of below-average performance: being a little bit better than average has a multiplicative impact on a team's winning percentage. Because of this effect, adjusting basic player winning percentages for expected team wins increases the spread of player winning percentages: player winning percentages above 0.500 move farther above 0.500, while player winning percentages below 0.500 move farther below 0.500. As a result, the standard deviation of player winning percentages is greater when expected team win adjustments are accounted for.

This effect of players on team wins is stronger for pitchers than for non-pitchers, because pitchers concentrate their performance into fewer team games. Adding, for example, 0.3 player wins in one game will have more of an impact on a team than adding 0.1 player wins in each of three separate games. Because of this, incorporating expected team win adjustments has a much more significant impact on the standard deviation for pitchers, especially starting pitchers (whose standard deviation rises from 4.3% to 5.2%), than for non-pitchers.

Overall, non-pitcher fielders see their average standard deviation increase from 3.8% to 4.0%. Pitchers, on the other hand, see their average standard deviation increase from 4.5% to 5.2%. This increased separation between the standard deviations for pitchers and non-pitchers further supports my decision to calculate separate standard deviations for these two groups.

Even with these adjustments, however, the differences in standard deviation across fielding positions remain narrow, ranging from 3.8% to 4.7%. Because of this, I have chosen to calculate a single standard deviation for the Replacement Levels of all non-pitchers.

Final Results

Putting all of this together, these results lead to my final decision to set Replacement Level at one standard deviation below Positional Average, with standard deviations calculated separately for non-pitchers and pitchers. Over the Retrosheet Era as a whole, this works out to an average Replacement Level of 0.454 for non-pitchers and 0.437 for pitchers. Going back to the table by roster spot, this puts Replacement Level at about the level of players below the 27th-best player on an average major-league roster, with Replacement-Level players accounting for roughly 4% of all Player decisions.
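Reading a roster spot off the table for a given Replacement Level is a simple lookup: find the first spot whose below-spot winning percentage is at or under that level. A minimal sketch using rows 23-30 of the roster-spot table above (the function name `spot_for_level` is hypothetical):

```python
# (roster spot, winning pct below that spot, cumulative % of total games),
# rows 23-30 of the roster-spot table above
TABLE = [
    (23, 0.459, 92.4), (24, 0.456, 93.5), (25, 0.452, 94.5), (26, 0.449, 95.4),
    (27, 0.446, 96.2), (28, 0.442, 96.9), (29, 0.438, 97.5), (30, 0.433, 98.0),
]


def spot_for_level(level, table=TABLE):
    """Return (roster spot, % of Player decisions below that spot) for the
    first spot whose below-spot winning pct is at or under `level`."""
    for spot, pct_below, cumulative in table:
        if pct_below <= level:
            return spot, round(100.0 - cumulative, 1)
    return None  # level is below every row included here


# The two cutoffs discussed earlier, then the overall average Replacement Level
for level in (0.459, 0.449, 0.448):
    print(level, spot_for_level(level))
```

At the overall average of 0.448 the lookup lands just below roster spot 27, with about 3.8% of Player decisions below that point.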

Combining these, a team of replacement-level players would have an expected winning percentage of around 0.343 (56 - 106 over a 162-game season).

For an individual player, Wins over Replacement Level (pWORL, eWORL) are equal to Player Wins minus (Player Decisions times Replacement Level). I compare my wins over replacement level (WORL) to Baseball-Reference's Wins Above Replacement (WAR) in a separate article.
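The WORL formula is simple enough to show directly. As an illustration only, the sketch below applies the overall average Replacement Level of 0.448 to Players A, B, and C from the start of this article, assuming (as there) that all three share the same Positional Average; in practice each player's Replacement Level depends on his own Positional Average and on whether he is a pitcher. The function name `worl` is hypothetical.

```python
def worl(wins, losses, replacement_level):
    """Wins over Replacement Level: Player Wins minus
    (Player Decisions times Replacement Level)."""
    return wins - (wins + losses) * replacement_level


records = {"A": (5, 2), "B": (9, 5), "C": (9, 8)}
for name, (wins, losses) in records.items():
    print(name, round(worl(wins, losses, 0.448), 3))
```

At that level Player B comes out ahead of Player A, who in turn edges out Player C, consistent with the reasoning at the start of this article: C is A plus a 4-6 (.400) player, and a .400 record falls short of a 0.448 baseline.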