Wins over Replacement Level

Article revised on July 27, 2014

As explained elsewhere, my starting point for comparing players across positions is Positional Averages.

Even having calculated Positional Averages to make comparisons possible across positions, a problem still exists in attempting to compare Won-Lost records of different players. This problem is best illustrated by example.

Which player is more valuable: a player who earns a Player Won-Lost record of 5-2 (0.714 winning percentage) or a player who puts up a Won-Lost record of 9-5 (0.667)*? The first player (call him A) has a better winning percentage, but the second player (call him B) has more wins. In this case, the answer is fairly simple: Player B is almost certainly more valuable. His value is basically the same as Player A (5-2) plus the value of another player who went 4-3.
*For simplicity, assume that all of these players played the same position(s) / had the same Positional Average.

Let’s try another one. Which player is more valuable, a player who puts up a Won-Lost record of 9-5 (player B) or a player who puts up a won-lost record of 9-8 (player C)? Again, this one is fairly simple. Clearly, the player who won 9 games with fewer losses is more valuable. Breaking value down, Player C is basically the same as Player B (9-5) plus the value of a player who went 0-3. Notice the logical inference from those last two sentences. If Player B has more value than Player C, then the value of a player who goes 0-3 isn’t zero (otherwise Player C would have the same value as Player B), it’s negative.

Finally, let’s make the problem a little harder. Who’s more valuable, Player A (5-2) or Player C (9-8)? Now the problem gets trickier. Player C has the same value as Player A plus a second player with a Won-Lost record of 4-6 (0.400). Well, how valuable is a 0.400 winning percentage from a player?

The question really is, “how valuable compared to what?” And the answer to that question is, “Compared to what the team’s other alternatives would have been,” which leads nicely to the theory of Replacement Level.
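The three comparisons above all rest on the same move: subtracting one Won-Lost record from another to see what the "leftover" player looks like. A minimal sketch of that arithmetic follows; the `Record` class and its method names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A Player Won-Lost record (hypothetical helper, not from the article)."""
    wins: float
    losses: float

    @property
    def decisions(self) -> float:
        return self.wins + self.losses

    @property
    def win_pct(self) -> float:
        return self.wins / self.decisions

    def __sub__(self, other: "Record") -> "Record":
        # The "leftover" player: what must be added to `other` to reach `self`.
        return Record(self.wins - other.wins, self.losses - other.losses)


player_a = Record(5, 2)
player_b = Record(9, 5)
player_c = Record(9, 8)

b_minus_a = player_b - player_a  # the 4-3 player
c_minus_b = player_c - player_b  # the 0-3 player
c_minus_a = player_c - player_a  # the 4-6 player

for label, rec in [("B - A", b_minus_a), ("C - B", c_minus_b), ("C - A", c_minus_a)]:
    print(f"{label}: {rec.wins}-{rec.losses} ({rec.win_pct:.3f})")
```

The first two leftovers are easy to value by inspection; the third (4-6) is the one that needs a baseline.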

Replacement Level is the level of performance which a team should be able to get from a player who they can find easily on short notice – such as a minor-league call-up or a veteran waiver-wire pickup. The theory here is that major league baseball players only have value to a team above and beyond what the team could get from basically pulling players off the street. That is, there’s no real marginal value to having a third baseman make routine plays that anybody who’s capable of playing third base at the high school or college level could make, since if a major-league team were to lose its starting third baseman, they would fill the position with somebody and that somebody would, in fact, make at least those routine plays at third base. This is similar to the economic concept of Opportunity Cost.

Replacement Level
For my work, I define Replacement Level as equal to a winning percentage one weighted standard deviation below Positional Average. Separate standard deviations are calculated for players at fielding positions, players at offense-only positions (DH, PH, PR), starting pitchers, and relief pitchers. Unique standard deviations are calculated in this way for each year. These standard deviations are then applied to the unique Positional Averages of each individual player. Overall, this works out to an average Replacement Level of about 0.454 (0.463 for non-pitchers, and 0.438 for pitchers). A team of 0.454 players would have an expected winning percentage of 0.362 (59 - 103 over a 162-game season). The derivation of my choice of Replacement Level is described next.
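As a rough illustration of the definition above, the sketch below computes a weighted standard deviation of winning percentages and subtracts it from a Positional Average. It assumes that the weights are Player decisions and that deviations are measured around the weighted mean; neither detail is spelled out here, and the function names and sample data are hypothetical.

```python
import math


def weighted_std(win_pcts, decisions):
    """Decision-weighted standard deviation of winning percentages.

    Assumption: weights are Player decisions, and spread is measured
    around the decision-weighted mean.
    """
    total = sum(decisions)
    mean = sum(p * w for p, w in zip(win_pcts, decisions)) / total
    variance = sum(w * (p - mean) ** 2 for p, w in zip(win_pcts, decisions)) / total
    return math.sqrt(variance)


def replacement_level(positional_average, std_dev):
    """Replacement Level: one standard deviation below Positional Average."""
    return positional_average - std_dev


# Hypothetical group of three players (winning percentage, decisions).
sample_pcts = [0.55, 0.50, 0.45]
sample_decisions = [20, 10, 20]

sd = weighted_std(sample_pcts, sample_decisions)
level = replacement_level(0.500, sd)
print(f"weighted SD = {sd:.4f}, replacement level = {level:.4f}")
```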

Derivation of Replacement Level

Hitting versus Fielding

Some analysts distinguish between replacement-level hitting – the level of hitting that could be found from freely-available talent – and replacement-level fielding – the level of fielding that could be found amongst freely-available talent. The problem with this is that, except for designated hitters, a team can’t actually replace a player’s hitting and a player’s fielding independent of one another. In fact, in many cases, it’s quite reasonable to think of situations where a player’s replacement is actually better than the player he is replacing at either hitting or fielding, but is nevertheless a worse overall player. Instead, a team must make a tradeoff and settle for the replacement player who provides the best combination of hitting and fielding. Hence, in my opinion, it only makes sense to talk about replacement level at an overall level, taking into account all aspects of a player’s game: batting, baserunning, fielding, and, if appropriate, pitching.

Replacement Level by Position

Some analysts also argue that replacement level differs by position – that is, one should calculate the replacement level for first basemen differently from the replacement level for second basemen. This seems to me to be a more reasonable position and is certainly worth investigating. On the other hand, the pool of replacement third basemen is likely to overlap considerably with the pool of replacement shortstops, for example, and any possible replacement starting pitcher is likely to also be a replacement relief pitcher. Certainly, however, at a minimum, the pool of replacement non-pitchers will be distinct from the pool of replacement pitchers.

I will begin by investigating all players to get a sense of where a general Replacement Level might be. From there, I will investigate Replacement Level by position.

Replacement Player Winning Percentages
So, how does one go about determining Replacement Level?
The first step, it seems to me, would be to define precisely what is meant by Replacement Level. The most obvious definition of Replacement Level to me, or, perhaps more precisely, the definition of Replacement Level which leads most obviously to a means of measuring it, would be the average winning percentage of marginal major-league baseball players.

Over the entire Retrosheet Era, there have been a total of 1,976 team seasons and 77,881 player-seasons* for which the player accumulated a non-zero number of player decisions.
*A player-season is defined as a unique player-season-team combination, so, for example, Aramis Ramirez’s 2003 performance with the Pittsburgh Pirates and Ramirez’s 2003 performance with the Chicago Cubs are treated as two distinct player-seasons.

I sorted these 77,881 player-seasons by total basic player games (wins plus losses). The total number of games ranged from a high of 47.9 for Mickey Lolich for the 1971 Detroit Tigers to a low of 0.00008 for Walt McKeel for the 1996 Boston Red Sox.*
*Actually, 18 players have appeared in a game but amassed exactly zero Player decisions in a player-season during the Retrosheet Era. This was done most recently by Charlton Jimerson in 2005, who played one inning in CF for the Houston Astros. Of course, his performance, such as it was, is wholly irrelevant to the results presented here.

As noted above, there have been 1,976 team-seasons over the Retrosheet Era. That works out to a total of 49,400 major-league roster spots available over this time period (1,976 teams times 25 roster spots per team). So, one could view the top 49,400 player-seasons over this time period as being “roster-level” player-seasons and the remaining 28,481 player-seasons (14 per team) as being “replacement-level” player-seasons.
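The roster-spot arithmetic in this paragraph can be checked directly. The sketch below uses only the counts quoted above; the variable names are mine.

```python
TEAM_SEASONS = 1_976
PLAYER_SEASONS = 77_881
ROSTER_SPOTS_PER_TEAM = 25

# Player-seasons that fit within 25 roster spots per team.
roster_level = TEAM_SEASONS * ROSTER_SPOTS_PER_TEAM

# Everything left over is treated as "replacement-level".
replacement_level_seasons = PLAYER_SEASONS - roster_level
replacement_per_team = replacement_level_seasons / TEAM_SEASONS

# Average number of player-seasons per team-season.
player_seasons_per_team = PLAYER_SEASONS / TEAM_SEASONS

print(roster_level, replacement_level_seasons)
print(round(replacement_per_team, 1), round(player_seasons_per_team, 1))
```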

Sorting by total Player games (wins plus losses), the aggregate adjusted winning percentage* for “roster-level” players was 0.503, and the aggregate winning percentage for “replacement-level” player seasons was 0.452, with “roster-level” players accounting for 94.8% of all Player decisions when measured in this way. This would imply a Replacement Level approximately 5.2% below Positional Average.
*As I explain elsewhere, a .500 winning percentage by a first baseman isn’t strictly comparable to a .500 winning percentage by a shortstop. To account for that, I adjusted player winning percentages here; adjusted winning percentage is equal to the player’s winning percentage plus (0.500 – the player’s positional average).
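A minimal sketch of the split described above: adjust each winning percentage for Positional Average, rank player-seasons by decisions, and aggregate each side of the cutoff. It assumes the aggregate is a decision-weighted average of adjusted winning percentages, which is my reading rather than something stated here. The sample data are invented.

```python
def adjusted_win_pct(wins, losses, positional_average):
    """Winning percentage plus (0.500 minus the player's positional average)."""
    return wins / (wins + losses) + (0.500 - positional_average)


def split_aggregate(player_seasons, n_roster_level):
    """player_seasons: list of (wins, losses, positional_average) tuples,
    all with non-zero decisions. Returns (roster-level aggregate,
    replacement-level aggregate, roster-level share of decisions)."""
    ranked = sorted(player_seasons, key=lambda r: r[0] + r[1], reverse=True)

    def aggregate(group):
        # Assumption: decision-weighted average of adjusted percentages.
        decisions = sum(w + l for w, l, _ in group)
        weighted = sum((w + l) * adjusted_win_pct(w, l, avg) for w, l, avg in group)
        return weighted / decisions

    top, rest = ranked[:n_roster_level], ranked[n_roster_level:]
    total = sum(w + l for w, l, _ in ranked)
    share = sum(w + l for w, l, _ in top) / total
    return aggregate(top), aggregate(rest), share


# Hypothetical player-seasons: (wins, losses, positional average).
sample = [(10, 8, 0.51), (9, 9, 0.49), (2, 3, 0.50), (1, 2, 0.50)]
roster_pct, replacement_pct, roster_share = split_aggregate(sample, n_roster_level=2)
print(f"{roster_pct:.3f} {replacement_pct:.3f} {roster_share:.1%}")
```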

Is this really the best way to measure Replacement Level?
Conceptually, I think that it is. The question, however, is where to draw the line between “roster-level” and “replacement-level”. Drawing the line at 25 players per team makes some obvious sense, of course, as (before September 1st) there are 25 roster spots per team. Of course, no team uses only 25 players in a single season. For the seasons for which I have calculated Player won-lost records, the average major-league team used 39.4 players per season (which is quite a bit higher than I would have guessed). Given that, how much difference is there between, say, the 23rd player on a typical team and the 28th player on a typical team?

The table below sets out to answer that very question. The winning percentages shown here are the aggregate winning percentage for all players who ranked at a given roster level as well as the aggregate winning percentage for all players who ranked below the given roster level. For example, the top 17,784 players in terms of Player Games would constitute roster spots 1 – 9 (1,976 teams times 9 roster spots = 17,784). These players posted an aggregate winning percentage of 0.513, while players who occupied roster spots 10 – 40 posted an aggregate winning percentage of 0.484. The cumulative percentage of team games is also shown for those who occupied roster spots 1 through the particular roster spot(s) shown.

               Winning Percentage                     Cumulative % of
Roster Spot    at Roster Spot    below Roster Spot    Total Games
1-9            0.513             0.484                57.9%
10             0.499             0.482                62.2%
11             0.494             0.481                66.1%
12             0.493             0.479                69.7%
13             0.491             0.478                73.0%
14             0.489             0.476                75.9%
15             0.487             0.475                78.6%
16             0.486             0.474                81.0%
17             0.485             0.472                83.2%
18             0.485             0.471                85.1%
19             0.482             0.469                86.9%
20             0.484             0.467                88.6%
21             0.483             0.465                90.0%
22             0.484             0.461                91.4%
23             0.480             0.458                92.6%
24             0.477             0.455                93.8%
25             0.472             0.452                94.8%
26             0.468             0.448                95.6%
27             0.462             0.445                96.4%
28             0.463             0.441                97.1%
29             0.457             0.437                97.7%
30             0.452             0.432                98.2%
31             0.446             0.428                98.6%
32             0.443             0.423                99.0%
33             0.438             0.418                99.3%
34             0.432             0.411                99.5%
35             0.424             0.404                99.7%
36             0.417             0.395                99.8%
37             0.402             0.389                99.9%
38             0.397             0.372                100.0%
39             0.380             0.292                100.0%
40             0.292             ---                  100.0%
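The construction of a table like the one above can be sketched as follows: rank all player-seasons by decisions, cut the ranking into blocks of one player-season per team, and aggregate at and below each block. This is a sketch under my own assumptions (decision-weighted aggregates, one row per roster spot rather than a combined 1-9 row), with invented sample data.

```python
def roster_spot_table(player_seasons, n_teams):
    """player_seasons: list of (decisions, adjusted_win_pct) pairs.

    Returns rows of (roster spot, aggregate pct at spot,
    aggregate pct below spot, cumulative share of decisions).
    """
    ranked = sorted(player_seasons, key=lambda r: r[0], reverse=True)
    total = sum(d for d, _ in ranked)

    def aggregate(group):
        # Decision-weighted winning percentage; None for an empty group.
        decisions = sum(d for d, _ in group)
        return sum(d * p for d, p in group) / decisions if decisions else None

    rows, cumulative = [], 0.0
    for start in range(0, len(ranked), n_teams):
        at_spot = ranked[start:start + n_teams]
        below = ranked[start + n_teams:]
        cumulative += sum(d for d, _ in at_spot)
        spot = start // n_teams + 1
        rows.append((spot, aggregate(at_spot), aggregate(below), cumulative / total))
    return rows


# Hypothetical league with 2 teams and 5 player-seasons.
sample = [(10, 0.52), (10, 0.50), (6, 0.49), (4, 0.47), (2, 0.40)]
rows = roster_spot_table(sample, n_teams=2)
for row in rows:
    print(row)
```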

So what exactly do all of these numbers really mean and how do they help us calculate Replacement Level? Well, the first thing to notice, which shouldn’t be too surprising, is that, in general, average winning percentages decline as one works one’s way deeper into the roster. In fact, looking at the column showing the aggregate winning percentage for players below a given roster level, this value declines at every step through the entire table. In terms of winning percentage by roster spot, the trend is slightly less perfect, but is still fairly clear nevertheless.

Looking at the above table, is there any obvious break-point where the data seem to indicate that below a certain roster spot players are “replacement-level”? To me, the answer is “Sort of.”

Below the first 10 or 11 roster spots, roster spots 12 – 22 hover just below 0.500, in a relatively narrow range between 0.482 and 0.493. Changes in winning percentage by roster spot are somewhat erratic in this area of the roster, suggesting that differences in player decisions at this level are the result of differences in decisions earned across positions more so than differences in the quality of the players occupying the various spots (e.g., outfielders tend to earn more decisions than infielders even though there’s no reason to think that outfielders are better players than infielders on average).

Roster spots 23 and 24 are fairly close to each other and are just a tick below the roster spots just above them. There is a bit more of a clear break, however, in moving from roster spot 24 to roster spot 25.

This might suggest setting “replacement level” as the level just below roster spot 24 - which ends up being very close, of course, to just off the 25-man roster. If players at roster spots below 24 are viewed as “replacement-level”, this would put replacement level at approximately 0.455.

This ends up being very close to the result of setting replacement level one standard deviation below Positional Average, 0.454, which I have used for my work.
Winning Percentages by Position
I noted above that some people like to calculate unique Replacement Levels by position. This is an idea worth at least examining.

To do so, I looked at what a one-standard-deviation standard would imply regarding unique replacement levels by player position. Standard deviations for winning percentage by position are shown below calculated in two ways. The numbers on the left were calculated based on basic context-neutral, teammate-adjusted records. The numbers on the right also incorporate Expected Team Win Adjustments.

Position              Basic     With Expected Team Win Adjustments
C                     4.7%      4.9%
1B                    4.7%      4.9%
2B                    3.8%      4.0%
3B                    4.3%      4.4%
SS                    3.6%      3.9%
LF                    4.4%      4.6%
CF                    4.0%      4.2%
RF                    4.2%      4.4%
DH                    7.2%      7.4%
PH                    15.5%     15.6%
PR                    26.5%     26.5%
Pitcher (Offense)     10.7%     11.0%
Starting Pitcher      4.4%      5.3%
Relief Pitcher        6.7%      7.1%
Non-Pitcher Offense   5.1%      5.3%

A few comments about this table. First, the positions of DH, PH, PR, and pitcher offense give somewhat odd results that don’t necessarily make a lot of sense and are likely plagued to some extent by small-sample problems, even over the 60+ year time period considered here.

The other problem with PH, PR, and pitcher offense, I think, is that the correlation between winning percentage and total games is likely weaker for these positions, especially pitcher offense, than for other positions. That is, better catchers will catch more games, which will serve to reduce the weighted standard deviation of catcher winning percentage. Pitchers, on the other hand, are chosen almost exclusively for their pitching ability, not their hitting ability. As a result, there is likely to be very little correlation between the number of batting decisions earned by a pitcher and his hitting ability. The same is probably true, albeit to a lesser extent, for pinch hitting and pinch running. In many cases, a team's best pinch-hitting option on a particular day will be the best-hitting regular who has the day off, but, for any given regular, the number of times when that will be him will be very small; if he had too many days off, he'd no longer be a regular.

Excluding these positions, the results are actually quite stable across positions. For non-pitchers at fielding positions, the standard deviation for basic wins averages out to 3.9%, with a fairly narrow range across positions (3.6% - 4.7%). For pitchers, the standard deviation for basic wins averages out to 4.6%.

The final row shows the standard deviation for non-pitcher offense - i.e., batting and baserunning only. The standard deviation for offense is somewhat greater than the standard deviation for non-pitchers at fielding positions. This is because poor hitters can improve their overall value with good fielding (and good hitters can reduce their overall value with poor fielding).

The gap between the standard deviations for position players and pitchers is even greater when expected intra-game win adjustments are taken into consideration. As I explain elsewhere, expected team win adjustments adjust for the fact that a player's difference from .500 tends to have an exaggerated impact on team wins; this holds for below-average players as well as for above-average ones. Being a little bit better than average has a multiplicative impact on a team's winning percentage. Because of this effect, adjusting basic player winning percentages for this expected team win adjustment increases the spread of player winning percentages: player winning percentages above 0.500 will move farther above 0.500, while player winning percentages below 0.500 will move farther below 0.500. As a result, the standard deviation of player winning percentages is greater when expected team win adjustments are accounted for.

This effect of players on team wins is stronger for pitchers than it is for non-pitchers, because pitchers concentrate their performance into fewer team games. Adding, for example, 0.3 player wins in one game will have more of an impact on a team than adding 0.1 player wins in each of three separate games. Because of this, when one incorporates expected team win adjustments for pitchers, especially starting pitchers, this has a much more significant impact on their standard deviation - which rises from 4.4% to 5.3% - than is the case for non-pitchers.

Overall, non-pitcher fielders see their average standard deviation increase from 3.9% to 4.1%. Pitchers, on the other hand, see their average standard deviation increase from 4.6% to 5.3%. This increased separation in the standard deviations of winning percentages for pitchers and non-pitchers further strengthens my decision to calculate separate standard deviations for these two groups. Note that making this adjustment also increases the difference in standard deviation between starting pitchers and relief pitchers. This difference is also recognized in my calculation of replacement level, as described below.

Even with these adjustments, however, the differences in standard deviation across fielding positions are still very narrow - ranging from 3.9% to 4.9%. Because of this, I have chosen to use a single standard deviation in calculating Replacement Levels for all non-pitcher fielding positions.

Final Results
Putting all of this together, these results lead to my final decision to set Replacement Level at one standard deviation below Positional Average with standard deviations calculated separately for non-pitchers and pitchers. A single standard deviation is used for non-pitcher position players across all fielding positions. In the case of DHs, PHs, and PRs, however, all of their value is offensive. Hence, the overall standard deviation for non-pitcher offense is used to calculate replacement level at these positions.

Separate standard deviations are calculated for starting pitchers and relief pitchers based on the differences observed above. The standard deviation for pitcher offense is set equal to zero for purposes of calculating replacement level, on the grounds that, because hitting ability is not (generally) selected for in pitchers, there is no reason to believe that a replacement-level pitcher would be any worse at hitting than an average pitcher.
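The grouping rules described above can be sketched as a simple lookup. The standard deviations below are whole-era figures quoted in this article (the versions including expected team win adjustments), used purely as placeholders: the actual calculation uses a separate value for each year, and whether the adjusted or basic figures feed the final numbers is an assumption on my part. The position codes and names are hypothetical.

```python
# Placeholder standard deviations (whole-era figures quoted in the text);
# the real calculation uses separate values for each year.
STD_DEV_BY_GROUP = {
    "fielder": 0.041,           # single value for all fielding positions
    "offense_only": 0.053,      # DH, PH, PR use non-pitcher offense
    "starting_pitcher": 0.053,
    "relief_pitcher": 0.071,
    "pitcher_offense": 0.0,     # set to zero by construction
}

# Hypothetical position codes mapped to their standard-deviation group.
GROUP_BY_POSITION = {
    **{pos: "fielder" for pos in ("C", "1B", "2B", "3B", "SS", "LF", "CF", "RF")},
    **{pos: "offense_only" for pos in ("DH", "PH", "PR")},
    "SP": "starting_pitcher",
    "RP": "relief_pitcher",
    "P-OFF": "pitcher_offense",
}


def replacement_level(position, positional_average, std_devs=STD_DEV_BY_GROUP):
    """One (group-specific) standard deviation below Positional Average."""
    group = GROUP_BY_POSITION[position]
    return positional_average - std_devs[group]


# Positional averages of 0.500 and 0.480 are invented for illustration.
for pos, avg in [("SS", 0.500), ("DH", 0.500), ("RP", 0.500), ("P-OFF", 0.480)]:
    print(pos, round(replacement_level(pos, avg), 3))
```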

Over the Retrosheet era as a whole, putting all of this together, this works out to an average Replacement Level for non-pitchers of 0.463, and for pitchers of 0.438. Across all players, this works out to an average replacement level of 0.454. Going back to the table by roster level, this puts Replacement Level at about the level of players below the 24th-best player on an average major-league roster, with Replacement-Level players accounting for just over 6% of all Player decisions.
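As a quick check of where 0.454 falls in the roster-level table, the sketch below copies a few rows of that table (roster spots 20 through 28) and looks for the spot whose "below roster spot" winning percentage is closest to 0.454. The variable names are mine.

```python
REPLACEMENT_LEVEL = 0.454

# roster spot -> (aggregate pct below that spot, cumulative % of total games),
# copied from the roster-level table earlier in the article.
TABLE = {
    20: (0.467, 88.6),
    21: (0.465, 90.0),
    22: (0.461, 91.4),
    23: (0.458, 92.6),
    24: (0.455, 93.8),
    25: (0.452, 94.8),
    26: (0.448, 95.6),
    27: (0.445, 96.4),
    28: (0.441, 97.1),
}

# Spot whose "below" winning percentage is nearest to the replacement level.
closest_spot = min(TABLE, key=lambda spot: abs(TABLE[spot][0] - REPLACEMENT_LEVEL))

# Share of Player decisions earned below that spot.
share_below = 100.0 - TABLE[closest_spot][1]

print(closest_spot, round(share_below, 1))
```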

Combining these, a team of replacement-level players would have an expected winning percentage of around 0.362 (59 - 103 over a 162-game season).

For an individual player, Wins over Replacement Level (pWORL, eWORL) are equal to Player Wins minus (Player Decisions times Replacement Level). I compare my wins over replacement level (WORL) to Baseball-Reference's Wins above Replacement (WAR) in a separate article*.
*That article was written prior to my decision to use the standard deviation for non-pitcher offense to calculate replacement level for DH, PH, and PR positions. I believe that Baseball-Reference has also updated its replacement level since I wrote the article. I hope to update that article in the next few months to reflect these changes to both sets of numbers.
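To tie the formula back to the opening example, the sketch below applies it to Players A (5-2), B (9-5), and C (9-8). It assumes, as that example did, that all three share the same Positional Average, and it uses the overall average Replacement Level of 0.454 as a stand-in; in practice each player has his own Replacement Level. The function name is hypothetical.

```python
def wins_over_replacement(wins, losses, replacement_level):
    """Player Wins minus (Player Decisions times Replacement Level)."""
    decisions = wins + losses
    return wins - decisions * replacement_level


REPLACEMENT_LEVEL = 0.454  # overall average; a simplifying assumption here

worl_a = wins_over_replacement(5, 2, REPLACEMENT_LEVEL)
worl_b = wins_over_replacement(9, 5, REPLACEMENT_LEVEL)
worl_c = wins_over_replacement(9, 8, REPLACEMENT_LEVEL)

# The "leftover" 4-6 player that separates A from C.
worl_leftover = wins_over_replacement(4, 6, REPLACEMENT_LEVEL)

for name, value in [("A", worl_a), ("B", worl_b), ("C", worl_c), ("4-6", worl_leftover)]:
    print(f"{name}: {value:+.3f}")
```

Under these assumptions the ordering is B, then A, then C: a 0.400 winning percentage sits below a 0.454 Replacement Level, so the extra 4-6 subtracts value from Player A's 5-2.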

All articles are written so that they pull data directly from the most recent version of the Player won-lost database. Hence, any numbers cited within these articles should automatically incorporate the most recent update to Player won-lost records. In some cases, however, the accompanying text may have been written based on previous versions of Player won-lost records. I apologize if this results in nonsensical text in any cases.