I have recently updated my Player won-lost records. The primary driver of this update was Retrosheet's recent release of additional data. In addition to simply updating the data used to calculate my Player won-lost records, however, I have also updated the calculations themselves in several ways. I believe that these changes are significant enough that it may be best to view my newest Player won-lost records as Baseball Player Won-Lost Records: Version 2.0.
Over the next several months, I hope to update the articles on my website which describe my methodology to more fully describe my Player won-lost records. In the meantime, however, this article summarizes the changes which I have made and documents the impact of some of these changes.
In my initial release of these new data, there were a couple of glitches, the most significant of which was an issue with the calculation of positional averages for pitchers since the advent of inter-league play (1997). These issues have been corrected and my data are now correct as of August 3, 2019.
For those who lack either the interest or the time to read all of this article, here is the "tl;dr" version of the article.
The most obvious (and most significant) methodological change I have made is that I have changed my calculation of positional averages in such a way that my wins over (positional) average (WOPA) and wins over replacement level (WORL) are now on the same approximate scale as WAA and WAR as presented at other baseball websites (e.g., Baseball-Reference, Fangraphs).
For example, in my previous incarnation of Player won-lost records, Barry Bonds had 61.0 career pWOPA (pWins over positional average) and 98.4 career pWORL (pWins over replacement level). If you go to Barry Bonds's page now, however, my new methodology gives him 137.3 career pWOPA and 174.9 career pWORL. Baseball-Reference shows Barry Bonds with 162.8 bWAR; Fangraphs shows Bonds with 164.4 fWAR. Obviously, my newest pWORL is much closer to these two figures than my previous pWORL was.
To best understand the methodological changes which I have made, I think it is important to understand the general methodology which I employ.
1. pWins and pLosses
The starting point for my construction of Player wins and losses is context-dependent player wins and losses - pWins and pLosses - and the starting point for constructing pWins and pLosses is Win Probabilities. I track win probability through a game, assigning player "wins" for events which increase a team's win probability and player "losses" for events which reduce a team's win probability. I then make two adjustments to these win-probability based wins and losses.
First, I normalize player game points to ensure that the total number of "wins" is exactly equal to the number of "losses" for every component of player wins and losses as well as by sub-component, at the finest level of detail which makes logical sense in each case.
I then normalize the pWins and pLosses within a particular game in order to ensure that player pWins and pLosses add up to the same total(s) for the winning and losing team in every game: the winning team gets two pWins and one pLoss; the losing team gets one pWin and two pLosses.
Technically, the second normalization here undoes some of the first normalization. When I first constructed Player won-lost records, I assumed that any such asymmetries introduced by the second normalization would be random and would be likely to balance out over time. In fact, however, the normalization of games to exactly two pWins per team win (and two pLosses per team loss) led to systematic asymmetries for some components. To correct this, I iterate through these two normalizations three times. That is, I normalize the results so that winning percentages by component and sub-component are equal to 0.500. I then normalize player decisions to tie to team wins and losses. I then take those results, and re-normalize the results by component and sub-component. I then re-normalize those re-normalized results to again tie back to team wins and losses. I then repeat the last two steps two more times.
The result is a set of pWins and pLosses which tie exactly to team wins and losses (two pWins and one pLoss in team wins, one pWin and two pLosses in team losses) and for which winning percentages are approximately 0.500 for every component and sub-component.
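The alternating normalization described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Event` record and its fields are hypothetical, and the component step (rescaling wins and losses so each component's total decisions are split evenly between wins and losses) is one plausible way to force a 0.500 winning percentage, not necessarily the exact rule used for Player won-lost records. The `passes=3` default mirrors the "three times" in the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    game: str        # hypothetical game identifier
    team: str        # team credited with this share of wins/losses
    won_game: bool   # True if `team` won this game
    component: str   # hypothetical component label, e.g. "batting"
    wins: float
    losses: float


def normalize_components(events):
    """Rescale each component so its wins equal its losses (a 0.500 record)."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for e in events:
        totals[e.component][0] += e.wins
        totals[e.component][1] += e.losses
    for e in events:
        w, l = totals[e.component]
        target = (w + l) / 2  # keep total decisions, split them evenly
        if w > 0:
            e.wins *= target / w
        if l > 0:
            e.losses *= target / l


def normalize_games(events):
    """Rescale each team's share of a game to 2-1 (winner) or 1-2 (loser)."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for e in events:
        totals[(e.game, e.team)][0] += e.wins
        totals[(e.game, e.team)][1] += e.losses
    for e in events:
        w, l = totals[(e.game, e.team)]
        target_w, target_l = (2.0, 1.0) if e.won_game else (1.0, 2.0)
        if w > 0:
            e.wins *= target_w / w
        if l > 0:
            e.losses *= target_l / l


def normalize(events, passes=3):
    """Alternate the two normalizations, ending on the game step."""
    for _ in range(passes):
        normalize_components(events)
        normalize_games(events)  # last, so game totals tie exactly
    return events
```

Because the game step runs last, each team's totals in every game tie exactly to 2-1 or 1-2, while the component-level winning percentages end up only approximately 0.500, which matches the outcome described above.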
2. eWins and eLosses
After calculating pWins and pLosses, which tie to team wins and losses, I also calculate a set of expected wins (eWins) and expected losses (eLosses). For eWins and eLosses, I replace the actual context and win adjustments with expected context and expected win adjustments.
In my original version of Player won-lost records, I calculated expected context based on the positions played by a player, assigning a single expected context to all starting pitchers and a separate single expected context to all relief pitchers. I also introduced expected contexts for pinch hitting and pinch running. For all other players, I set expected context equal to 1.0. Using a constant context for all relief pitchers, however, applies the same context to closers, set-up men, and mop-up men alike. But part of the value of an elite relief pitcher is that his manager is able to use him at the most advantageous time(s) within a game. As for non-pitchers, while it is true that there is little correlation between expected context and fielding position, there are correlations which tie to a player's own ability. For example, average context varies (somewhat) by lineup position: batters who bat higher in the lineup tend to perform in a slightly higher average context than batters who bat lower in the lineup. But where one bats in the lineup is not entirely random: better hitters tend to bat higher in the lineup; hence, better hitters tend to perform in a somewhat higher context on average.
To more accurately account for these factors, I have changed my approach and now set expected context equal to actual context for all players. This means that a player's total pDecisions (pWins plus pLosses) will equal his eDecisions (eWins plus eLosses), by construction. Differences between pWins and eWins, then, are entirely due to differences between the player's actual win adjustment and his expected win adjustment (i.e., differences in how a player performs in high- versus low-context situations).
3. Positional Averages
As of January 2019, users of this website are able to make their own choices of positional averages to be used in evaluating player values. A full discussion of positional averages and other aspects of player evaluation using Player won-lost records can be found in a 50-page PDF essay which I wrote at that time, Comparing Players Using Player Won-Lost Records.
A brief overview of the options available for calculating positional averages follows. These weights will be applied to offensive performance by position and to pitchers, including separately to starting and relief pitchers if different positional averages are desired for each.
Positional averages will be calculated as a weighted average of four possible options using weights chosen by the user.
(1) 0.500 - i.e., all positions are treated equally.
(2) One-year positional averages. Positional averages will be calculated based on offensive performance by position within the season of interest.
(3) Nine-year positional averages. Positional averages will be calculated based on offensive performance by position within the season of interest as well as the four seasons immediately before and immediately after the season of interest.
(4) Long-run positional averages. Positional averages will be calculated based on offensive performance by position across all seasons for which Player won-lost records have been calculated.
In the past, I chose option (2). In discussions, mostly online, this choice has proven somewhat controversial. I want to be open-minded about issues of opinion, as one's choice of positional average is, while trying to maximize the acceptance of the objective core of Player won-lost records: the wins and the losses. Hence, I have decided to allow users to choose their own positional averages. I don't want anybody to reject Player won-lost records because of my choice of positional average, and I don't want debates about positional averages to get in the way of understanding the "objective truths" that Player won-lost records reveal via player wins and player losses.
On any page for which positional averages are presented or used in the underlying calculations, there will be boxes in which the user can enter any numbers desired associated with the four positional average options described above. Positional averages will then be calculated as a weighted average of the choices based on the numbers entered.
For example, entering values of 1, 2, 3, and 4 for options (1), (2), (3), and (4), respectively, would calculate a weighted positional average equal to (1/10) option 1 (10 = 1 + 2 + 3 + 4), (2/10) option 2, (3/10) option 3, and (4/10) option 4.
If the user does not enter anything, the default positional average weights options (2), (3), and (4) equally: (1/3) Option 2, (1/3) Option 3, and (1/3) Option 4.
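The weighting rule above amounts to a normalized weighted average. A minimal sketch follows; the function name and the four per-option values for a single position are hypothetical, made up purely to show the arithmetic (only option (1), 0.500, comes from the text).

```python
def blended_positional_average(option_values, weights=None):
    """Weighted average of the four positional-average options.

    option_values: averages under options (1)-(4), in that order.
    weights: the four numbers a user enters; None means the default.
    """
    if weights is None:
        weights = (0, 1, 1, 1)  # default: options (2), (3), (4) weighted equally
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * v for w, v in zip(weights, option_values)) / total


# Hypothetical values for one position; option (1) is always 0.500.
options = (0.500, 0.470, 0.475, 0.480)
print(round(blended_positional_average(options, (1, 2, 3, 4)), 4))  # 0.4785
print(round(blended_positional_average(options), 4))                # 0.475
```

Dividing by the sum of the entries means only the relative sizes of the four numbers matter, so entering 1, 2, 3, 4 or 10, 20, 30, 40 gives the same result.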
4. WOPA and WORL
The most noticeable change, then, to my Player won-lost records with this latest release is in my calculation of what I call WOPA - wins over positional average - and, by extension, my calculation of wins over replacement level (WORL). Essentially, I have doubled my WOPA values, which puts them on the same scale as wins above average (WAA) at Baseball-Reference.com. The gap between WOPA and WORL is unchanged (and has always been on basically the same scale as the gap between WAA and WAR at Baseball-Reference.com). Hence, if a player's eWOPA has increased by, say, 7.0 wins with my new methodology, that player's eWORL will also have increased by the same 7.0 wins.
Let me explain why I have done this.
I had noticed several years ago that my WORL were not on the same scale as the WAR values presented by Baseball-Reference and Fangraphs. Specifically, my WOPA values were approximately half the size of Baseball-Reference's WAA figures, while the difference between my WOPA and WORL was comparable to the difference between WAA and WAR (and the remaining discrepancy there was easily explained by differences in replacement levels). This both troubled and confused me somewhat.
The way that I calculated WOPA was straightforward and seemed fairly obvious to me. Suppose a player had a Player won-lost record of 12.0 - 8.0 (a 0.600 winning percentage) and a positional average of 0.520. An average player would have been expected to have a 0.520 winning percentage in this player's 20 player decisions, which works out to a record of 10.4 - 9.6 (a 0.520 winning percentage). Take the difference between the two win totals, 12.0 minus 10.4, equals 1.6 WOPA. Easy peasy. But Baseball-Reference (and Fangraphs) would show this player with something more like 3.2 WAA (not including any differences in their evaluation of the player vis-a-vis Player won-lost records).
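The previous WOPA calculation can be written out in a couple of lines, using the 12.0 - 8.0 player with a 0.520 positional average from the example above. The function name is illustrative only.

```python
def old_wopa(wins, losses, pos_avg):
    """Previous method: compare with an average player over the same number of decisions."""
    expected_wins = pos_avg * (wins + losses)  # 0.520 * 20 = 10.4
    return wins - expected_wins


print(round(old_wopa(12.0, 8.0, 0.520), 2))  # 1.6
```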
Why the difference? My initial instinct was that this was because of the difference between net wins (wins minus losses) and wins over 0.500. How many games over 0.500 is a team that finishes with a record of 92-70? They have 22 more wins than losses, but they would only be 11 games ahead of an 81-81 team in the standings. Later, I had second thoughts and suspected that the difference might instead arise because player wins are not linear: the players on a winning team have an initial record of something like 1.9 - 1.4, which I then normalize to 2 - 1.
I have had several conversations about my Player won-lost records with Bob Sawyer, a SABR member who read my book and had a lot of interesting thoughts on it. He convinced me of two things. First, my initial instinct was right. In effect, my old WOPA would say a 92 - 70 team was 11 games over 0.500 (i.e., 11 games better than a team with a 0.500 record in the same number of games played). Baseball-Reference, on the other hand, was calculating net wins (actually net runs, which they then converted to wins): simply wins minus losses (a 92 - 70 team would have 22 net wins).
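The factor of two falls straight out of the arithmetic. When the positional average is 0.500, the old WOPA reduces to wins minus half of all decisions, which is exactly half of net wins. A short illustration with the 92-70 team (function names are illustrative only):

```python
def games_over_500(wins, losses):
    """Wins relative to a .500 team over the same number of games: (wins - losses) / 2."""
    return wins - 0.5 * (wins + losses)


def net_wins(wins, losses):
    """Simple wins minus losses."""
    return wins - losses


print(games_over_500(92, 70))  # 11.0
print(net_wins(92, 70))        # 22
```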
Second, he convinced me that the way that Baseball-Reference does it is the correct way to do it. Broadly speaking, from an offensive standpoint wins correspond to hits (and walks and hit-by-pitch, etc.) and losses correspond to outs. But the constant across baseball games is not hits (or baserunners or runs, or anything positive offensively); the constant across baseball games is outs - outside of rain-shortened games and extra innings, the losing team will make 27 outs.
In fact, one of the interesting results I discovered in building and analyzing my Player won-lost records is that the net win value of an out is remarkably constant across seasons. The win value of a single or a home run will depend heavily on the run environment - an individual home run is less valuable in a higher run-scoring environment. But outs - on average, they're pretty much always worth about -0.023 or -0.024 net wins.
So, going back to our player who had a record of 12.0 - 8.0, let's assume that was all on offense. What would an average batter be expected to do given the same opportunities? We already said he'd be expected to have a 0.520 winning percentage, but in how many decisions? Well, the number of outs is constant, so "the same opportunities" means not the same number of player decisions (20) but the same number of losses, in this case 8.0. A player with a 0.520 winning percentage and 8.0 losses would have a record of 8.7 - 8.0. So our player with a record of 12.0 - 8.0 is not 1.6 wins over positional average; he's 3.3 wins over positional average (12.0 minus 8.7).
The same argument, then, holds in reverse for defensive players (pitching and fielding): WOPA is calculated holding player wins constant and adjusting losses based on positional average. Note that (a) this requires distinct positional averages for offense and defense - which I have always calculated, although I only present a single positional average for players, and (b) if positional average is exactly 0.500 (as it always is for fielding, regardless of position), then WOPA is simply equal to net wins, wins minus losses.
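Both versions of the new calculation can be sketched as below. The offensive function reproduces the 12.0 - 8.0 example exactly. The defensive function is an interpretation of "holding player wins constant and adjusting losses based on positional average": an average defender with the same wins would have W(1 - p)/p losses, and WOPA is that expected loss total minus actual losses. Function names and the defensive example record are hypothetical. Both functions reduce to net wins when the positional average is exactly 0.500.

```python
def offensive_wopa(wins, losses, pos_avg):
    """Hold losses (outs) fixed; an average player with the same losses wins L*p/(1-p)."""
    expected_wins = losses * pos_avg / (1.0 - pos_avg)  # 8.0 * 0.52 / 0.48 = 8.67
    return wins - expected_wins


def defensive_wopa(wins, losses, pos_avg):
    """Hold wins fixed; an average player with the same wins loses W*(1-p)/p."""
    expected_losses = wins * (1.0 - pos_avg) / pos_avg
    return expected_losses - losses


print(round(offensive_wopa(12.0, 8.0, 0.520), 1))  # 3.3
print(offensive_wopa(12.0, 8.0, 0.500))            # 4.0, i.e. net wins
print(defensive_wopa(10.0, 8.0, 0.500))            # 2.0, i.e. net wins
```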
Having calculated WOPA, then, WORL (wins over replacement level) is calculated in the same way that Baseball-Reference converts from WAA to WAR: the difference between positional average and replacement level (which, in my work, is one standard deviation in winning percentage by construction) is multiplied by total player decisions and that total is added to WOPA to get to WORL. This is identical to what I had been doing and, as noted, is also identical to how Baseball-Reference and Fangraphs construct their WAR. The result, then, is that my WORL is essentially on the same scale as WAR, except for the fact that my replacement level differs from the common replacement level used by Baseball-Reference and Fangraphs. My WORL (and WOPA) will also disagree with Baseball-Reference and/or Fangraphs in some cases because of differences in how Player won-lost records evaluates some players. I have written an article comparing my Player won-lost records to WAR in the past. I hope to update this article to incorporate my new treatment of positional averages within the next few weeks.
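The WOPA-to-WORL step is a single addition. In the sketch below, the replacement level of 0.470 is a made-up number chosen only for illustration (the text defines the gap as one standard deviation in winning percentage but does not give a value), and the WOPA of 3.33 is taken from the offensive example above.

```python
def worl(wopa, wins, losses, pos_avg, replacement_level):
    """Add (positional average - replacement level) x total decisions to WOPA."""
    return wopa + (pos_avg - replacement_level) * (wins + losses)


# Hypothetical replacement level of 0.470 for the 12.0 - 8.0 player.
print(round(worl(3.33, 12.0, 8.0, 0.520, 0.470), 2))  # 4.33
```

Because the added term does not depend on how WOPA was computed, any change in WOPA carries through one-for-one to WORL, which is why the gap between the two is unchanged by the new methodology.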
How Much Have Things Changed?
It is not ideal when a statistic which purportedly measures a specific performance at a specific point in time changes several years after the event it measures. Sometimes this is because of the discovery of new and/or better data on that performance: new and better play-by-play sources. Even for games for which the play-by-play data have not changed, my methodology relies upon other games within a particular season or in surrounding seasons to better establish a context in which to evaluate player performance. This is inevitable and, while it is perhaps a bit annoying for numbers to change, I think it leads to more accurate numbers over time. Methodological changes, on the other hand, may be harder to accept. So, how big a deal are the changes I have made here?
The best way to answer that is to look at some numbers and compare them to earlier numbers of mine. One could think of my first set of "definitive" numbers as being those published in my first book, Player Won-Lost Records in Baseball: Measuring Performance in Context. This book was published in September 2016, and at the time that book was written, Retrosheet had released play-by-play data (including some deduced games) for every game from 1945 through 2015 as well as for partial seasons back to 1930 (plus four partial seasons in the 1920s). Clearly, for players who played some games prior to 1945 or since 2015, the numbers on my website include the addition of more games. Even for some players whose careers fell entirely between 1945 and 2015, the numbers currently on my website include additional data for some games. For example, as part of Retrosheet's most recent release (June 2019), an improved set of play-by-play data was released for every game of the 2002 season which included more accurate fielding credits as well as hit-type information (ground ball, fly ball, line drive) for hits. I know this because I spent four months making these improvements.
Early in my first book, I presented a list of players with 300 or more career pWins as calculated based on the data available at that time using the methodology outlined in the book. At that time, there were 72 such players. By my count, 61 of these players played exclusively between 1945 and 2015. So, any changes to the Player won-lost records of these 61 players would have been due entirely to improved data within seasons, residual effects from adding data to other seasons, and methodological changes.
The next table, then, compares the career records of these players, as presented in my book, to the career records of these players as they currently appear on this website.
Obviously, the biggest differences in the above table are for pWOPA (pWins over positional average), where the most recent numbers are roughly twice as large as the earlier numbers. This is the result of a methodological change which I described above. The pWORL figures changed by approximately the same magnitude for the same reason.
On the other hand, the changes in career pWins and pLosses here are mostly quite modest. The average absolute difference between the two sets of pWins is 3.0, or about 0.9%.
Still, the changes to pWOPA and pWORL are quite large. And, fair enough, I get why that might bother some people. On the other hand, the newer pWORL figures are now on the same approximate scale as WAR figures from Baseball-Reference and Fangraphs. The next table compares pWORL and eWORL with bWAR and fWAR (WAR as calculated by Baseball-Reference and Fangraphs, respectively) for these 61 players. I will note that I use a slightly higher replacement level than Baseball-Reference and Fangraphs. So, all other things being equal, my WORL numbers will be slightly lower than their WAR figures. Beyond that, I will simply let these numbers speak for themselves.
Bottom line: I hope that you understand the changes I have made and why I have made them. I hope that you will agree with me that these are improvements that have resulted in a better, more accurate set of Player won-lost records. And, most importantly, I hope that you will re-visit my website frequently and that you will find my Player won-lost records to be a fun and useful measure of baseball player value.
Thank you for reading this article, thank you for visiting the site, and please come back often!
Article last updated: August 3, 2019
All articles are written so that they pull data directly from the most recent version of the Player won-lost database. Hence, any numbers cited within these articles should automatically incorporate the most recent update to Player won-lost records. In some cases, however, the accompanying text may have been written based on previous versions of Player won-lost records. I apologize if this results in nonsensical text in any cases.