Win Probabilities

Win Probabilities and Their Role in Constructing Player Won-Lost Records

What Are Win Probabilities?

The basic concept that underlies my construction of Player Won-Lost records is Win Probability, which, as far as I know, was first developed by Eldon and Harlan Mills in 1969 and published in their book, Player Win Averages. The concept has been developed further by many people over the years. Notable proponents of Win Probability Advancement concepts in recent years include Tangotiger, Dave Studenmund (Studes) at the Hardball Times, Keith Woolner at Baseball Prospectus (Baseball Prospectus 2005), and sites such as Fangraphs. This list is far from exhaustive.

The basic concept underlying win probability systems is elegantly simple. At any point in time, the situation in a baseball game can be uniquely described by considering the inning, the number and location of any baserunners, the number of outs, and the difference in score between the two teams. Given these four things, one can calculate a probability of each team winning the game. Hence, at the start of a batter's plate appearance, one can calculate the probability of the batting team winning the game. After the completion of the batter's plate appearance, one can once again calculate the probability of the batting team winning the game. The difference between these two probabilities, typically called the Win Probability Advancement or something similar, is the value added by the offensive team during that particular plate appearance (where such value could, of course, be negative).

If we assume that the two teams are evenly matched, then the initial probability of winning is 50% for each team. At the end of the game, the probability of one team winning will be 100%, while the probability of the other team winning will be 0%. The sum of the Win Probability advancements for a particular team will add up to exactly 50% for a winning team (100% minus 50%) and exactly -50% for a losing team (0% minus 50%). Hence, Win Probability Advancement is a perfect accounting structure for allocating credit for team wins and losses to individual players.
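
The accounting identity above can be checked with a few lines of Python. This is only a minimal sketch: the function name `wpa_increments` and the sequence of win probabilities are made up for illustration and are not taken from any real game.

```python
def wpa_increments(win_probs):
    """Return the change in win probability between consecutive game states.

    win_probs is a list of one team's win probability at each state,
    starting at 0.5 (evenly matched teams) and ending at 1.0 (win) or 0.0 (loss).
    """
    return [after - before for before, after in zip(win_probs, win_probs[1:])]


# Hypothetical win-probability path for a team that ends up winning.
path = [0.50, 0.46, 0.58, 0.35, 0.71, 0.90, 1.00]
steps = wpa_increments(path)

# The increments telescope: their sum is the final minus the initial probability.
total = sum(steps)
print(round(total, 10))  # 0.5
```

However the probability moves in between, the individual advancements always add up to +0.5 for a win and -0.5 for a loss, which is what makes the approach usable as an accounting system.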

This basic concept is used here to develop Player Won-Lost records for individual players based upon their contributions during Major League Baseball games. The technical calculation of Win Probabilities is described in this article.

Understanding Win Probabilities

Win Probabilities are a very popular analytical tool. They are also an important component in my construction of Player Won-Lost records. It is important, however, to understand that win probabilities have two very distinct purposes, which require different assumptions.

First, win probabilities can be used to assess specific in-game strategies. What is the cost/benefit of intentionally walking Albert Pujols? When does it make sense to attempt to steal home in the bottom of the ninth inning of a tie game? To properly answer these questions, one needs to evaluate the actual win probabilities that exist, given the actual players involved. In other words, the question of whether to walk Pujols depends not simply on the location of the baserunners, the number of outs, the score, and the inning; it also depends on who the on-deck hitter is, who the pitcher is, who's available in the bullpen, maybe even how the wind is blowing. Here, the specifics of who's involved are crucial to making a proper evaluation.

Alternatively, however, win probabilities can be used as a tool to help one evaluate the value of individual players. This is what I am doing here, creating a value-based system using win probabilities. In order to evaluate the value of individual players, however, I need to begin by assuming that everybody is average. That is, the probability of winning when batting with runners on 1st and 2nd, one out, in a tie game in the last of the ninth has to be the same for everybody. If I calculate a unique probability for every player based on what his actual performance was, then I'll just find that everybody is a 0.500 player - everybody is exactly as valuable as an average player, if the average player is him. The key here is to assume that everybody is average - for the league as a whole. A player's value, measured in this way, doesn't depend, therefore, on the quality of the on-deck hitter, or the speed of the baserunners, or even on the quality of the pitcher. One could, and perhaps should, attempt to adjust for such things - that is, a 0.500 hitter who faced above-average pitching is better than a 0.500 hitter who faced below-average pitching. I have attempted some adjustments of this nature, although others are beyond the scope of my research thus far.

In fact, however, I believe that this initial round of research that I have done here is a necessary first step before one could begin to accurately make such adjustments.

Looking at win probabilities in these two ways can lead to different results. For example, there may be a situation (say, runner on first, nobody out) where a successful sacrifice bunt reduces win probability for an average hitter (i.e., the expected win probability is lower with a runner on second and one out than with a runner on first and nobody out). In that same situation, however, a successful bunt may well increase the expected win probability for some hitters (e.g., a bad-hitting but good-bunting pitcher with a tendency to ground into double plays with an excellent lead-off hitter coming up next). In such a case, a value-based system such as I have devised here will assign a negative value (i.e., Player Losses) to the latter hitter for sacrificing, even though, given who he is, a bunt may have been the right strategic decision. While this may seem wrong, in fact, it's perfectly reasonable. The only reason why the bunt was the right decision for that hitter was because he was a below-average hitter. This should be reflected in the value of that player. His ability to bunt will be credited to him, insofar as the Losses charged for the sacrifice bunt will be less than the expected Losses that would have been charged had he not bunted, but still, from the team's overall perspective, the bunt was not a positive event - it cost the team an out. The team suffered because this was a below-average hitter, and this is properly reflected in the value assigned to this player.

By the same token, however, one should be cautious about using average Win Probabilities to assess the value of specific events. For example, it is often noted that there are very few situations where Win Probabilities say that the correct play, on average, is to sacrifice bunt. This is true. Of course, this is why sacrifice bunts are almost never attempted in most situations and are used only rarely in the remaining ones. To use average win probabilities to assess whether a particular strategy was a good idea or not misses, in my opinion, the point. Strategies are chosen for specific plays, based on the specific players involved.

For example, in my research, I have found that, on average, in most seasons (but not for the last several seasons), stolen base attempts are slight net negative events - total Player wins from successful stolen bases are slightly less than total Player losses from being caught stealing and picked off. Does this mean that teams and/or players' specific decisions to attempt to steal bases were bad, on average, or, at the extreme, that players should just never bother attempting a stolen base? Not necessarily. You'd have to look at the specific players involved - the pitcher, the on-deck hitter, the day-specific game conditions - to be able to say whether or not a specific stolen base attempt was a good or a bad play. Ditto for intentional walks, sacrifice bunt attempts, and any other strategic decisions employed by teams.

Calculating Win Probabilities

The basic concept of Win Probability posits that the situation in a baseball game can be uniquely described by considering the inning, the number and location of any baserunners, the number of outs, and the difference in score between the two teams. Given these four things, one can calculate a probability of each team winning the game.

The most obvious way to calculate win probabilities would be based purely on empirical observations. For example, to determine the probability of a team winning when they lead by 4 runs with runners on 2nd and 3rd base with two outs in the top of the 8th inning, one could simply look at every time that a team has been in exactly that situation and calculate what percentage of those teams ended up winning those games.

This approach has several problems. Because of the relative infrequency of many events, such a technique will lead to idiosyncratic results, whereby positive events may inadvertently decrease win probabilities or negative events may inadvertently increase win probabilities.

Compounding this problem, the true probability of winning in any given situation depends on the run-scoring environment in which the game is taking place. A 2-run lead was much easier to overcome in Coors Field in the late 1990s than it would have been in Dodger Stadium in the mid-1960s. Because of this, an ideal system would use a unique win probability matrix for each ballpark for each season to reflect differences in run-scoring environment. This is all but impossible using direct empirical observation, however, because of severe data limitations.

Finally, a system that makes empirical calculations based on the score-differential and the inning cannot begin with an assumption that the initial probability of winning is 50% for each team. This is because home teams actually win about 54% of major-league baseball games. Hence, teams that bat with nobody on and nobody out in a tie game in the top of the first inning don't win 50% of the time; they only win about 46% of the time.

This is a problem, however, if one's goal is to accurately and completely account for all of a team's wins and allocate them to the players on the team. In effect, the team's home ballpark will get some credit for home wins (and take some of the blame for road losses). The reason that more teams win at home, however, is that players tend to play better at home: hitters tend to hit better at home and pitchers tend to pitch better at home, regardless of where home is. It seems clear to me that the players should get credit for this.

Hence, empirical calculations based on historical results using score-differential and inning are not, in my opinion, a proper starting point for calculating the win probabilities necessary to construct Player Won-Lost records.

In fact, however, the only empirical observations one needs to calculate a complete set of win probabilities are what I call a Base-Out Transition Matrix: the position of the baserunners and the number of outs before and after an event (typically a plate appearance). Given a Base-Out Transition Matrix, one can impute a full Win-Probability matrix in four "easy" steps.

(1)    Given a Base-Out Transition Matrix, one can compute a Base-Out Probability Matrix.

(2)    Given a Base-Out Probability Matrix, one can compute a Run Probability Matrix.

(3)    Given a Run Probability Matrix, one can compute an Inning Probability Matrix.

(4)    Given a Run Probability Matrix and an Inning Probability Matrix, one can compute a Win Probability Matrix.

Steps 1 through 4 above are explained next.

Base-Out Probability Matrix

The basic base-out transition matrix is 24 rows by 28 columns (24x28). The 24 rows identify the number of outs (0, 1, or 2) and the position of the baserunners (8 possibilities, identified here as 0, 1, 2, 3, 1-2, 1-3, 2-3, and 1-2-3) at the beginning of the event. The 28 columns identify the number of outs and the position of the baserunners at the end of the event. The four additional columns represent the additional possibility that this event results in the third out of the inning. The third out gets four columns to also reflect the number of runs which score on the play (0, 1, 2, or 3 - you couldn't score more than 3 runs while also making the third out). This breakdown of third-out plays is important in the next step, converting the base-out transition matrix to a run-probability matrix.

Throughout my work, I refer to base-out states in the following format, (o,b/r), where o is the number of outs, and b/r is either the position of the baserunners, or the number of runs scored on the play if o is equal to 3. For example, (0,0) indicates no outs and the bases empty, (1,2) indicates one out and a runner on second base, (2,2-3) indicates two outs and runners on second and third base, and (3,1) indicates three outs with one run scoring on the play on which the third out was recorded.
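
For readers who think in code, the state space can be enumerated directly. The following is a minimal sketch, not the author's implementation: the names `BASES`, `ROW_STATES`, `COL_STATES`, and `label` are hypothetical, and the ordering of the states is an assumption.

```python
from itertools import product

# The 8 baserunner configurations, in the order listed in the text.
BASES = ["0", "1", "2", "3", "1-2", "1-3", "2-3", "1-2-3"]

# 24 starting states: (outs, bases) for 0, 1, or 2 outs.
ROW_STATES = [(outs, bases) for outs, bases in product(range(3), BASES)]

# 28 ending states: the same 24, plus (3, runs) for third-out plays
# on which 0, 1, 2, or 3 runs score.
COL_STATES = ROW_STATES + [(3, str(runs)) for runs in range(4)]


def label(state):
    """Format a state in the article's (o,b/r) notation."""
    outs, second = state
    return f"({outs},{second})"


print(len(ROW_STATES), len(COL_STATES))  # 24 28
print(label(ROW_STATES[0]), label(COL_STATES[-3]))  # (0,0) (3,1)
```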

The initial base-out transition matrix simply includes the number of occurrences of a particular event in each cell. That is, the value in row (0,0), column (0,0) is equal to the number of times a plate appearance began with nobody on and nobody out and ended with nobody on and nobody out.

The first step in converting the base-out transition matrix into a win-probability matrix is to convert the base-out transition matrix into a base-out probability matrix. The base-out probability matrix is also 24 rows by 28 columns. Here, the value of each cell (r,c) is equal to the probability that a play will end in base-out state c, given that it started in base-out state r. The base-out transition matrix is converted into a base-out probability matrix by simply dividing the value in each cell of the base-out transition matrix by the sum of the values for that row of the base-out transition matrix. Hence, each row of the base-out probability matrix will sum to 100% by construction.
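
The row normalization described above amounts to only a few lines. This sketch uses plain Python lists and made-up counts; the function name `to_probability_matrix` and the handling of rows with no observations are assumptions made for the example.

```python
def to_probability_matrix(counts):
    """Divide each cell by its row total so that every row sums to 1.

    counts is a list of rows of event counts (24 rows by 28 columns in the
    article; any rectangular shape works here). Rows with no observations
    are returned as all zeros rather than dividing by zero; that handling
    is an assumption of this sketch.
    """
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


# Made-up counts for a tiny 2x3 example (not real data).
toy_counts = [[30, 10, 60],
              [5, 15, 0]]
toy_probs = to_probability_matrix(toy_counts)
print(toy_probs[0])  # [0.3, 0.1, 0.6]
```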

Individual base-out transition matrices are actually constructed for each individual ballpark for each individual season for which Player Won-Lost records are calculated as described here. The second step in converting the base-out transition matrix into a win-probability matrix is to convert the base-out probability matrix into a run-probability matrix.

Run Probability Matrix

The second step in converting a base-out transition matrix into a win-probability matrix is to convert the base-out probability matrix into a run-probability matrix. That is, given the initial base-out state, what is the probability of scoring exactly zero runs over the remainder of the inning, what is the probability of scoring exactly one run, what is the probability of scoring exactly two runs, etc?

If all of the events within the base-out probability matrix are plate appearances (i.e., stolen bases, wild pitches, et al. are not considered separate events), then each cell of the base-out probability matrix will generate a specific number of runs scored. The number of runs scored within a particular cell is equal to one (which represents the batter) plus the number of initial baserunners plus the number of initial outs minus the number of baserunners at the end of the play minus the number of outs at the end of the play (if less than three). In words, if the batter and baserunners can't be accounted for as either additional outs or baserunners after the play, then they must have scored. For example, if a play begins with one out and runners on first and second (1,1-2) and ends with two outs and a runner on third (2,3), then the number of runs scored is equal to one (1+2+1-1-2).
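
The runs-scored bookkeeping can be written as a small function. This is a sketch under the same assumption as the text (every event is a plate appearance); the helper names `runners` and `runs_scored` and the string encoding of the bases are hypothetical.

```python
def runners(bases):
    """Count baserunners in a label such as "0", "2", "1-3", or "1-2-3"."""
    return 0 if bases == "0" else len(bases.split("-"))


def runs_scored(start, end):
    """Runs scored on a plate appearance, per the accounting in the text.

    start is (outs, bases) with outs in 0..2.
    end is (outs, bases) with outs in 0..2, or (3, runs) for a third out,
    in which case the runs are recorded directly in the state.
    """
    start_outs, start_bases = start
    end_outs, end_second = end
    if end_outs == 3:
        return int(end_second)
    return 1 + runners(start_bases) + start_outs - runners(end_second) - end_outs


print(runs_scored((1, "1-2"), (2, "3")))    # 1, the example in the text
print(runs_scored((0, "0"), (0, "0")))      # 1, a solo home run
print(runs_scored((2, "1-2-3"), (3, "0")))  # 0, third out with no run scoring
```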

Knowing how many runs are scored from a specific event, then, one can calculate the probability of scoring any given number of runs given any initial base-out state. This is done recursively as follows.

We begin with the initial state, two outs and the bases loaded, (2,1-2-3)

The probability of scoring exactly zero runs is equal to the probability of transitioning from (2,1-2-3) to (3,0) (i.e., 3 outs, zero runs scored), because any other transition will necessarily involve at least one run scoring.

As an example, on average, between 2000 and 2006, the probability of this happening was 66.81%.

Next, consider the initial state, (2,2-3)

The probability of scoring exactly zero runs is equal to the probability of transitioning from (2,2-3) to (3,0) PLUS the probability of transitioning from (2,2-3) to (2,1-2-3) (which produces no runs scored) times the probability of scoring zero runs from (2,1-2-3) (which was solved for above). Again, any other final base-out state except for these two will have involved at least one run scoring.

Between 2000 and 2006, the major-league-wide probability of transitioning from (2,2-3) to (3,0) was 58.63%, the probability of transitioning from (2,2-3) to (2,1-2-3) was 20.51%, and the probability of scoring zero runs from (2,1-2-3) was 66.81% (as shown above). Hence, the probability of scoring zero runs from an initial base-out state of (2,2-3) is equal to 58.63% plus (20.51% times 66.81%) for a total probability of 72.33%.
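
Written as an equation, the calculation above looks as follows. The symbols are introduced here only for convenience and are not the author's notation: P_0(s) is the probability of scoring exactly zero runs from state s, and T(s -> s') is the base-out transition probability.

```latex
\begin{aligned}
P_0(2,\text{2-3}) &= T\bigl((2,\text{2-3}) \to (3,0)\bigr)
  + T\bigl((2,\text{2-3}) \to (2,\text{1-2-3})\bigr)\, P_0(2,\text{1-2-3}) \\
&= 0.5863 + 0.2051 \times 0.6681 \\
&\approx 0.7233
\end{aligned}
```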

Continuing onward, we can work up to the initial state, (2,1) - two out and a runner on first

The probability of scoring exactly zero runs is equal to the probability of transitioning from (2,1) to (3,0) plus the probability of transitioning from (2,1) to (2,1-2) times the probability of scoring zero runs from (2,1-2) plus the probability of transitioning from (2,1) to (2,1-3) times the probability of scoring zero runs from (2,1-3), ....

Eventually, one works back recursively to the initial state (0,0) - nobody on and nobody out. The probability of scoring zero runs is equal to the sum of the product of the probabilities of transitioning to each base-out state for which no runs score (i.e., any base-out state except for (0,0), which would produce one run) times the probability of scoring exactly zero runs from that base-out state. This works without creating any kind of circular logic because, for every base-out transition, if the initial and final base-out states are the same, then a run must score.

Having done this for zero runs scored, one can then do the same thing for one run scored. For any given base-out state, the probability of scoring exactly one run is equal to the sum of the product of the probabilities of transitioning to a state that generates exactly one run times the probability of scoring exactly zero runs from that state plus the sum of the product of the probabilities of transitioning to a state that generates exactly zero runs times the probability of scoring exactly one run from that state.

Generalizing further, the probability of scoring exactly X runs (for X greater than or equal to four) is equal to the sum of the product of the probabilities of transitioning to a state that generates exactly four runs times the probability of scoring exactly X - 4 runs from that state, plus the sum of the product of the probabilities of transitioning to a state that generates exactly three runs times the probability of scoring exactly X - 3 runs from that state, plus the sum of the product of the probabilities of transitioning to a state that generates exactly two runs times the probability of scoring exactly X - 2 runs from that state, plus the sum of the product of the probabilities of transitioning to a state that generates exactly one run times the probability of scoring exactly X - 1 runs from that state, plus the sum of the product of the probabilities of transitioning to a state that generates no runs times the probability of scoring exactly X runs from that state.
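
The recursion described above can be sketched in Python as follows. The transition table is a deliberately tiny toy model (the batter either homers or makes an out with the bases empty) with made-up probabilities, and the names `TRANSITIONS` and `prob_runs` are hypothetical. A real calculation would use the full 24x28 base-out probability matrix.

```python
from functools import lru_cache

# Toy transition model (made up for illustration, not real data): with the
# bases empty the batter either homers (30%) or makes an out (70%).
# Each entry maps a starting state to (ending state, runs scored, probability).
# A third-out ending state is written (3, runs), as in the text.
TRANSITIONS = {
    (0, "0"): [((0, "0"), 1, 0.3), ((1, "0"), 0, 0.7)],
    (1, "0"): [((1, "0"), 1, 0.3), ((2, "0"), 0, 0.7)],
    (2, "0"): [((2, "0"), 1, 0.3), ((3, "0"), 0, 0.7)],
}


@lru_cache(maxsize=None)
def prob_runs(state, x):
    """Probability of scoring exactly x more runs in the inning from state."""
    if x < 0:
        return 0.0
    if state[0] == 3:
        # Inning over: no further runs can score.
        return 1.0 if x == 0 else 0.0
    total = 0.0
    for end, runs, p in TRANSITIONS[state]:
        # Runs on this play count against x; the rest must come afterwards.
        total += p * prob_runs(end, x - runs)
    return total


print(round(prob_runs((0, "0"), 0), 4))  # 0.343
print(round(prob_runs((0, "0"), 1), 4))  # 0.3087
```

The recursion terminates for the reason given in the text: a transition that scores no runs always adds an out or a baserunner, and a transition that scores runs reduces the number of runs still to be accounted for.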

It is theoretically possible to score any number of runs in a single inning, with no upper limit. As a practical consideration, however, in my work, I calculate exact probabilities, as outlined above, for scoring anywhere from 0 to 14 runs. I then estimate the probability of scoring 15 runs as simply being equal to 1 minus the sum of all of the other probabilities. The cutoff of 15 runs was chosen arbitrarily as being sufficiently large that the probability of scoring 15 runs, calculated in this way, was always less than the probability of scoring 14 runs, calculated exactly.
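
The truncation can be illustrated with a short sketch. The function name `truncate_distribution` and the geometric run distribution used to exercise it are assumptions made for the example, not real data.

```python
def truncate_distribution(exact_probs, max_runs=15):
    """Keep exact probabilities for 0..max_runs-1 and lump the rest at max_runs.

    exact_probs[k] is the probability of scoring exactly k runs; it needs
    at least max_runs entries.
    """
    head = list(exact_probs[:max_runs])
    return head + [1.0 - sum(head)]


# Made-up geometric run distribution, used only to exercise the function.
toy = [0.7 * 0.3 ** k for k in range(40)]
dist = truncate_distribution(toy)
print(len(dist))  # 16 entries, for 0 through 15 runs
```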

Once the run-probability matrix is constructed, an inning-probability matrix is constructed showing the probability of a team winning given the score differential at the start of an inning.

Inning Probability Matrix

Once the run-probability matrix is constructed, what I call an inning-probability matrix can be constructed, based upon the run-probabilities for the base-out state (0,0) - nobody on and nobody out. As with the run-probability matrix, the inning-probability matrix is constructed recursively.

The probability of a team winning that is trailing by X runs leading off the bottom of the ninth inning (or later) is equal to the probability of scoring X+1 or more runs given a base-out state of (0,0) plus 50% times the probability of scoring exactly X runs given a base-out state of (0,0).

So, for example, the probability of a team winning that is trailing by one run leading off the bottom of the ninth inning is equal to the probability of scoring 2 or more runs given a base-out state of (0,0) - 13.69% on average from 2000 - 2006 - plus 50% times the probability of scoring exactly 1 run (15.58%), for an overall probability of 21.48%. The probability of a team winning that is trailing by two runs leading off the bottom of the ninth inning was equal to 10.00% over this same time period.
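
The bottom-of-the-ninth rule can be sketched as a function. The name `bottom_ninth_win_prob` is hypothetical; the run probabilities are the 2000 - 2006 figures for exactly 0 through 4 runs that are quoted in this article.

```python
def bottom_ninth_win_prob(run_probs, deficit):
    """Win probability for the home team leading off the bottom of the ninth.

    run_probs[k] is the probability of scoring exactly k runs from (0,0);
    deficit is how many runs the batting team trails by (0 for a tie).
    Scoring more than the deficit wins; scoring exactly the deficit leaves
    a tie, which is treated as a 50% chance.
    """
    p_more = 1.0 - sum(run_probs[: deficit + 1])
    p_tie = run_probs[deficit]
    return p_more + 0.5 * p_tie


# Exactly-0 through exactly-4 run probabilities quoted in the text for
# 2000-2006; the remaining probability mass is not needed for these cases.
run_probs = [0.7073, 0.1558, 0.0736, 0.0349, 0.0161]

print(round(bottom_ninth_win_prob(run_probs, 1), 4))  # 0.2148
```

Applied to deficits of two and three runs, the same function gives values within rounding of the 10.00% and 4.58% figures used in this article.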

Working backwards, the probability of a team winning if they lead by, say, one run leading off the top of the ninth is equal to the probability of scoring exactly zero runs times (one minus the probability of a team winning if they trail by one run entering the last of the ninth)*, plus the probability of scoring exactly one run times (one minus the probability of a team winning if they trail by two runs entering the last of the ninth), plus the probability of scoring exactly two runs times (one minus the probability of a team winning if they trail by three runs entering the last of the ninth), ..., plus the probability of scoring fifteen runs times (one minus the probability of a team winning if they trail by sixteen runs entering the last of the ninth).
*All probabilities are expressed as the probability of the batting team winning. The probability of the pitching team winning is, of course, equal to one minus the probability of the batting team winning.

For Major-League Baseball from 2000 - 2006, this becomes 70.73% times (1-21.48%) plus 15.58% times (1-10.00%) plus 7.36% times (1-4.58%) plus 3.49% times (1-2.03%) plus 1.61% times (1-0.87%) plus ... 0.00% times (1-0). If one works through the arithmetic, then, the probability of a team winning when leading by one run leading off the top of the ninth inning was equal to 82.82% on average over this time period.

In actuality, teams won 84.74% of the time when leading by exactly one run leading off the top of the ninth inning between 2000 and 2006. I suspect that this difference is because teams are more likely to use their closer to pitch the bottom of the ninth inning in such a situation and the average major-league closer is an above-average pitcher. It seems to me to be entirely appropriate to credit this above-average win probability to the above-average pitchers who are responsible for it.

Given the probabilities for the top of the ninth, one can calculate probabilities for the bottom of the eighth using the same logic as the preceding paragraph. Repeating this, one can ultimately work all the way back to the probability of winning leading off the top of the first inning (50%, by construction, as is the probability of winning leading off the top of any inning with the score tied*).
*Technically, a very small error is introduced here because of my earlier restriction that no team can score more than 15 runs in one inning. This restriction, which essentially says that a team that enters the bottom of the ninth inning trailing by 16 or more runs can't win, would make the probability of the road team winning when entering an inning tied slightly different from 50%, typically in about the 8th decimal place or so.
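
The backward induction can be sketched as a memoized recursion. This is an illustration under stated assumptions, not the author's code: the name `inning_win_prob` is hypothetical, the run distribution uses the five 2000 - 2006 values quoted above with all remaining probability lumped at five runs (the text does not list the tail), and the score differential is not clipped to any range.

```python
from functools import lru_cache

# Exactly-0 through exactly-4 run probabilities from (0,0) quoted in the text
# for 2000-2006. ASSUMPTION: the remaining mass is lumped at 5 runs, because
# the text does not list the tail.
RUN_PROBS = [0.7073, 0.1558, 0.0736, 0.0349, 0.0161]
RUN_PROBS.append(1.0 - sum(RUN_PROBS))

LAST_HALF = 18  # bottom of the ninth; extra innings are treated the same way


@lru_cache(maxsize=None)
def inning_win_prob(half, lead):
    """Probability that the team about to bat in half-inning `half` wins.

    half runs from 1 (top of the first) to 18 (bottom of the ninth);
    lead is the batting team's lead in runs (negative when trailing).
    """
    if half == LAST_HALF:
        if lead > 0:
            return 1.0  # home team already ahead: the game is over
        deficit = -lead
        p_more = sum(RUN_PROBS[deficit + 1:])
        p_tie = RUN_PROBS[deficit] if deficit < len(RUN_PROBS) else 0.0
        return p_more + 0.5 * p_tie
    # After scoring k runs, the other team bats next, trailing by lead + k.
    return sum(
        p * (1.0 - inning_win_prob(half + 1, -(lead + k)))
        for k, p in enumerate(RUN_PROBS)
    )


print(round(inning_win_prob(17, 1), 3))  # 0.828
print(round(inning_win_prob(1, 0), 6))   # 0.5
```

With the lumped tail, the one-run lead entering the top of the ninth comes out at about 82.8%, in line with the 82.82% reported above. Because both teams draw from the same run distribution and the differential is not clipped, a tie at the top of any inning comes out at 50% up to floating-point error.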

A complete win-probability matrix can then be constructed from an inning-probability matrix and a run-probability matrix.

Win Probability Matrix

Given an inning-probability matrix, giving the probability of winning given the score differential at the start of any inning, and a run-probability matrix, which gives the probability of scoring any number of runs through the end of the given inning given the current base-out state, one can build a complete win-probability matrix.

Given a run differential of r, an initial base-out state b, and half-inning j, the probability of the batting team winning is equal to the probability of scoring exactly zero runs times the probability of the half-inning j batting team winning given a run differential of r leading off half-inning j+1 (i.e., one minus the probability of the team batting in half-inning j+1 winning given a run differential of -r), plus the probability of scoring one run times the probability of the half-inning j batting team winning given a run differential of r+1 leading off half-inning j+1, plus the probability of scoring exactly two runs times the probability of the half-inning j batting team winning given a run differential of r+2 leading off half-inning j+1, plus the probability of scoring exactly three runs times the probability of the half-inning j batting team winning given a run differential of r+3 leading off half-inning j+1, plus the probability of scoring exactly four runs times the probability of the half-inning j batting team winning given a run differential of r+4 leading off half-inning j+1, ..., plus the probability of scoring exactly fifteen runs times the probability of the half-inning j batting team winning given a run differential of r+15 leading off half-inning j+1.
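
The same calculation can be written compactly as a sum. The symbols are introduced here for convenience and are not the author's notation: W_j(r, b) is the batting team's win probability in half-inning j with run differential r and base-out state b, R_b(k) is the run-probability matrix entry for scoring exactly k more runs from state b, and I_{j+1}(d) is the inning-probability matrix entry for the team batting in half-inning j+1 with run differential d. As in the text, the formula is written for a half-inning j that is followed by half-inning j+1.

```latex
W_j(r, b) \;=\; \sum_{k=0}^{15} R_b(k)\,\Bigl[\,1 - I_{j+1}\bigl(-(r + k)\bigr)\Bigr]
```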

The final win-probability matrices used here are 432 rows (24 initial base-out states times 18 half-innings) by 31 columns (run differential of -15 to +15, including zero). The win-probabilities for extra innings are identical to those for the ninth inning - both top and bottom.
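
One possible way to index such a matrix is sketched below. The row ordering (half-inning first, then outs, then bases) and the clipping of larger differentials to the -15..+15 range are assumptions of this sketch, and the names `row_index` and `col_index` are hypothetical; the article does not specify either choice.

```python
BASES = ["0", "1", "2", "3", "1-2", "1-3", "2-3", "1-2-3"]
MAX_DIFF = 15  # columns cover run differentials -15..+15


def row_index(half, outs, bases):
    """Row for half-inning `half` (1..18) and base-out state (outs, bases)."""
    state = outs * len(BASES) + BASES.index(bases)  # 0..23
    return (half - 1) * 24 + state


def col_index(diff):
    """Column for a run differential, clipped to the -15..+15 range."""
    clipped = max(-MAX_DIFF, min(MAX_DIFF, diff))
    return clipped + MAX_DIFF


print(row_index(18, 2, "1-2-3"))  # 431, the last of 432 rows
print(col_index(0))               # 15, the middle of 31 columns
```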

Individual Win-Probability Matrices are actually constructed for each individual ballpark for each individual season for which Player Won-Lost records are calculated.

Context-Neutral Win Probabilities

Traditionally, win-probability systems are purely context-dependent. In fact, however, I do not think that this is necessarily the appropriate starting point for measuring player value. Rather, I am interested in beginning with an assessment of players' performances in the absence of the contexts in which the players actually performed. That is, what would the expected won-lost record be for a player, given his actual performance, assuming that performance had come in a neutral context? To answer this question, I construct a set of context-neutral Player Game Points. Once these are constructed, I can then add back in the contextual information in a way that clearly identifies how players' values were affected by the context in which they performed.

Context-Neutral win probabilities are constructed as follows. Player Game Points are divided into three categories for the purpose of calculating context-neutral win probabilities: independent events, base-state dependent events, and purely contextual events.

1. Independent Events

Most events can happen regardless of the base-out situation. One can strike out at any time, regardless of how many baserunners or outs there are. Similarly, a triple could happen at any time regardless of the number of baserunners. All batter results, except for double plays (which are base-state dependent), intentional walks, and bunts, fall into the category of independent events. Intentional walks and bunts are treated as purely contextual events, which are described below.

For independent events, the expected win probability of such an event is calculated for each event within the league-year using the Win Probability Matrix for the ballpark in which the event took place.

For example, the win probability of a home run at Wrigley Field in 2005 is calculated by taking every plate appearance that took place in a National League ballpark in 2005 and calculating, for that plate appearance, what the added win probability would have been had the game been played in Wrigley Field and the batter hit a home run. The context-neutral win probability of a home run at Wrigley Field in 2005 is then equal to the average of all of these probabilities. In this case, the average win probability added by a home run at Wrigley Field in 2005 was 0.141 wins.
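
The averaging step can be sketched as follows. Everything here is illustrative: the function name `context_neutral_value`, the situation labels, and the numbers are made up, and a real calculation would loop over every plate appearance in the league-year and look up each value in the ballpark's Win Probability Matrix.

```python
def context_neutral_value(contexts, win_prob_added):
    """Average the win probability an event would add across all contexts.

    contexts: iterable of game situations (any hashable description of
        half-inning, base-out state, and score differential).
    win_prob_added: function mapping a context to the win probability the
        event would add in that context, for the ballpark of interest.
    """
    values = [win_prob_added(c) for c in contexts]
    return sum(values) / len(values)


# Made-up numbers for three situations; a real calculation would loop over
# every plate appearance in the league-year.
toy_added = {"tie, 9th": 0.40, "up 5, 3rd": 0.02, "down 1, 6th": 0.15}
value = context_neutral_value(toy_added, toy_added.get)
print(round(value, 3))  # 0.19
```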

In the case of events which may or may not lead to baserunner advancement - e.g., outs, singles, doubles - expected results are calculated based on average baserunner advancement, just as is done with contextual Player Game Points.

2. Base-State Dependent Events

Some events can only happen given certain baserunners or a certain number of outs. For example, one can only ground into a double play with at least one baserunner on and less than two outs. Any Player Game Points accumulated by a baserunner on third base can, of course, only be accumulated in a base-out state that includes a runner on third base.

For baserunner game points (except for stolen bases, which are treated as purely contextual events and discussed below) and double plays, the context-neutral win probability of the event is calculated the same as for independent events, except that the average win probability is only calculated across events with relevant base-out states.

So, for example, the context-neutral Player Game Points associated with a double play are calculated as the average win probability, given the ballpark in which the game takes place, added from hitting into a double play across double-play situations (runner on first base and less than two out). For a ground ball to the shortstop at Wrigley Field in 2005, the average win probability added by a double play is 0.011 losses from the batter's perspective, on top of the 0.046 losses accrued from the initial ground-out.

For baserunner advancements and baserunner outs, context-neutral win probabilities are only averaged given the specific batting event and hit type. That is, the context-neutral Player Game Points for a runner on third base advancing on a fly out are calculated only considering plays in which a runner on third base advances on a fly out. Similarly, the context-neutral Player Game Points for a runner on first base who only advances to second base on a single are calculated only considering plays in which a runner on first base does not advance to third on a single.

3. Purely Contextual Events

While it is possible to remove much, if not all, of the context from most plays, there are certain plays which are, essentially, purely elective plays, and are therefore inextricably tied to the context in which they take place. In my opinion, it would be wrong to attempt to divorce these plays from their context.

Three types of plays fall into this category: intentional walks, stolen base attempts (including stolen bases, caught stealings, pickoffs, and balks), and bunts (regardless of either situation or outcome). In each of these three cases, the context-neutral Player Game Points are simply set equal to context-dependent Player Game Points.

Context-neutral win values for specific events by season are presented and discussed in a separate article.

Context-neutral Player Game Points form the basis for context-neutral, teammate-adjusted Player wins and losses, which I call eWins and eLosses. The calculation of eWins and eLosses is described in more detail in a separate article.

pWins vs. Win Probability Advancements

As noted above, the central building block in calculating pWins and pLosses is the concept of Win Probability.

Win Probabilities are probably best known through a statistic called WPA (Win Probability Advancements). Player won-lost records, pWins and pLosses, are built from the same building blocks as Win Probability Advancements. In effect, the base for pWins is what Baseball-Reference (and others) calls WPA+, "positive Win Probability Added", and the base for pLosses is WPA-, "negative Win Probability Added". I do not end there, however, so that pWins are not simply equal to either WPA or WPA+ (nor are pLosses equal to WPA-).

As I explain in the basic article describing how I calculate pWins and pLosses, I normalize Win Probability Advancements (what I call "Player Game Points") by Game. The total number of Player Game Points accumulated in an average Major League Baseball game is around 3.3 per team. This number varies tremendously from game to game, however: some teams earn only 2 wins in a victory, while other teams may earn 6 wins in a loss. At the end of the day (or season), however, all wins are equal. Hence, in my work, I have chosen to assign each team one Player Win and one Player Loss for each team game. In addition, the winning team earns a second full Win, while the losing team earns a second full Loss. Ties are allocated as 1.5 Wins and 1.5 Losses for both teams. Context-neutral player decisions (eWins/eLosses) are also normalized to average three Player decisions per game. For eWins and eLosses, this normalization is done at the season level, rather than the game level, so that different numbers of context-neutral player decisions will be earned in different games.
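The game-level normalization can be sketched in a few lines of Python. This is only a minimal illustration, not the actual calculation: it assumes that a team's raw positive and negative Player Game Points are each rescaled proportionally to hit the per-game targets stated above (2 wins and 1 loss for the winner, 1 win and 2 losses for the loser, 1.5 of each in a tie). The function name, the data layout, and the sample player values are all hypothetical.

```python
def normalize_team_game(raw_points, result):
    """Rescale raw Player Game Points so a team's totals hit the per-game targets.

    raw_points: dict mapping player -> (raw_wins, raw_losses), both non-negative.
    result: "win", "loss", or "tie" from the team's point of view.
    Assumption: proportional rescaling. The text above states only the targets.
    """
    targets = {"win": (2.0, 1.0), "loss": (1.0, 2.0), "tie": (1.5, 1.5)}
    target_wins, target_losses = targets[result]

    total_wins = sum(wins for wins, _ in raw_points.values())
    total_losses = sum(losses for _, losses in raw_points.values())

    win_scale = target_wins / total_wins
    loss_scale = target_losses / total_losses

    return {
        player: (wins * win_scale, losses * loss_scale)
        for player, (wins, losses) in raw_points.items()
    }


# Hypothetical raw values for a winning team (not taken from any real game).
# Raw wins minus raw losses is 4.0 - 3.5 = 0.5, as WPA requires for a team win.
raw = {"Player A": (2.0, 0.5), "Player B": (1.0, 1.5), "Player C": (1.0, 1.5)}
normalized = normalize_team_game(raw, "win")

team_wins = sum(wins for wins, _ in normalized.values())
team_losses = sum(losses for _, losses in normalized.values())
print(team_wins, team_losses)
```

Whatever the raw totals are, the rescaled team totals always come out to the same three Player decisions per game, which is the point of the normalization.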

pWins vs. WPA, Example 1: How Many Wins can One Player Get in One Game?

Win Probability Advancements are structured such that net Win Probability Advancements (WPA+ minus WPA-) sum to exactly 0.5 for every team win and exactly -0.5 for every team loss. Hence, team wins are equal to exactly two times WPA. In Game 6 of the 2011 World Series, David Freese and Lance Berkman accumulated a combined WPA of 1.8 (according to Baseball-Reference). In other words, Freese and Berkman combined to "win" 3.6 games that night - and their teammates (mostly the Cardinals' bullpen) combined to "lose" 2.6 games. But, of course, when Game 6 of the World Series was over, the Cardinals had only won 1 game, not 3.6, and they had certainly not lost 2.6 games.
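The arithmetic behind those figures is easy to check. The sketch below uses only the numbers quoted above (a combined WPA of 1.8 and a team net WPA of 0.5 for a win); the function and variable names are my own shorthand for illustration.

```python
def implied_games(net_wpa):
    """Convert net WPA into 'games won' under the 2 x WPA accounting."""
    return 2 * net_wpa


team_net_wpa = 0.5   # every winning team nets exactly +0.5
duo_net_wpa = 1.8    # Freese and Berkman combined, as cited above

duo_games = implied_games(duo_net_wpa)                        # about 3.6
teammates_games = implied_games(team_net_wpa - duo_net_wpa)   # about -2.6

print(duo_games, teammates_games, duo_games + teammates_games)
```

The two pieces net out to the single game the Cardinals actually won, but only by crediting two players with far more than one win and everyone else with a large negative total.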

Normalizing the Win Probability Advancements of the St. Louis Cardinals in that game, the players on the Cardinals earned a combined 2 pWins and a combined 1 pLoss by construction. And how many net pWins did Freese and Berkman combine for? It turns out that David Freese and Lance Berkman earned a combined 0.81 net pWins.

Sure, they were more responsible for that win than any of their teammates, but at the end of the day, it was still just one win - a very important, very dramatic win, but just one win.

pWins vs. WPA, Example 2: Solo Home Runs in 1-0 Games

In a separate article, I look at how context can affect value as I measure it via Player won-lost records by comparing two games during the 2002 season that the Los Angeles Dodgers won at Dodger Stadium by a score of 1-0, with the only run of the game scoring on a solo home run.

A comparison of these two games gives a nice example of how my Player wins differ from a straight application of Win Probability Advancements.

On August 28, 2002, starting pitcher Odalis Perez hit a solo home run with two outs in the bottom of the fifth inning off of Arizona's Rick Helling for the only run in a 1-0 Dodgers win.

On September 27, 2002, Paul LoDuca led off the bottom of the tenth inning with a home run off of San Diego's Jeremy Fikac to break a scoreless tie and give the Dodgers a walkoff 1-0 victory.

From a context-neutral perspective, Perez's and LoDuca's home runs were exactly equal in value: 0.1448 wins, since both home runs took place in the same run-scoring environment - Dodger Stadium in 2002. From a prospective perspective, on the other hand, LoDuca's home run, which ended the game, was more than twice as valuable, 0.4088 wins, as Perez's home run, at 0.1814 wins. These values correspond to a WPA valuation system: Baseball-Reference.com, for example, reports the WPA of these home runs as being 37% for LoDuca and 17% for Perez.

In retrospect, Perez's home run was more valuable than it seemed at the time, however, as it turned out to be the difference in the game. The normalization process which I employ reflects this by boosting the final value of this home run, relative to its prospective (WPA) value, to a final value of 0.2724 wins. On the other hand, LoDuca's home run is reduced in value retrospectively to 0.3619 wins, since the high context at the time of LoDuca's home run was created in large part by the events which came before (most notably the fine pitching of LoDuca's teammates Omar Daal and Eric Gagne). The final result is that LoDuca's home run is still more valuable than Perez's, because it was a game-ending home run, while the Diamondbacks still had four more innings to come back from Perez's home run. LoDuca's home run is no longer more than twice as valuable as Perez's, however; it is instead just under one-third more valuable.

This is all summarized in the table below.

Date	Batter	Situation	eWins	Prospective (Inter-Game) Wins	WPA (BB-Ref)	pWins	Inter-Game Context	Intra-Game Context	Combined Context
8/28/2002	Odalis Perez	Two out, bottom of 5th inning	0.1448	0.1814	0.17	0.2724	1.2527	1.5018	1.8813
9/27/2002	Paul LoDuca	Leading off bottom of 10th	0.1448	0.4088	0.37	0.3619	2.8231	0.8852	2.4991

I think that these results are a reasonable reflection of the relative value of these two home runs.
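The three context figures in the table above appear to be simple ratios of the win values: inter-game context is prospective wins divided by eWins, intra-game context is pWins divided by prospective wins, and combined context is pWins divided by eWins. That reading is my inference from the numbers rather than a stated definition, and the short sketch below (with hypothetical names) checks it against the table. Small differences in the last decimal place are expected because the table's inputs are rounded to four places.

```python
# Win values copied from the table above.
home_runs = {
    "Perez 8/28/2002": {"eWins": 0.1448, "prospective": 0.1814, "pWins": 0.2724},
    "LoDuca 9/27/2002": {"eWins": 0.1448, "prospective": 0.4088, "pWins": 0.3619},
}


def contexts(values):
    """Return (inter-game, intra-game, combined) context as ratios of win values."""
    inter = values["prospective"] / values["eWins"]
    intra = values["pWins"] / values["prospective"]
    combined = values["pWins"] / values["eWins"]
    return inter, intra, combined


for name, values in home_runs.items():
    inter, intra, combined = contexts(values)
    print(f"{name}: inter={inter:.3f} intra={intra:.3f} combined={combined:.3f}")

# Relative value of LoDuca's home run to Perez's, before and after normalization.
prospective_ratio = (home_runs["LoDuca 9/27/2002"]["prospective"]
                     / home_runs["Perez 8/28/2002"]["prospective"])
final_ratio = (home_runs["LoDuca 9/27/2002"]["pWins"]
               / home_runs["Perez 8/28/2002"]["pWins"])
print(f"prospective ratio={prospective_ratio:.2f} final ratio={final_ratio:.2f}")
```

Note that the combined context is just the product of the inter-game and intra-game contexts, so the normalization can be read as a second multiplier applied on top of the WPA-style one.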

pWins vs. WPA, Example 3: Value of Breaking a Game Open Early vs. Coming up Clutch Late

In a separate article, I compare the 2005 seasons of David Ortiz and Alex Rodriguez when they finished 1-2 in voting for the AL MVP award.

David Ortiz's MVP candidacy that year rested in large part on his having been particularly clutch. For example, Big Papi led the American League in Win Probability Added (WPA) that season. WPA tracks what I call inter-game win adjustments. But I also adjust for what I call intra-game adjustments, which normalize the total number of pWins (and pLosses) to be constant across all team wins and losses.

While David Ortiz beat Alex Rodriguez (and everybody else) in inter-game win adjustments, Rodriguez beat Ortiz in intra-game adjustments. A comparison of two of their games is instructive in this regard.

On September 29, 2005, David Ortiz went 3-for-5, including a home run leading off the bottom of the 8th inning to tie the score 4-4 and a walkoff RBI single with one out in the bottom of the 9th inning. Baseball-Reference credits Ortiz with a WPA of 0.584 for the game. Obviously, those hits were huge for the Red Sox and Ortiz was rightly celebrated as the hero of that game.

On April 26, 2005, the Yankees defeated the Los Angeles Angels of Anaheim (or whatever they were calling themselves that season) 12-4. The Yankees took a 3-0 lead in the bottom of the first inning and led 10-2 by the end of the 4th inning. Obviously, there weren't a lot of "clutch" situations in this game. It was over early. Do you know why it was over early? Because Alex Rodriguez hit a 2-out, 3-run home run in the bottom of the first inning to give the Yankees that 3-0 lead, he hit a 2-out, 2-run home run in the bottom of the third inning to extend the Yankees' lead to 5-2, and he capped it off with a 2-out grand slam in the bottom of the 4th inning to give the Yankees that aforementioned 10-2 lead. For all of that, Baseball-Reference only credits Alex Rodriguez with a WPA of 0.490 for that game.

Take Ortiz's two RBIs off the scoreboard for the Red Sox in that September 29th game, and the Blue Jays would have won that game 4-3. Then again, if Ortiz made a (single) out in his final at-bat, Manny Ramirez would have come to bat with the potential winning run still in scoring position (albeit with two outs).

Take Rodriguez's ten RBIs off the scoreboard for the Yankees on April 26th and the Angels would have won that game 4-2. Moreover, all three of Rodriguez's home runs came with two outs in the inning. Turn them into outs and the Yankees would have had no further opportunities in any of those innings.

In retrospect, Alex Rodriguez's performance that day was not merely every bit as valuable as Ortiz's, but almost certainly more so, even if it was less "clutch" by a conventional inter-game "win probability" reckoning. My Player won-lost records credit David Ortiz with a batting won-lost record that day of 0.48 - 0.02, good for 0.46 net wins. Alex Rodriguez had a batting won-lost record on his big day of 0.91 - 0.02, good for 0.89 net wins.

As with the example of the home runs at Dodger Stadium, I think my adjustments result in more reasonable measures of the relative values of these two batting performances.

Pennants Won

The job of a Major League Baseball player is to help his team win games, for the ultimate purpose of making the playoffs and winning the World Series. That is my first sentence on this website's home page. And it's true. So, logically, if the goal of a team is to make the playoffs and win the World Series, shouldn't we be measuring how a player contributes to that goal, not simply the goal of winning individual games?

The concept of Win Probability that underlies my work here begins with the assumption that not all runs are created equal. A run in the ninth inning of a tie game is more valuable than a run in the ninth inning of a blowout. But, by the same token, not all wins are created equal. A win over your closest division rival when you're tied for the division lead with two games left in the season is obviously worth a lot more than a win over a 100-loss team in the middle of July. Given this, wouldn't it make sense to move from Win Probabilities to Pennant Probabilities, judging players based on how they increase their team's odds of winning the pennant?

There's certainly some merit to this argument. At a basic level, it is certainly logically compelling. I have chosen to not evaluate Pennant Probabilities here, however. My reason for this decision is a fundamental difference between Win Probabilities and Pennant Probabilities. This difference is best illustrated by an example.

From 1949 - 1958, the New York Yankees won 9 pennants. In those 9 pennant-winning seasons, they won between 92 and 99 games every year. The only season over that time period when the Yankees won over 100 games was in 1954, when the Yankees won 103 games. The 1954 Yankees, however, despite winning more games than any Yankee team between 1942 and 1961, were the only Yankees team between 1949 and 1958 to not win the pennant. Why? Because they had the misfortune to play in the same league as a Cleveland Indians team that won a then-American League record 111 games that year.

The 1953 New York Yankees split their season series with the Cleveland Indians, 11-11, but went 88-41 against the rest of the league, while the Indians went 81-51 against the same teams and finished 8-1/2 games behind the Yankees. The 1954 Yankees again split their season series with the Cleveland Indians, 11-11, and improved their record against the rest of the American League to 92-40. But the 1954 Indians went an astounding 100-32 against the rest of the American League and beat the Yankees by 8 games.
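These records and margins can be verified directly from the splits given above, using the standard games-behind formula (half the sum of the difference in wins and the difference in losses). The helper names below are hypothetical and exist only for this check.

```python
def combine(*records):
    """Add up (wins, losses) pairs into a single season record."""
    wins = sum(w for w, _ in records)
    losses = sum(l for _, l in records)
    return wins, losses


def games_behind(leader, trailer):
    """Standard games-behind calculation between two (wins, losses) records."""
    return ((leader[0] - trailer[0]) + (trailer[1] - leader[1])) / 2


# Head-to-head split plus record against the rest of the league, from the text.
yankees_1953 = combine((11, 11), (88, 41))   # 99-52
indians_1953 = combine((11, 11), (81, 51))   # 92-62
yankees_1954 = combine((11, 11), (92, 40))   # 103-51
indians_1954 = combine((11, 11), (100, 32))  # 111-43

gap_1953 = games_behind(yankees_1953, indians_1953)  # Indians 8.5 games back
gap_1954 = games_behind(indians_1954, yankees_1954)  # Yankees 8 games back
print(gap_1953, gap_1954)
```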

The 1954 Yankees did more to ensure themselves a pennant than the 1953 Yankees did. They played equally well against their chief rival and played better against the rest of the American League.

With Win Probabilities, the sum of the contributions of the players on a team will always add up to 50% when a team wins a game and to -50% when a team loses. Win Probabilities are a perfect accounting structure for team wins and losses.

Pennant Probabilities are not and cannot be a perfect accounting structure for team pennants. The 1954 Yankees did more to contribute to a pennant than the 1953 Yankees. The reason that the 1954 Yankees did not win a pennant had nothing to do with the 1954 Yankees and everything to do with the 1954 Cleveland Indians. The 1954 New York Yankees lost the pennant because the Cleveland Indians improved from 9-13 against the Boston Red Sox in 1953 to 20-2 against the Red Sox in 1954.

So who gets the Pennant Losses for the 1954 Yankees? The Red Sox? The 1954 Yankees won more games against the Red Sox (13) than had the 1953 pennant winners (11). The Indians? Certainly, the Indians deserve credit for doing what they needed to do to win the pennant in 1954. But the Yankees did more in 1954 than they had the year before too. From 1947 - 1967, the only American League team other than the Yankees to win more than 103 games was the 1954 Cleveland Indians. If the 1954 and 1959 Yankees teams (the latter of which went 79-75 and finished in 3rd place) had traded places, the Yankees would have won one more pennant in the 1950s (the Chicago White Sox, who won the pennant in 1959, had the same 94-60 record in both 1954 and 1959).

If we think about the 1953 Yankees and the 1954 Yankees as both starting the season with a 12.5% (1 in 8) chance of winning the pennant, there's simply going to be no way to fairly allocate +87.5% Pennants Added to the 1953 Yankees and -12.5% Pennants Added to the 1954 Yankees, when the 1954 Yankees out-performed the 1953 Yankees in every meaningful way in which they controlled their own fate. Teams that split with their closest rival and play 0.669 baseball for the season will win the pennant far more often than they will lose it.
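To make the accounting problem concrete, here is a minimal sketch of what a naive Pennants Added ledger would record for these two teams. It assumes, as in the paragraph above, that each of eight teams starts the season with an equal 1-in-8 chance; the function name is hypothetical, and the 103-51 record is the 1954 total implied by the splits quoted earlier.

```python
def pennants_added(initial_probability, won_pennant):
    """Naive ledger: final pennant probability (1 or 0) minus the starting one."""
    final_probability = 1.0 if won_pennant else 0.0
    return final_probability - initial_probability


initial = 1 / 8  # eight-team league, all teams assumed equally likely in April

added_1953 = pennants_added(initial, True)    # +0.875 for the pennant winners
added_1954 = pennants_added(initial, False)   # -0.125 for the better team

win_pct_1954 = 103 / (103 + 51)               # the 0.669 cited above
print(added_1953, added_1954, round(win_pct_1954, 3))
```

The ledger balances, but it hands the larger credit to the team that won fewer games, which is exactly the objection.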

From 1900 - 1968 (when winning a pennant involved simply having the best regular season record), only 27 teams had better winning percentages than the 1954 Yankees. And of those 27 teams, only 2 - the 1909 Chicago Cubs and 1942 Brooklyn Dodgers - failed to win a pennant. In fact, after the Yankees you have to go down to the 45th best winning percentage during these years - the 1915 Detroit Tigers - to find the next best non-pennant winner. Moreover, all three of these teams, the Cubs, Dodgers, and Tigers, lost the season series to the team that won the pennant (the Pirates, Cardinals, and Red Sox, respectively). The odds of the 1954 Yankees winning the pennant were certainly greater than 90% and, given that they split with the Indians (and won the season series from everybody else), were probably greater than 95%.

But what good is a Pennants Added system that gives the 1954 Yankees the +80% or so Pennants Added that they likely deserve, when, after all, they didn't actually win the pennant? Not much, and that's why I don't use one.

Am I saying that players don't deserve extra credit for performing better against their team's top rivals? Not at all. I'm simply saying that I don't believe there is a simple objective means of evaluating how much credit that should be worth. Should players get bonus points for making the playoffs? All other things being equal, sure. Within a season, I can see valuing a playoff-making performance over a non-playoff performance.

But the 1980 Baltimore Orioles won 100 games while the 1987 Minnesota Twins won 85 games. Even though the Twins won a World Series, I find it hard to blame the Orioles for having the misfortune of being in the same division in the same year as a New York Yankees team that won 103 games (especially since the Orioles actually won their season series from the Yankees 7-6). I suppose you can blame the Orioles for not going 9-4 against the Yankees (which would have given them the division title with 102 wins to 101 for the Yankees) but ultimately the requirements for making the playoffs change every year and, more importantly, aren't known until the season is over.

And, how fair is a system that punishes the Orioles for only going 7-6 against their closest rival, while rewarding the Twins for going 5-8 against their closest rival (the Kansas City Royals, who would have beaten the Twins if they could have only gone 10-3 against them that year)?

As the season is unfolding, all a player can do is work toward winning the game at hand, taking the season day by day and game by game. And ultimately, that's what is reflected in the Player Won-Lost records that I have developed here.

All articles are written so that they pull data directly from the most recent version of the Player won-lost database. Hence, any numbers cited within these articles should automatically incorporate the most recent update to Player won-lost records. In some cases, however, the accompanying text may have been written based on previous versions of Player won-lost records. I apologize if this results in nonsensical text in any cases.
