Baseball Player Won-Loss Records



Event Probabilities

The key to properly assessing Player Wins and Losses is to credit each player with the change in Win Probability attributable to him, assuming average performances by all other players. In order to do this, it is necessary to determine what the probability is of certain events occurring, such as the probability of a baserunner advancing from 1st to 3rd on a single, the probability of a particular ball-in-play becoming an out versus a single versus an extra-base hit, and many other similar probabilities.

Wherever possible, the probabilities of relevant events are initially calculated by direct observation. For example, the probability of a baserunner being caught trying to steal second is calculated by simply summing up all of the runners caught stealing second and dividing by the total number of possible runners.

In some cases, however, because of differences in the underlying context in which events occur (e.g., caught stealings may occur in a higher average context than not-caught stealings), direct probabilities may produce more Player Wins than Player Losses (or vice versa) for a particular component or sub-component of Player Game Points. To help maintain the underlying assumption that the overall winning percentage within a particular season should be 0.500 for every component and sub-component of interest, my results are therefore refined by scaling them back to 0.500 at the aggregate level.
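As a rough illustration of what scaling a component back to 0.500 could look like, the sketch below rescales wins and losses so that their aggregate totals match while the combined total is preserved. The function name, the made-up input values, and the choice of one multiplier for all wins and one for all losses are assumptions for illustration; the exact refinement used for Player Won-Lost records may differ.

```python
def scale_to_500(records):
    """Rescale (wins, losses) pairs so aggregate wins equal aggregate losses.

    Hypothetical helper: applies one multiplier to every win total and one to
    every loss total, keeping the overall sum of wins plus losses unchanged.
    """
    total_wins = sum(w for w, _ in records)
    total_losses = sum(l for _, l in records)
    target = (total_wins + total_losses) / 2.0
    win_factor = target / total_wins
    loss_factor = target / total_losses
    return [(w * win_factor, l * loss_factor) for w, l in records]


# Made-up component totals for three players (not real data).
raw = [(1.2, 0.9), (0.8, 0.7), (0.5, 0.4)]
scaled = scale_to_500(raw)
```

Individual players can still finish above or below 0.500 after the rescaling; only the aggregate is forced to balance.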

1.    Stolen Bases, Caught Stealing, and Wild Pitches
In the case of stolen bases, caught stealings, wild pitches, and the like, unique probabilities are calculated for each of the 24 base-out states. The table below presents, by base-out state over the entire Retrosheet Era, the probabilities of positive stolen base events (identified below as "stolen bases", but including advancements on errant pickoff attempts, errors on caught stealings, defensive indifference, and balks), caught stealing events (including pickoffs), and wild pitches / passed balls (I make no distinction between wild pitches and passed balls).

The specific probabilities actually used in calculating Player Won-Lost records are uniquely determined for each league and each season.

Outs  Baserunners  SB    CS    WP/PB
0     0            0.0%  0.0%  0.0%
0     1            5.3%  3.1%  1.8%
0     2            1.2%  0.5%  2.1%
0     3            0.2%  0.1%  1.6%
0     1-2          1.5%  1.3%  2.1%
0     1-3          4.0%  1.1%  2.1%
0     2-3          0.2%  0.1%  1.5%
0     1-2-3        0.2%  0.1%  1.4%
1     0            0.0%  0.0%  0.1%
1     1            5.8%  3.7%  1.9%
1     2            1.9%  0.9%  2.3%
1     3            0.3%  0.6%  1.6%
1     1-2          1.8%  1.4%  2.2%
1     1-3          4.7%  1.8%  2.1%
1     2-3          0.2%  0.3%  1.4%
1     1-2-3        0.2%  0.3%  1.4%
2     0            0.0%  0.0%  0.1%
2     1            6.7%  3.7%  1.8%
2     2            1.0%  0.3%  2.1%
2     3            0.4%  0.2%  1.6%
2     1-2          1.0%  0.5%  2.0%
2     1-3          7.1%  2.0%  2.1%
2     2-3          0.3%  0.2%  1.5%
2     1-2-3        0.3%  0.2%  1.4%


Except for the case where the bases are initially empty*, wild pitches and passed balls are fairly uniform across base-out states, occurring between roughly 1.4 and 2.3 percent of the time.
*Wild pitches and/or passed balls with the bases empty represent cases where the batter reached first base safely on a dropped third strike.

In general, wild pitches and passed balls are least common when there is a runner on third base who must score for any runners to advance (i.e., excluding runners on first and third), occurring 1.5% of the time, versus 2.0% of the time otherwise.

Stolen bases and caught stealings depend much more heavily than wild pitches and passed balls on both the position of the baserunners and the number of outs.

Stolen base attempts of second base increase in frequency with the number of outs. With only a runner on first base, stolen base attempts (Stolen Bases plus Caught Stealings above) are somewhat less common with nobody out (8.4%) than with one or two outs (9.9%). Stolen base attempts of third base are far more frequent with one out (2.8% for a runner on second base only, 3.2% for runners on first and second) than with either zero (1.7% for a runner on second base only, 2.8% for runners on first and second) or two outs (1.3%, 1.4%). Apparently, baseball players and managers take seriously the old adage, "Don't make the first or last out at third base."

Stolen base success rates are much greater with two outs (e.g., 64.4% with a runner on first base only, 79.0% with a runner on second base) than with zero or one outs. Overall, over the time period studied here, stolen base success rates were 67.1% with two outs, 61.8% with one out, and 63.2% with no outs.
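The attempt and success rates quoted above follow directly from the SB and CS columns of the table. The sketch below recomputes them for a runner on first only and a runner on second only; the dictionary layout and function names are illustrative assumptions. Because the table is rounded to one decimal place, results recomputed this way can differ slightly from the figures in the text, which come from unrounded data.

```python
# (outs, runners) -> (SB, CS) rates in percent, copied from the table above.
RATES = {
    (0, "1"): (5.3, 3.1),
    (1, "1"): (5.8, 3.7),
    (2, "1"): (6.7, 3.7),
    (0, "2"): (1.2, 0.5),
    (1, "2"): (1.9, 0.9),
    (2, "2"): (1.0, 0.3),
}


def attempt_rate(sb, cs):
    """Stolen base attempts: stolen bases plus caught stealings (percent)."""
    return sb + cs


def success_rate(sb, cs):
    """Share of attempts that succeed, in percent."""
    return 100.0 * sb / (sb + cs)


for (outs, runners), (sb, cs) in sorted(RATES.items()):
    print(f"{outs} out(s), runner on {runners}: "
          f"attempts {attempt_rate(sb, cs):.1f}%, "
          f"success {success_rate(sb, cs):.1f}%")
```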

2.    Balls in Play
In the case of balls in play, the probabilities of many things are dependent on the exact location of the ball and how it was hit. For example, the probability of driving in a runner from third is vastly different on a ground out to the pitcher (23.5% over the full Retrosheet Era) versus a fly out to center field (84.3% over the same time period). Hence, in theory, ball-in-play probabilities should be calculated for each unique location/hit type combination.

My data source is Retrosheet event files. Some Retrosheet event files provide some data on the location of balls in play and hit types. Retrosheet event locations are described here. For my purposes, I consider there to be 17 locations, defined by the fielder(s) nearest to the play: 1, 13, 15, 2, 23, 25, 3, 34, 4, 5, 56, 6, 7, 78, 8, 89, and 9. I also distinguish three possible depths of the hit (shallow (S), medium (M), and deep (D)) and four hit types: bunts (B), ground balls (G), fly balls (F or P), and line drives (L).

The exact level of detail available from Retrosheet with respect to locations and hit types for balls in play varies a great deal over time. As of my writing this, complete (or very nearly complete) location data are only available from Retrosheet from 1989 through 1999. For most balls in play, more recent event files only provide very limited location data. For example, an event file may identify a single as being a single to left field (S7). But there's a big difference between a ground ball through the hole at shortstop and a line drive off the left field wall that the left fielder manages to play well enough to hold the batter at first base.

Hit type data (i.e., bunts vs. ground balls vs. fly balls vs. line drives) are available for non-outs (i.e., singles, doubles, and triples) from 2003 onward. For other years (except for 1989 - 1999, where complete hit type data are available), hit type data are mostly limited to outs (i.e., ground outs are distinguished from fly outs). Unfortunately, as we go farther back in time, Retrosheet play-by-play data tell us less and less. Prior to 1988, it is common (but not universal) for play-by-play data not to identify even the fielder on base hits (i.e., singles to left field are not distinguished from singles to right field). Going further back, there are even some outs for which the fielder is not identified.

a.    Batted Ball Results

My solution to the lack of location data for most years is to use the 1989 - 1999 event files to generate weights by location and to impute estimated locations based upon complete location data from 1989 - 1999 and what information is available in other event files. I discuss my use of location data in this way in a separate article.

For calculating Player wins and losses, I divide balls in play based on the end result of the play (out, single, double, or triple, with batters reaching on fielding errors treated as "singles" here) and the first fielder to touch the ball. For outs, that is the player who recorded the first assist/unassisted putout. For hits, it is the fielder occupying the field to which the ball was hit, e.g., the left fielder for a single to left field (S7). If the hit type of the play is identified (bunt, ground ball, fly ball, line drive), this information is also used in determining the appropriate probabilities.

From the 1989 - 1999 data, I then calculate the expected location of the event. That is, for, say, a single to left field, I look at all singles to left field from 1989 - 1999 and calculate the probability that such a play was in each of the relevant locations. For example, the table below shows the distribution by location of line drive singles to left field. From the expected location, I then calculate the expected probabilities of all relevant events: probability of a base hit, probability of particular players making the out, probability of a runner scoring from second, etc. Data on hit type (ground ball, fly ball, etc.) are used for recent seasons for which hit-type data are available for all, or virtually all, plays (2003 onward). For earlier years (2000 - 2002 and pre-1989 seasons), data on hit type are generally not available, especially on base hits. Hence, these data are not used in these cases.

Line Drive Single to Left Field, 1989 - 1999

Columns: Number and Percent give the Total Balls in Play; Out, Single, Double, and Triple give the Probability of a Line Drive at the given Location being that result.
Location Depth Number Percent Out Single Double Triple
Unknown Unknown 292 0.83%
1 Unknown 5 0.01% 73.14% 26.08% 0.78% 0.00%
34 Unknown 3 0.01% 57.47% 42.07% 0.46% 0.00%
34 Deep 4 0.01% 5.32% 92.74% 1.85% 0.09%
4 Medium 1 0.00% 40.67% 57.23% 0.96% 1.14%
4 Deep 5 0.01% 11.92% 86.42% 1.66% 0.00%
5 Unknown 181 0.52% 77.34% 8.07% 14.59% 0.00%
5 Shallow 7 0.02% 84.08% 7.02% 5.48% 3.42%
5 Deep 731 2.08% 0.80% 38.97% 60.23% 0.00%
56 Unknown 1,093 3.11% 61.11% 38.65% 0.24% 0.00%
56 Shallow 23 0.07% 91.20% 8.50% 0.00% 0.29%
56 Deep 3,684 10.48% 3.33% 95.18% 1.47% 0.02%
6 Unknown 177 0.50% 88.83% 10.82% 0.35% 0.00%
6 Shallow 2 0.01% 38.32% 61.08% 0.60% 0.00%
6 Medium 5 0.01% 40.84% 58.16% 0.60% 0.41%
6 Deep 2,100 5.98% 6.96% 90.64% 1.85% 0.55%
7 Unknown 2,755 7.84% 29.09% 41.85% 28.33% 0.72%
7 Shallow 12,438 35.39% 9.45% 73.37% 16.73% 0.45%
7 Medium 4,735 13.47% 41.63% 34.38% 23.31% 0.67%
7 Deep 574 1.63% 41.49% 7.76% 49.72% 1.02%
78 Unknown 651 1.85% 21.75% 46.59% 30.71% 0.94%
78 Shallow 3,721 10.59% 7.68% 84.46% 6.73% 1.12%
78 Medium 1,661 4.73% 35.13% 44.91% 15.75% 4.21%
78 Deep 230 0.65% 36.61% 6.98% 55.73% 0.69%
8 Unknown 4 0.01% 36.54% 58.08% 3.82% 1.55%
8 Shallow 32 0.09% 8.23% 89.53% 1.43% 0.81%
8 Medium 9 0.03% 42.02% 50.50% 3.84% 3.64%
8 Deep 4 0.01% 71.06% 7.58% 19.77% 1.59%
89 Unknown 1 0.00% 22.87% 50.25% 25.12% 1.75%
89 Shallow 5 0.01% 7.49% 84.83% 4.74% 2.94%
89 Medium 2 0.01% 31.72% 44.13% 12.48% 11.68%
9 Unknown 3 0.01% 32.53% 40.79% 20.63% 6.05%
9 Shallow 3 0.01% 9.24% 73.65% 14.93% 2.18%
9 Deep 1 0.00% 48.02% 8.69% 43.28% 0.00%
TOTALS 35,142 100.00% 18.85% 64.20% 16.24% 0.71%


As the above example shows, a line drive single to left field had an a priori probability of becoming an out of 18.85%. For fly balls, the a priori probability of becoming an out was 84.05%. For ground balls, the a priori probability of becoming an out was 55.23%. Overall, the a priori probability of a single to left field becoming an out, regardless of hit type, was 37.41%.
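The two calculations described here, the share of plays at each location and a count-weighted probability of an out, can be sketched as follows. Only four rows of the table above are included for brevity, so the weighted figure computed below is for that subset and is not expected to reproduce the 18.85% total. The names `ROWS`, `location_shares`, and `expected_out_probability` are assumptions made for this illustration.

```python
# (location, depth) -> (number of balls in play, P(out)) for a handful of
# rows copied from the table above; the remaining rows are omitted.
ROWS = {
    ("7", "Shallow"): (12438, 0.0945),
    ("7", "Medium"): (4735, 0.4163),
    ("56", "Deep"): (3684, 0.0333),
    ("78", "Shallow"): (3721, 0.0768),
}
TABLE_TOTAL = 35142  # Number column of the TOTALS row in the table above


def location_shares(rows, total):
    """Share of all plays at each location (the Percent column)."""
    return {key: n / total for key, (n, _) in rows.items()}


def expected_out_probability(rows):
    """Count-weighted average of P(out) over the supplied rows only."""
    n_total = sum(n for n, _ in rows.values())
    return sum(n * p_out for n, p_out in rows.values()) / n_total


shares = location_shares(ROWS, TABLE_TOTAL)
subset_p_out = expected_out_probability(ROWS)
print(f"Share at 7 Shallow: {shares[('7', 'Shallow')]:.2%}")
print(f"Weighted P(out) over these four rows: {subset_p_out:.2%}")
```

The same weighting applies to the Single, Double, and Triple columns, and to any other event probability that is tabulated by location.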

The individual probabilities of fielders having made an out on a play which becomes a generic single to left field were 46.4% for the left fielder, 26.1% for the shortstop, 23.9% for the third baseman, 3.1% for the center fielder, and 0.4% for other fielders. For ground-ball singles, the relevant probabilities are 55.8% for the third baseman and 43.6% for the shortstop. For fly-ball singles, on the other hand, left fielders would be expected to convert 74.9% of all outs (74.3% for line drives), with center fielders expected to convert 9.8% (4.4% for line drives).

For the probabilities of the play becoming an out, single, double, or triple, one further adjustment is made for recent years. While these various weights are not known for specific locations in those years, they are, of course, known for the league as a whole. For example, in the 2005 American League, 32.09% of all line drives (excluding home runs) were converted into outs. From 1989 - 1999, however, 33.41% of line drives were converted into outs. The weights used for leagues outside 1989 - 1999 are adjusted to tie the aggregate weights to the actual league-wide percentages. This is done through the use of the Matchup Formula.
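The Matchup Formula itself is not spelled out in this article, so the sketch below should not be read as that formula. As a stand-in, it multiplies the odds of each location-specific probability by a single factor, found by bisection, so that the weighted aggregate matches a league-wide target. The location probabilities and weights are made up so that their baseline aggregate is 33.41%; the target of 32.09% is the 2005 American League figure quoted above. All function names are hypothetical.

```python
def scale_odds(p, k):
    """Multiply the odds of probability p (0 < p < 1) by k, convert back."""
    odds = k * p / (1.0 - p)
    return odds / (1.0 + odds)


def aggregate(probs, weights, k):
    """Weighted average of the odds-scaled probabilities."""
    total_w = sum(weights)
    return sum(w * scale_odds(p, k) for p, w in zip(probs, weights)) / total_w


def fit_odds_multiplier(probs, weights, target, iterations=100):
    """Bisection (on a log scale) for k so the aggregate equals target."""
    lo, hi = 1e-6, 1e6
    for _ in range(iterations):
        mid = (lo * hi) ** 0.5  # geometric midpoint
        if aggregate(probs, weights, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5


# Made-up location-level out probabilities and weights (not real data);
# their weighted average is 0.3341 before any adjustment.
probs = [0.10, 0.45, 0.7455]
weights = [0.5, 0.3, 0.2]

k = fit_odds_multiplier(probs, weights, target=0.3209)
adjusted = [scale_odds(p, k) for p in probs]
```

Scaling odds rather than probabilities keeps every adjusted value strictly between 0 and 1, which a plain multiplicative rescaling would not guarantee.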

b.    Baserunner Outs/ Advancements

In addition to the probability of a ball-in-play becoming an out, single, double, or triple, and the probability the ball is played by each of the various fielders, the other events for which weights are needed are the probability of baserunner advancements and/or baserunning outs. Such probabilities are, of course, a function of the specific baserunner (batter, runner on first, runner on second, runner on third); the batting event (out, single, double, triple); and, where available, the hit type (bunt, ground ball, fly ball, line drive) of the ball. For these events, probabilities are simply calculated by direct observation for the baserunner/bat event/hit type combinations for which data are available. For earlier years, probabilities are calculated by hit type for outs where possible (i.e., unique probabilities are calculated for a baserunner scoring from third base on a ground out versus a fly out), but not for hits (when hit type data are not recorded for hits).

In the case of baserunner advancements on singles and doubles, separate probabilities are also calculated based on the number of outs (two versus fewer than two).
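Direct observation for baserunner events amounts to counting outcomes within each combination of baserunner, batting event, hit type, and out bucket. The sketch below shows one way to do that; the tuple layout, the result labels, and the sample plays are all made up for illustration and do not reflect Retrosheet's actual file format.

```python
from collections import Counter


def advancement_probabilities(plays):
    """Estimate P(result | key) by direct observation.

    Each play is a (runner, bat_event, hit_type, two_outs, result) tuple.
    hit_type may be None when it is not recorded, in which case those plays
    simply form their own group.
    """
    totals = Counter()
    outcomes = Counter()
    for runner, bat_event, hit_type, two_outs, result in plays:
        key = (runner, bat_event, hit_type, two_outs)
        totals[key] += 1
        outcomes[(key, result)] += 1
    return {
        (key, result): count / totals[key]
        for (key, result), count in outcomes.items()
    }


# Made-up sample: runner on second, ground-ball singles.
plays = [
    ("2B", "single", "G", False, "scores"),
    ("2B", "single", "G", False, "to_3B"),
    ("2B", "single", "G", True, "scores"),
    ("2B", "single", "G", True, "scores"),
]
probs_by_key = advancement_probabilities(plays)
```

With real data, each key would be populated separately for each league and season.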

Event probabilities for a specific league are linked through the league stats page for the league. An example of the event probabilities for the 2011 American League can be found here.

Article revised on February 20, 2020



All articles are written so that they pull data directly from the most recent version of the Player won-lost database. Hence, any numbers cited within these articles should automatically incorporate the most recent update to Player won-lost records. In some cases, however, the accompanying text may have been written based on previous versions of Player won-lost records. I apologize if this results in nonsensical text in any cases.
