Event Probabilities
1. Stolen Bases, Caught Stealing, and Wild Pitches
For stolen bases, caught stealing, wild pitches, and the like, unique probabilities are calculated for each of the 24 base-out states. The table below presents, by base-out state over the entire Retrosheet Era, the probabilities of three kinds of events: positive stolen base events (identified below as "stolen bases" (SB), but including advancements on errant pickoff attempts, errors on caught stealings, defensive indifference, and balks); caught stealing events (CS, including pickoffs); and wild pitches / passed balls (WP/PB; I make no distinction between wild pitches and passed balls).
Outs | Baserunners | SB | CS | WP/PB |
0 | 0 | 0.0% | 0.0% | 0.0% |
0 | 1 | 5.3% | 3.1% | 1.8% |
0 | 2 | 1.2% | 0.5% | 2.1% |
0 | 3 | 0.2% | 0.1% | 1.6% |
0 | 1-2 | 1.5% | 1.3% | 2.1% |
0 | 1-3 | 4.0% | 1.1% | 2.1% |
0 | 2-3 | 0.2% | 0.1% | 1.5% |
0 | 1-2-3 | 0.2% | 0.1% | 1.4% |
1 | 0 | 0.0% | 0.0% | 0.1% |
1 | 1 | 5.8% | 3.7% | 1.9% |
1 | 2 | 1.9% | 0.9% | 2.3% |
1 | 3 | 0.3% | 0.6% | 1.6% |
1 | 1-2 | 1.8% | 1.4% | 2.2% |
1 | 1-3 | 4.7% | 1.8% | 2.1% |
1 | 2-3 | 0.2% | 0.3% | 1.4% |
1 | 1-2-3 | 0.2% | 0.3% | 1.4% |
2 | 0 | 0.0% | 0.0% | 0.1% |
2 | 1 | 6.7% | 3.7% | 1.8% |
2 | 2 | 1.0% | 0.3% | 2.1% |
2 | 3 | 0.4% | 0.2% | 1.6% |
2 | 1-2 | 1.0% | 0.5% | 2.0% |
2 | 1-3 | 7.1% | 2.0% | 2.1% |
2 | 2-3 | 0.3% | 0.2% | 1.5% |
2 | 1-2-3 | 0.3% | 0.2% | 1.4% |
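As a rough illustration of how per-state rates like those in the table above could be tabulated, here is a minimal Python sketch. It is not the code used to produce the table. The record layout (one tuple per opportunity, tagged with outs, occupied bases, and at most one event) and the choice of denominator are assumptions made for the example, and the names `event_rates`, `BASES`, and `STATES` are hypothetical.

```python
from collections import Counter
from itertools import product

# Eight baserunner configurations x three out counts = 24 base-out states.
BASES = ["0", "1", "2", "3", "1-2", "1-3", "2-3", "1-2-3"]
STATES = list(product(range(3), BASES))
EVENTS = ("SB", "CS", "WP/PB")


def event_rates(plays):
    """Return {(outs, bases): {event: rate}} for every base-out state.

    `plays` is an iterable of (outs, bases, event) tuples, where event is
    'SB', 'CS', 'WP/PB', or None. Assumption: each tuple is one opportunity,
    so the rate is events divided by opportunities in that state.
    """
    opportunities = Counter()
    events = Counter()
    for outs, bases, event in plays:
        opportunities[(outs, bases)] += 1
        if event is not None:
            events[(outs, bases, event)] += 1

    rates = {}
    for state in STATES:
        n = opportunities[state]
        rates[state] = {
            e: (events[(*state, e)] / n if n else 0.0) for e in EVENTS
        }
    return rates


# Toy data: four opportunities with a runner on first and nobody out.
sample = [(0, "1", "SB"), (0, "1", None), (0, "1", "CS"), (0, "1", None)]
rates = event_rates(sample)
```

States with no observed opportunities are reported as 0.0 rather than raising an error, which is a convenience choice for the sketch rather than anything stated in the article.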
2. Balls in Play
For balls in play, many probabilities depend on the exact location of the ball and how it was hit. For example, the probability of driving in a runner from third is vastly different on a ground out to the pitcher (23.5% over the full Retrosheet Era) than on a fly out to center field (84.3% over the same time period). Hence, in theory, ball-in-play probabilities should be calculated for each unique location/hit type combination.
a. Batted Ball Results
My solution to the lack of location data for most years is to use the 1989 - 1999 event files to generate weights by location, and to impute estimated locations for other years based on the complete location data from 1989 - 1999 and whatever information is available in the other event files. I discuss my use of location data in this way in a separate article.
For calculating Player wins and losses, I divide balls in play based on the end result of the play - out, single, double, triple (batters reaching on fielding errors are treated as "singles" here) - and the first fielder to touch the ball. For outs, that is the player who recorded the first assist/unassisted putout. For hits, it is the fielder occupying the field to which the ball was hit - e.g., the left fielder for a single to left field (S7). If the hit type of the play is identified (bunt, ground ball, fly ball, line drive), this information is also used in determining the appropriate probabilities.
From the 1989 - 1999 data, I then calculate the expected location of the event. That is, for, say, a single to left field, I look at all singles to left field from 1989 - 1999 and calculate the probability that such a play was in each of the relevant locations. For example, the table below shows the distribution by location of line drive singles to left field. From the expected location, I then calculate the expected probabilities of all relevant events: the probability of a base hit, the probability of particular players making the out, the probability of a runner scoring from second, etc.
Data on hit type (ground ball, fly ball, etc.) are used for recent seasons for which hit-type data are available for all, or virtually all, plays (2003 onward). For other years (2000 - 2002 and pre-1989 seasons), data on hit type are generally not available, especially on base hits. Hence, hit-type data are not used in these cases.
Line Drive Single to Left Field, 1989 - 1999
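The Number and Percent columns give total balls in play at each location; the last four columns give the probability of a line drive at the given location being an out, single, double, or triple.
Location | Depth | Number | Percent | Out | Single | Double | Triple |
Unknown | Unknown | 292 | 0.83% | | | | |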
1 | Unknown | 5 | 0.01% | 73.14% | 26.08% | 0.78% | 0.00% |
34 | Unknown | 3 | 0.01% | 57.47% | 42.07% | 0.46% | 0.00% |
34 | Deep | 4 | 0.01% | 5.32% | 92.74% | 1.85% | 0.09% |
4 | Medium | 1 | 0.00% | 40.67% | 57.23% | 0.96% | 1.14% |
4 | Deep | 5 | 0.01% | 11.92% | 86.42% | 1.66% | 0.00% |
5 | Unknown | 181 | 0.52% | 77.34% | 8.07% | 14.59% | 0.00% |
5 | Shallow | 7 | 0.02% | 84.08% | 7.02% | 5.48% | 3.42% |
5 | Deep | 731 | 2.08% | 0.80% | 38.97% | 60.23% | 0.00% |
56 | Unknown | 1,093 | 3.11% | 61.11% | 38.65% | 0.24% | 0.00% |
56 | Shallow | 23 | 0.07% | 91.20% | 8.50% | 0.00% | 0.29% |
56 | Deep | 3,684 | 10.48% | 3.33% | 95.18% | 1.47% | 0.02% |
6 | Unknown | 177 | 0.50% | 88.83% | 10.82% | 0.35% | 0.00% |
6 | Shallow | 2 | 0.01% | 38.32% | 61.08% | 0.60% | 0.00% |
6 | Medium | 5 | 0.01% | 40.84% | 58.16% | 0.60% | 0.41% |
6 | Deep | 2,100 | 5.98% | 6.96% | 90.64% | 1.85% | 0.55% |
7 | Unknown | 2,755 | 7.84% | 29.09% | 41.85% | 28.33% | 0.72% |
7 | Shallow | 12,438 | 35.39% | 9.45% | 73.37% | 16.73% | 0.45% |
7 | Medium | 4,735 | 13.47% | 41.63% | 34.38% | 23.31% | 0.67% |
7 | Deep | 574 | 1.63% | 41.49% | 7.76% | 49.72% | 1.02% |
78 | Unknown | 651 | 1.85% | 21.75% | 46.59% | 30.71% | 0.94% |
78 | Shallow | 3,721 | 10.59% | 7.68% | 84.46% | 6.73% | 1.12% |
78 | Medium | 1,661 | 4.73% | 35.13% | 44.91% | 15.75% | 4.21% |
78 | Deep | 230 | 0.65% | 36.61% | 6.98% | 55.73% | 0.69% |
8 | Unknown | 4 | 0.01% | 36.54% | 58.08% | 3.82% | 1.55% |
8 | Shallow | 32 | 0.09% | 8.23% | 89.53% | 1.43% | 0.81% |
8 | Medium | 9 | 0.03% | 42.02% | 50.50% | 3.84% | 3.64% |
8 | Deep | 4 | 0.01% | 71.06% | 7.58% | 19.77% | 1.59% |
89 | Unknown | 1 | 0.00% | 22.87% | 50.25% | 25.12% | 1.75% |
89 | Shallow | 5 | 0.01% | 7.49% | 84.83% | 4.74% | 2.94% |
89 | Medium | 2 | 0.01% | 31.72% | 44.13% | 12.48% | 11.68% |
9 | Unknown | 3 | 0.01% | 32.53% | 40.79% | 20.63% | 6.05% |
9 | Shallow | 3 | 0.01% | 9.24% | 73.65% | 14.93% | 2.18% |
9 | Deep | 1 | 0.00% | 48.02% | 8.69% | 43.28% | 0.00% |
TOTALS | | 35,142 | 100.00% | 18.85% | 64.20% | 16.24% | 0.71% |
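The step from an expected location to expected event probabilities amounts to a weighted average over locations. The sketch below illustrates that arithmetic using three rows copied from the table above (7 Shallow, 7 Medium, and 56 Deep). It is only an illustration under stated assumptions: restricting to three rows and renormalizing their counts is a simplification for brevity, and the names `ROWS` and `expected_outcomes` are hypothetical rather than taken from the Player won-lost code.

```python
# (location, depth) -> (number of line drive singles to left field,
#                       outcome probabilities for a line drive at that spot)
# Values are copied from three rows of the table; all other rows are omitted.
ROWS = {
    ("7", "Shallow"): (12438, {"out": 0.0945, "single": 0.7337,
                               "double": 0.1673, "triple": 0.0045}),
    ("7", "Medium"): (4735, {"out": 0.4163, "single": 0.3438,
                             "double": 0.2331, "triple": 0.0067}),
    ("56", "Deep"): (3684, {"out": 0.0333, "single": 0.9518,
                            "double": 0.0147, "triple": 0.0002}),
}


def expected_outcomes(rows):
    """Weight each location's outcome probabilities by its share of plays."""
    total = sum(count for count, _ in rows.values())
    expected = {}
    for count, probs in rows.values():
        weight = count / total  # probability the play was at this location
        for outcome, p in probs.items():
            expected[outcome] = expected.get(outcome, 0.0) + weight * p
    return expected


expected = expected_outcomes(ROWS)
for outcome, p in expected.items():
    print(f"{outcome}: {p:.4f}")
```

Run over every row of the table instead of three, the same calculation would reproduce the weighting described in the text; the published percentages are rounded, so the results need not sum to exactly 1.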
b. Baserunner Outs/Advancements
In addition to the probability of a ball in play becoming an out, single, double, or triple, and the probability that the ball is played by each of the various fielders, weights are also needed for the probabilities of baserunner advancements and baserunning outs. Such probabilities are, of course, a function of the specific baserunner (batter, runner on first, runner on second, runner on third); the batting event (out, single, double, triple); and, where available, the hit type (bunt, ground ball, fly ball, line drive) of the ball. For these events, probabilities are simply calculated by direct observation for the baserunner/batting event/hit type combinations for which data are available. For earlier years, probabilities are calculated by hit type for outs where possible (i.e., unique probabilities are calculated for a baserunner scoring from third base on a ground out versus a fly out), but not for hits, because hit-type data are not recorded for hits in those years.
In the case of baserunner advancements on singles and doubles, separate probabilities are also calculated based on the number of outs (two versus fewer than two).
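Calculating these probabilities "by direct observation" can be pictured as counting outcomes within each baserunner/batting event/hit type group, with singles and doubles further split by outs. The following Python sketch shows one way to do that. The observation format, the result labels, and the function names are all assumptions made for illustration; they do not describe the actual Player won-lost implementation.

```python
from collections import Counter, defaultdict


def outs_bucket(outs):
    """Collapse the out count into the two groups described in the text."""
    return "two" if outs == 2 else "fewer than two"


def advancement_probs(observations):
    """Estimate result probabilities for each observed combination.

    `observations` is an iterable of hypothetical tuples:
    (runner, batting_event, hit_type, outs, result), where hit_type may be
    None when it was not recorded and result is a label such as 'scored'.
    """
    counts = defaultdict(Counter)
    for runner, event, hit_type, outs, result in observations:
        # Only singles and doubles are split by number of outs.
        bucket = outs_bucket(outs) if event in ("single", "double") else None
        counts[(runner, event, hit_type, bucket)][result] += 1

    return {
        key: {result: n / sum(tally.values()) for result, n in tally.items()}
        for key, tally in counts.items()
    }


# Toy data: runner on second, single, hit type unrecorded, two outs.
obs = [("2B", "single", None, 2, "scored")] * 3
obs += [("2B", "single", None, 2, "held at third")]
probs = advancement_probs(obs)
```

Using `None` for a missing hit type keeps plays without hit-type data in their own group, which mirrors the text's distinction between years with and without that information.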
Event probabilities for a specific league are linked through the league stats page for the league. An example of the event probabilities for the 2011 American League can be found here.
Article revised on February 20, 2020
All articles are written so that they pull data directly from the most recent version of the Player won-lost database. Hence, any numbers cited within these articles should automatically incorporate the most recent update to Player won-lost records. In some cases, however, the accompanying text may have been written based on previous versions of Player won-lost records. I apologize if this results in nonsensical text in any cases.