Basic Formulas

One of the coolest formulas I’ve come across in sabermetrics is the Matchup Formula, sometimes called the Log_{5} Matchup Formula, which I learned from Bill James.

If a team with a 0.667 winning percentage faces a team with a 0.450 winning percentage, how often would you expect the 0.667 team to win?

If a 0.300 hitter faces a pitcher with a 0.290 batting average against, in a league with a 0.250 batting average, how well do we expect the batter to hit?

If 49.0% of a particular type of ball were turned into outs in a league when 70.2% of all balls-in-play were turned into outs, what percentage of these would be outs in a league where only 69.4% of all balls-in-play were converted into outs?

All of these questions can be answered with the Matchup Formula.

1. Overview of Matchup Formula

I’ll begin with the simplest version of the Matchup Formula. Let W_{1} be the winning percentage of Team 1 and W_{2} be the winning percentage of Team 2. The probability of Team 1 beating Team 2 is given by the following formula:

Probability of Team 1 Winning = W_{1}•(1 – W_{2}) / [W_{1}•(1 – W_{2}) + W_{2}•(1 – W_{1})]

So, for example, a team with a 0.667 winning percentage will beat a team with a 0.450 winning percentage approximately 71.0% of the time.
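As a minimal sketch, the basic formula can be written as a small Python function and checked against the example. The function name `matchup` is chosen here for illustration and does not come from the original text.

```python
def matchup(w1: float, w2: float) -> float:
    """Probability that side 1 beats side 2.

    w1 and w2 are winning percentages (0 to 1), assumed to have been
    compiled against average (or at least equivalent) opposition.
    """
    num = w1 * (1.0 - w2)
    return num / (num + w2 * (1.0 - w1))


# The example from the text: a 0.667 team against a 0.450 team.
p = matchup(0.667, 0.450)
print(f"{p:.3f}")  # prints 0.710
```

Swapping the two arguments gives the complementary probability, so the two sides always sum to 1.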

The formula implicitly assumes that both of these teams faced average (or at least equivalent) opposition in compiling those winning percentages. So, what if Team 1 has a 0.667 winning percentage, but the average record of their opponents was only 0.440, while Team 2’s 0.450 winning percentage was amassed against opponents with an average winning percentage of 0.520?

Let W_{1} be Team 1’s winning percentage, O_{1} be the average record of Team 1’s opponents, W_{2} be Team 2’s winning percentage, and O_{2} be the average record of Team 2’s opponents. In this case, the probability of Team 1 beating Team 2 follows the same basic formula, but with a twist:

Probability of Team 1 Winning = W’_{1}•(1 – W’_{2}) / [W’_{1}•(1 – W’_{2}) + W’_{2}•(1 – W’_{1})]

where

W’_{1} = W_{1}•O_{1} / [W_{1}•O_{1} + (1- W_{1})•(1- O_{1})]

W’_{2} = W_{2}•O_{2} / [W_{2}•O_{2} + (1- W_{2})•(1- O_{2})]

Plugging in the numbers from above (0.667 against 0.440 opponents versus 0.450 against 0.520 opponents), we find that Team 1 has a 64.0% chance of defeating Team 2.
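The two-step calculation can be sketched in Python as follows. The helper names are illustrative assumptions; the arithmetic follows the formulas for W’_{1}, W’_{2}, and the final probability as written in this section.

```python
def matchup(w1: float, w2: float) -> float:
    """Basic Matchup Formula: probability that side 1 beats side 2."""
    num = w1 * (1.0 - w2)
    return num / (num + w2 * (1.0 - w1))


def adjust_for_opponents(w: float, o: float) -> float:
    """W' = W*O / [W*O + (1 - W)*(1 - O)].

    w is a team's winning percentage and o is the average winning
    percentage of the opponents it faced.
    """
    num = w * o
    return num / (num + (1.0 - w) * (1.0 - o))


def matchup_adjusted(w1: float, o1: float, w2: float, o2: float) -> float:
    """Matchup Formula applied to the opponent-adjusted percentages."""
    return matchup(adjust_for_opponents(w1, o1), adjust_for_opponents(w2, o2))


# 0.667 against 0.440 opponents versus 0.450 against 0.520 opponents.
p = matchup_adjusted(0.667, 0.440, 0.450, 0.520)
print(f"{p:.3f}")  # prints 0.640
```

When the opposition was exactly average (O = 0.500), the adjustment leaves W unchanged, so this version reduces to the simplest one.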

There is one more piece of information to account for. The formula so far assumes that all of the numbers within it are relative to a 0.500 context. Return to the batting average example from the opening questions: if a 0.300 hitter faces a pitcher with a 0.290 batting average against in a league with a 0.250 batting average, how well do we expect the batter to hit? Relating this to our earlier formulae, the 0.300 corresponds to W_{1}, and the 0.290 equals (1 – W_{2}), since W_{2} is the pitcher’s success rate, which is 0.710 in this case. Let’s complicate this further and assume that the 0.300 hitter has faced pitchers with an average batting average against of 0.265, so O_{1} equals 0.735 here (1 – 0.265), and that the pitcher has faced opponents with an average batting average of 0.270 (which will equal O_{2}). Now we have one new piece of information, which we’ll call L: the league batting average, which is 0.250 in this example.

Let P_{1} equal the probability that Batter 1 gets a hit against Pitcher 2. Here, the Matchup Formula, in its entirety, becomes the following:

P_{0} = W’_{1}•(1 – W’_{2}) / [W’_{1}•(1 – W’_{2}) + W’_{2}•(1 – W’_{1})]

where

W’_{1} = W_{1}•O_{1} / [W_{1}•O_{1} + (1- W_{1})•(1- O_{1})]

W’_{2} = W_{2}•O_{2} / [W_{2}•O_{2} + (1- W_{2})•(1- O_{2})]

and

P_{1} = P_{0}•L / [P_{0}•L + (1 – P_{0})•(1 – L)]

And, in our example, our 0.300 hitter would be expected to bat 0.304 against this particular pitcher in this particular league.
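A sketch of the complete calculation in Python, using the batting example. Function and parameter names are illustrative assumptions. The league-context step is written here as P_{0}•L / [P_{0}•L + (1 – P_{0})•(1 – L)], which is the form that reproduces the 0.304 figure for these inputs.

```python
def matchup(w1: float, w2: float) -> float:
    """Basic Matchup Formula: probability that side 1 beats side 2."""
    num = w1 * (1.0 - w2)
    return num / (num + w2 * (1.0 - w1))


def adjust_for_opponents(w: float, o: float) -> float:
    """W' = W*O / [W*O + (1 - W)*(1 - O)]."""
    num = w * o
    return num / (num + (1.0 - w) * (1.0 - o))


def apply_league_context(p0: float, league: float) -> float:
    """Move a 0.500-context probability into a league whose rate is `league`."""
    num = p0 * league
    return num / (num + (1.0 - p0) * (1.0 - league))


def full_matchup(w1: float, o1: float, w2: float, o2: float, league: float) -> float:
    """Opponent adjustment, then the matchup, then the league context."""
    p0 = matchup(adjust_for_opponents(w1, o1), adjust_for_opponents(w2, o2))
    return apply_league_context(p0, league)


p1 = full_matchup(
    w1=0.300,        # batter's average
    o1=1 - 0.265,    # success rate of the pitchers the batter has faced
    w2=1 - 0.290,    # this pitcher's success rate
    o2=0.270,        # average of the batters this pitcher has faced
    league=0.250,    # league batting average
)
print(f"{p1:.3f}")  # prints 0.304
```

With `league=0.5` the last step leaves P_{0} unchanged, which matches the earlier statement that the simpler versions assume a 0.500 context.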

2. Use of Matchup Formula to Estimate Event Weights

The probabilities that underlie the calculation of basic Player Game Points are dependent on the exact location of the ball and how it was hit. For example, the probability of driving in a runner from third is vastly different on a ground out to the pitcher (16.2% in the 2006 National League) versus a fly out to center field (84.7% in the 2006 National League). Hence, in theory, ball-in-play probabilities should be calculated for each unique location/hit type combination.

My data source is Retrosheet event files. The amount of information on locations and hit types provided by Retrosheet event files varies considerably by year. Full location and hit type information are available for most balls-in-play for the years 1989 – 1999. For other years, event probabilities are imputed based on final outcomes in the year of interest and location probabilities for the 1989 – 1998 period using the Matchup Formula. This process is described here.

Hopefully, one example will give some indication how this works. Based on 1989 – 1998 data, a line drive single to left field had an a priori probability of being an out of 18.81%. This is W_{1} in the Matchup Formula. Overall, line drives turned into outs 33.49% of the time between 1989 and 1998. This is equivalent to (1 – O_{1}) in the Matchup Formula. In the 2005 National League, line drives turned into outs 26.16% of the time. This becomes L, the new “league” context in which we are interested. In this case, it’s not a “matchup” per se, and W_{2} and O_{2} simply drop out of the equation (i.e., they’re set equal to each other).

Plugging in all of that, then, we would expect line drive singles to left field to have an a priori probability of having been outs of 13.75% in the 2005 National League.^{*}

^{*} Actually, if you plug the above numbers into the Matchup Formula, you get an a priori out probability on a line drive single to left field in the 2005 National League of 14.02%. If you do the same for the a priori probabilities of a single, double, and triple, however, the sum of these probabilities is 101.96%. The probabilities are then normalized to sum to 100%, which yields the final probability of an out of 13.75%.

3. Use of Matchup Formula in Allocating Credit for Player Game Points

For those components where multiple players share credit for Player Game Points, such as pitchers and catchers with respect to stolen bases, the relative credit is divided between the relevant players through a process described here.

The major drawback to Player Won-Lost records that are tied to team records as developed here is that, for a particular play, the pitcher and catcher are assumed to bear equal responsibility – not in terms of equivalent Player Game Points, but in terms of the fact that wins are credited to both pitchers and catchers for plays in which the defensive team earns wins and losses are debited to both pitchers and catchers for plays in which the defensive team earns losses. In reality, it is perfectly reasonable to envision a scenario whereby, for example, a pitcher does a terrible job of holding a baserunner on and is only saved by a perfect throw from the catcher to catch the runner stealing. In such a case, it may be more reasonable to credit the pitcher with a loss for his role in preventing stolen bases while crediting the catcher with more wins than he currently receives. Another example of this would be a catcher who, while normally excellent at preventing wild pitches and avoiding passed balls, has the misfortune of regularly catching a knuckleball pitcher.

In terms of Context-Dependent Wins and Losses (pWins/pLosses), where the object is to ensure that Player Wins and Losses relate perfectly to team wins and losses, such a situation is largely unavoidable. If one wants to neutralize individual player records in order to move beyond team records, however, then, at a seasonal level, one could use the Matchup Formula to adjust for the performance of the other players with whom a particular player shared credits.

Suppose, for example, that a pitcher compiled a Component 1 (basestealing) winning percentage of 0.515 but that the catchers with whom he shared that Component 1 credit compiled an average winning percentage (weighted by the number of Component 1 points which they shared with this particular pitcher) of 0.535.

In such a case, the Matchup Formula can be used to adjust the pitcher’s Component 1 winning percentage. Here, the pitcher’s winning percentage (0.515) would correspond to W_{1} in the Matchup Formula above. The average winning percentage of his catchers (0.535) would correspond to O_{1}, the context in which the pitcher performed. Plugging these values into the Matchup Formula would produce an adjusted Component 1 winning percentage for this pitcher of 0.480.
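As a rough check on the pitcher example (0.515 with catchers at 0.535), the quoted 0.480 can be reproduced in Python by netting the catchers’ percentage out of the pitcher’s with the basic matchup form. This is equivalent to using 1 – 0.535 as O_{1} in the W’_{1} formula. That reading is an assumption made for this sketch, and the function name is illustrative.

```python
def net_out_teammates(w: float, teammates: float) -> float:
    """Remove the contribution of teammates from a shared winning percentage.

    Assumption for this sketch: the teammates' percentage is treated the way
    an opponent's percentage is treated in the simplest Matchup Formula,
    W*(1 - T) / [W*(1 - T) + T*(1 - W)].
    """
    num = w * (1.0 - teammates)
    return num / (num + teammates * (1.0 - w))


# Pitcher at 0.515 whose catchers averaged 0.535 on the shared component.
adjusted = net_out_teammates(0.515, 0.535)
print(f"{adjusted:.3f}")  # prints 0.480
```

A pitcher whose catchers were exactly average (0.500) keeps his original percentage under this form.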

In order to properly adjust both pitchers’ and catchers’ winning percentages in this way, one needs to use an iterative process. That is, one first adjusts pitchers’ winning percentages given the winning percentages of their catchers. One would then need to adjust catchers’ winning percentages given the adjusted winning percentages of their pitchers. Having adjusted the catchers’ winning percentages, however, one would want to re-estimate adjusted winning percentages for pitchers using these new adjusted catcher winning percentages. In principle, this process would continue until neither pitcher nor catcher winning percentages change between iterations; in practice, it is repeated four times here. These results are used in constructing Context-Neutral, Teammate-Adjusted Player Won-Lost records (eWins and eLosses) as well as to determine the appropriate allocation of Player Game Points across players.

4. Adjusting Player Game Points for the Level of Competition

The final way in which the Matchup Formula could be useful in adjusting Player Game Points would be to adjust Player Game Points based on differences in the average level of competition faced by different players. That is, if two batters compiled identical offensive winning percentages (say 0.510), but one faced pitchers with an average winning percentage above 0.500 (say 0.505) and the other faced pitchers with an average winning percentage below 0.500 (say 0.495), the former batter would actually be a better hitter. The Matchup Formula, in fact, would say that the first batter (0.510 versus 0.505 pitchers) actually accumulated an adjusted winning percentage of 0.515 while the second batter, with a 0.510 winning percentage against 0.495 pitchers, accumulated an adjusted winning percentage of 0.505.
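These two adjusted figures follow from applying the W’ adjustment to each batter in turn. A short Python sketch, with an illustrative function name:

```python
def adjust_for_opponents(w: float, o: float) -> float:
    """W' = W*O / [W*O + (1 - W)*(1 - O)].

    w is the batter's offensive winning percentage and o is the average
    winning percentage of the pitchers he faced.
    """
    num = w * o
    return num / (num + (1.0 - w) * (1.0 - o))


tougher = adjust_for_opponents(0.510, 0.505)  # faced above-average pitchers
easier = adjust_for_opponents(0.510, 0.495)   # faced below-average pitchers
print(f"{tougher:.3f} {easier:.3f}")  # prints 0.515 0.505
```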

As with the shared Player Game Points, adjustments of this type would have to be made through an iterative process. As of now, I have not yet made any such adjustments.