One of the coolest formulas I’ve come across in sabermetrics is the Matchup Formula, sometimes called the Log5 Matchup Formula, which I learned from Bill James.
If a team with a 0.667 winning percentage faces a team with a 0.450 winning percentage, how often would you expect the 0.667 team to win?
If a 0.300 hitter faces a pitcher with a 0.290 batting average against, in a league with a 0.250 batting average, how well do we expect the batter to hit?
If 49.0% of a particular type of ball-in-play were turned into outs in a league where 70.2% of all balls-in-play were turned into outs, what percentage of these would be outs in a league where only 69.4% of all balls-in-play were converted into outs?
All of these questions can be answered with the Matchup Formula.
1.  Overview of Matchup Formula
I’ll begin with the simplest version of the Matchup Formula. Let W1 be the winning percentage of Team 1 and W2 be the winning percentage of Team 2. The probability of Team 1 beating Team 2 is given by the following formula:
Probability of Team 1 Winning = W1•(1 – W2) / [W1•(1 – W2) + W2•(1 – W1)]
So, for example, a team with a 0.667 winning percentage will beat a team with a 0.450 winning percentage approximately 71.0% of the time.
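As a quick check on the arithmetic, here is a minimal Python sketch of the formula above. The function name `matchup` is an illustrative label only, not part of the original formula.

```python
def matchup(w1: float, w2: float) -> float:
    """Probability that side 1 beats side 2, given each side's winning percentage."""
    return w1 * (1 - w2) / (w1 * (1 - w2) + w2 * (1 - w1))


# The 0.667 team against the 0.450 team from the text.
print(f"{matchup(0.667, 0.450):.3f}")  # 0.710
```

Two evenly matched teams come out at exactly 0.500, and swapping the arguments gives the complementary probability.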
The formula implicitly assumes that both of these teams faced average (or at least equivalent) opposition in compiling those winning percentages. So, what if Team 1 has a 0.667 winning percentage, but the average record of their opponents was only 0.440, while Team 2’s 0.450 winning percentage was amassed against opponents with an average winning percentage of 0.520?
Let W1 be Team 1’s winning percentage, O1 be the average record of Team 1’s opponents, W2 be Team 2’s winning percentage, and O2 be the average record of Team 2’s opponents. In this case, the probability of Team 1 beating Team 2 follows the same basic formula, but with a twist:
Probability of Team 1 Winning = W’1•(1 – W’2) / [W’1•(1 – W’2) + W’2•(1 – W’1)]
W’1 = W1•O1 / [W1•O1 + (1 – W1)•(1 – O1)]
W’2 = W2•O2 / [W2•O2 + (1 – W2)•(1 – O2)]
Plugging in the numbers from above (0.667 against 0.440 opponents versus 0.450 against 0.520 opponents), we find that Team 1 has a 64.0% chance of defeating Team 2.
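The opponent adjustment can be sketched in Python as follows. The helper names `adjust_for_opponents` and `matchup` are illustrative assumptions; the arithmetic is just the three formulas above applied in order.

```python
def adjust_for_opponents(w: float, o: float) -> float:
    """W' = W*O / [W*O + (1 - W)*(1 - O)], where O is the opponents' average record."""
    return w * o / (w * o + (1 - w) * (1 - o))


def matchup(w1: float, w2: float) -> float:
    """Basic Matchup Formula for two winning percentages."""
    return w1 * (1 - w2) / (w1 * (1 - w2) + w2 * (1 - w1))


w1_adj = adjust_for_opponents(0.667, 0.440)  # Team 1 beat up on weak opponents
w2_adj = adjust_for_opponents(0.450, 0.520)  # Team 2 faced tougher opponents
print(f"{w1_adj:.3f} {w2_adj:.3f}")          # 0.611 0.470
print(f"{matchup(w1_adj, w2_adj):.3f}")      # 0.640
```

Note that an opponents' record of 0.500 leaves a team's winning percentage unchanged, which is why the simple version of the formula needs no adjustment.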
There is one more piece of information to account for. The formula so far assumes that all of the numbers within it are relative to a 0.500 context. Return to the batting average example from the first paragraph: if a 0.300 hitter faces a pitcher with a 0.290 batting average against in a league with a 0.250 batting average, how well do we expect the batter to hit? Relating this to our earlier formulae, the 0.300 corresponds to W1, and the 0.290 equals (1 – W2), since W2 is the pitcher’s success rate, which is 0.710 in this case. Let’s complicate this further and assume that the 0.300 hitter has faced pitchers with an average batting average against of 0.265, so that O1 equals 0.735 (1 – 0.265), and that the pitcher has faced batters with an average batting average of 0.270, which equals O2. Finally, we have one new piece of information, which we’ll call L: the league batting average, 0.250 in this example.
Let P1 equal the probability that Batter 1 gets a hit against Pitcher 2. Here, the Matchup Formula, in its entirety, becomes the following:
P0 = W’1•(1 – W’2) / [W’1•(1 – W’2) + W’2•(1 – W’1)]
W’1 = W1•O1 / [W1•O1 + (1 – W1)•(1 – O1)]
W’2 = W2•O2 / [W2•O2 + (1 – W2)•(1 – O2)]
P1 = P0•L / [P0•L + (1 – P0)•(1 – L)]
And, in our example, our 0.300 hitter would be expected to bat 0.304 against this particular pitcher in this particular league.
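The full calculation for this example can be sketched in Python. The function names are illustrative assumptions. The league step is written here as P0•L / [P0•L + (1 – P0)•(1 – L)], which is the form that reproduces the 0.304 figure and returns exactly L when P0 is 0.500.

```python
def adjust(w: float, o: float) -> float:
    """Opponent adjustment: W' = W*O / [W*O + (1 - W)*(1 - O)]."""
    return w * o / (w * o + (1 - w) * (1 - o))


def matchup(w1: float, w2: float) -> float:
    """Matchup probability in a 0.500 context."""
    return w1 * (1 - w2) / (w1 * (1 - w2) + w2 * (1 - w1))


def to_league(p0: float, league: float) -> float:
    """Move a 0.500-context probability into a league whose average rate is `league`."""
    return p0 * league / (p0 * league + (1 - p0) * (1 - league))


w1, o1 = 0.300, 1 - 0.265   # batter; pitchers he faced succeeded 73.5% of the time
w2, o2 = 1 - 0.290, 0.270   # pitcher's success rate; batters he faced hit 0.270
league = 0.250

p0 = matchup(adjust(w1, o1), adjust(w2, o2))
p1 = to_league(p0, league)
print(f"{p0:.3f}")  # 0.568
print(f"{p1:.3f}")  # 0.304
```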
2.  Use of Matchup Formula to Estimate Event Weights
The probabilities that underlie the calculation of basic Player Game Points are dependent on the exact location of the ball and how it was hit. For example, the probability of driving in a runner from third is vastly different on a ground out to the pitcher (16.5% in the 2006 National League) versus a fly out to center field (84.5% in the 2006 National League). Hence, in theory, ball-in-play probabilities should be calculated for each unique location/hit type combination.
My data source is Retrosheet event files. The amount of information on locations and hit types provided by Retrosheet event files varies considerably by year. Full location and hit type information is available for most balls-in-play for the years 1989 – 1999. For other years, event probabilities are imputed using the Matchup Formula, based on final outcomes in the year of interest and location probabilities for the 1989 – 1998 period. This process is described below.
Hopefully, one example will give some indication of how this works. Based on 1989 – 1998 data, a line drive single to left field had an a priori probability of being an out of 18.81%. This is W1 in the Matchup Formula. Overall, line drives turned into outs 33.49% of the time between 1989 and 1998. This is equivalent to (1 – O1) in the Matchup Formula. In the 2005 National League, line drives turned into outs 26.16% of the time. This becomes L, the new “league” context in which we are interested. In this case, it’s not a “matchup” per se, and W2 and O2 simply drop out of the equation (i.e., W’2 is set to 0.500, so that P0 equals W’1).
Plugging in all of that, then, we would expect line drive singles to left field to have an a priori probability of having been outs of 13.75% in the 2005 National League.*
* Actually, if you plug the above numbers into the Matchup Formula, you get an a priori out probability on a line drive single to left field in the 2005 National League of 14.02%. If you do the same for the a priori probabilities of a single, double, and triple, however, the sum of these probabilities is 101.96%. The probabilities are then normalized to sum to 100%, which yields the final probability of an out of 13.75%.
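The line drive example can be reproduced with a short Python sketch. The function names are illustrative assumptions, and the league step uses the form P0•L / [P0•L + (1 – P0)•(1 – L)], which is the one that yields the 14.02% quoted in the footnote. Only the 101.96% total is given in the text, so the normalization divides by that total rather than recomputing the single, double, and triple probabilities.

```python
def adjust(w: float, o: float) -> float:
    """Context adjustment: W' = W*O / [W*O + (1 - W)*(1 - O)]."""
    return w * o / (w * o + (1 - w) * (1 - o))


def to_league(p0: float, league: float) -> float:
    """Move a 0.500-context probability into a context whose average rate is `league`."""
    return p0 * league / (p0 * league + (1 - p0) * (1 - league))


w1 = 0.1881        # out probability, line drive single to LF, 1989-1998
o1 = 1 - 0.3349    # 33.49% of line drives became outs, 1989-1998
league = 0.2616    # 26.16% of line drives became outs, 2005 NL

p0 = adjust(w1, o1)              # W2 and O2 drop out, so P0 is just W'1
raw_out = to_league(p0, league)
final_out = raw_out / 1.0196     # normalize by the 101.96% total from the footnote

print(f"{raw_out:.4f}")    # 0.1402
print(f"{final_out:.4f}")  # 0.1375
```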
3.  Use of Matchup Formula in Allocating Credit for Player Game Points
For those components where multiple players share credit for Player Game Points, such as pitchers and catchers with respect to stolen bases, the relative credit is divided between the relevant players through a process described below.
The major drawback to Player Won-Lost records that are tied to team records as developed here is that, for a particular play, the pitcher and catcher are assumed to bear equal responsibility – not in terms of equivalent Player Game Points, but in terms of the fact that wins are credited to both pitchers and catchers for plays in which the defensive team earns wins and losses are debited to both pitchers and catchers for plays in which the defensive team earns losses. In reality, it is perfectly reasonable to envision a scenario whereby, for example, a pitcher does a terrible job of holding a baserunner on and is only saved by a perfect throw from the catcher to catch the runner stealing. In such a case, it may be more reasonable to credit the pitcher with a loss for his role in preventing stolen bases while crediting the catcher with more wins than he currently receives. Another example of this would be a catcher who, while normally excellent at preventing wild pitches and avoiding passed balls, has the misfortune of regularly catching a knuckleball pitcher.
In terms of Context-Dependent Wins and Losses (pWins/pLosses), where the object is to ensure that Player Wins and Losses relate perfectly to team wins and losses, such a situation is largely unavoidable. If one wants to neutralize individual player records in order to move beyond team records, however, then, at a seasonal level, one could use the Matchup Formula to adjust for the performance of the other players with whom a particular player shared credits.
Suppose, for example, that a pitcher compiled a (basestealing) winning percentage of 0.515 but that the catchers with whom he shared that credit compiled an average winning percentage (weighted by the number of points which they shared with this particular pitcher) of 0.535. In such a case, the Matchup Formula can be used to adjust the pitcher’s winning percentage. Here, the pitcher’s winning percentage (0.515) would correspond to W1 in the Matchup Formula above. The average winning percentage of his catchers (0.535) describes the context in which the pitcher performed; because catchers are teammates rather than opponents, it enters the formula as (1 – O1), so that O1 equals 0.465. Plugging these values into the Matchup Formula would produce an adjusted winning percentage for this pitcher of 0.480.
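A minimal Python sketch of this single adjustment step follows. The function name `adjust_for_teammates` is an illustrative assumption. It treats the catchers' 0.535 as (1 – O1), since good teammates inflate a player's record where good opponents deflate it, and that reading is the one that reproduces the 0.480 figure.

```python
def adjust_for_teammates(w: float, teammates: float) -> float:
    """Adjust a winning percentage for teammates who shared credit.

    The teammates' average enters the adjustment formula as (1 - O1).
    """
    o = 1 - teammates
    return w * o / (w * o + (1 - w) * (1 - o))


# Pitcher at 0.515 whose catchers averaged 0.535 on the shared plays.
print(f"{adjust_for_teammates(0.515, 0.535):.3f}")  # 0.480
```

This shows one step for one pitcher only; it does not attempt the iterative pitcher-and-catcher process.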
In order to properly adjust both pitchers’ and catchers’ winning percentages in this way, one needs to use an iterative process. That is, one first adjusts pitchers’ winning percentages given the winning percentages of their catchers. One would then need to adjust catchers’ winning percentages given the adjusted winning percentages of their pitchers. Having adjusted the catchers’ winning percentages, however, one would want to re-estimate adjusted winning percentages for pitchers using these new adjusted catcher winning percentages. In principle, this process would continue until neither pitcher nor catcher winning percentages change between iterations; in practice, it is repeated four times here. These results are used in constructing Context-Neutral, Teammate-Adjusted Player Won-Lost records (eWins and eLosses) as well as to determine the appropriate allocation of Player Game Points across players.
4.  Adjusting Player Game Points for the Level of Competition
The final way in which the Matchup Formula could be useful in adjusting Player Game Points would be to adjust Player Game Points based on differences in the average level of competition faced by different players. That is, if two batters compiled identical offensive winning percentages (say 0.510), but one faced pitchers with an average winning percentage above 0.500 (say 0.505) and the other faced pitchers with an average winning percentage below 0.500 (say 0.495), the former batter would actually be a better hitter. The Matchup Formula, in fact, would say that the first batter (0.510 versus 0.505 pitchers) actually accumulated an adjusted winning percentage of 0.515 while the second batter, with a 0.510 winning percentage against 0.495 pitchers, accumulated an adjusted winning percentage of 0.505.
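The two adjusted figures above can be checked with a few lines of Python. The function name `adjust_for_opponents` is an illustrative assumption; the pitchers' average winning percentage plays the role of O1.

```python
def adjust_for_opponents(w: float, o: float) -> float:
    """W' = W*O / [W*O + (1 - W)*(1 - O)], where O is the opponents' winning percentage."""
    return w * o / (w * o + (1 - w) * (1 - o))


# Two batters with identical 0.510 records against different pitching.
print(f"{adjust_for_opponents(0.510, 0.505):.3f}")  # 0.515
print(f"{adjust_for_opponents(0.510, 0.495):.3f}")  # 0.505
```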
As with the shared Player Game Points, adjustments of this type would have to be made through an iterative process. I have not yet made any such adjustments.
All articles are written so that they pull data directly from the most recent version of the Player won-lost database. Hence, any numbers cited within these articles should automatically incorporate the most recent update to Player won-lost records. In some cases, however, the accompanying text may have been written based on previous versions of Player won-lost records. I apologize if this results in nonsensical text in any cases.