Baseball Player Won-Loss Records

Shared Credit

In many cases, it is not clear exactly who should get credit for a particular play. For example, pitchers and catchers share responsibility for Component 1 (basestealing) Player decisions. In these cases, Player decisions are allocated based on the relative skill levels exhibited by the relevant players.

The technique outlined here is used to divide responsibility between pitchers and catchers for Component 1 (basestealing) and Component 2 (wild pitches and passed balls) Player decisions, between pitchers and fielders for Components 5 (hits vs. outs), 6 (single vs. double vs. triple), and 7 (double plays), and between batters and baserunners for Components 7, 8 (baserunner outs), and 9 (baserunner advancements).

The division of Component 1 Player decisions between pitchers and catchers is used here as an illustration of the general technique.

1.    Basic Theory
How does one determine how to divide credit between pitchers and catchers for Component 1 (basestealing) Player decisions?

Let’s begin by asking, what if somebody deserved no credit for a particular component of Player decisions but we allocated Player decisions to them anyway? For example, what if we assigned Component 1 Player decisions to the defensive team’s right fielder? What would we expect Component 1 Player decisions to look like in that case? Essentially, we would expect every right fielder to have a Component 1 winning percentage of 0.500 plus or minus some random variation.

Suppose we were to try to predict a right fielder’s Component 1 winning percentage over some time period based on his Component 1 winning percentage over some other time period. In such a persistence equation, we would expect one period’s winning percentage to have no power to predict the other’s.

Alternatively, what would we expect Component 1 Player decisions to look like if we assigned them to players who had different levels of talent in terms of affecting the opponents’ basestealing? In such a case, we would expect a player’s Component 1 winning percentage to be equal to his “true” winning percentage (his “true-talent”) plus or minus some random variation, and we would expect a player’s Component 1 winning percentage over some time period to have significant predictive capacity over other time periods.

In other words, the extent to which a player’s winning percentage at some point in time is predictive of his winning percentage at some other point is suggestive of the extent to which there is a true skill involved in a particular component. Based on this, Player wins and losses are allocated in proportion to the extent to which a player’s winning percentage has predictive power.
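The argument above can be illustrated with a small simulation. The sketch below is not part of the actual Player won-lost calculations. It assumes every play is a simple win-or-loss event of equal weight, and the function names, the 1,000-player pool, the 100 plays per half, and the 0.400 to 0.600 talent range are all hypothetical choices made purely for illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def split_half_correlation(true_talents, plays_per_half, rng):
    """Simulate odd and even plays for each player and correlate the halves."""
    odd, even = [], []
    for talent in true_talents:
        # Each play is won with probability equal to the player's true talent.
        odd.append(sum(rng.random() < talent for _ in range(plays_per_half)) / plays_per_half)
        even.append(sum(rng.random() < talent for _ in range(plays_per_half)) / plays_per_half)
    return pearson(odd, even)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_players = 1000

# "Right fielders": nobody has any real skill, so everyone's true talent is 0.500.
no_skill = [0.5] * n_players
# Players with real skill: true talent spread between 0.400 and 0.600 (assumed range).
skilled = [rng.uniform(0.4, 0.6) for _ in range(n_players)]

r_no_skill = split_half_correlation(no_skill, 100, rng)
r_skill = split_half_correlation(skilled, 100, rng)

print(f"no true skill:   odd/even correlation = {r_no_skill:.3f}")
print(f"real true skill: odd/even correlation = {r_skill:.3f}")
```

Under these assumptions, the first correlation should sit close to zero and the second should be clearly positive, which is the distinction the persistence equations are meant to capture.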

2.    Mathematics
The basis for dividing shared Player decisions is Persistence Equations. I divide the plays that took place in a particular season into two pools: odd and even. To evaluate the persistence of skills, I then fit a simple equation which attempts to explain winning percentage by component on even plays as a function of the same factor for odd plays:

(Win %)Even = b•(Win %)Odd + (1-b)•(Win %)Baseline

where (Win %)Baseline represents a baseline toward which Component winning percentage regresses over time.

The coefficient b in the persistence equation measures the persistence of Component winning percentage between the two samples (even plays v. odd plays) and, hence, the extent to which Component winning percentage is a true “skill” for the relevant set of players being evaluated.

This equation is estimated using a Weighted Least Squares technique which weights observations by the harmonic mean of the number of games over which the even and odd winning percentages have been compiled squared.
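As a rough sketch of how such an equation might be fit, the Python below estimates b by weighted least squares with the baseline held fixed, which reduces to a regression through the origin on deviations from the baseline. Two things here are assumptions rather than the method as actually implemented: the weight is read as the square of the harmonic mean of the odd and even game counts (the wording above allows other readings), and the function names and sample numbers are hypothetical.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two non-negative numbers (0.0 if both are zero)."""
    return 2.0 * a * b / (a + b) if (a + b) > 0 else 0.0


def fit_persistence(win_odd, win_even, games_odd, games_even, baseline=0.5):
    """Estimate b in: even = b * odd + (1 - b) * baseline.

    Subtracting the baseline from both sides gives
        (even - baseline) = b * (odd - baseline),
    so b is the weighted least-squares slope through the origin.
    Assumes at least one player has odd != baseline with positive weight.
    """
    numerator = 0.0
    denominator = 0.0
    for odd, even, g_odd, g_even in zip(win_odd, win_even, games_odd, games_even):
        weight = harmonic_mean(g_odd, g_even) ** 2  # assumed reading of the weighting
        x = odd - baseline
        y = even - baseline
        numerator += weight * x * y
        denominator += weight * x * x
    return numerator / denominator


# Made-up data in which even-play deviations are exactly one quarter of odd-play deviations.
win_odd = [0.600, 0.400, 0.700]
win_even = [0.525, 0.475, 0.550]
games_odd = [30, 12, 45]
games_even = [28, 15, 44]

b = fit_persistence(win_odd, win_even, games_odd, games_even)
print(f"b = {b:.3f}")  # prints "b = 0.250"
```

Because the baseline is fixed, the fit has a single free parameter, so the weight on the baseline term is always (1 - b).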

3.    Complication: Controlling for the Talent of the Other Players Involved
Earlier, I identified a defensive team’s right fielder as an example of a player whose Component 1 winning percentage we would expect to be simply randomly distributed. In fact, however, some of you might have seen a flaw in my example.

In 2004, the Montreal Expos allowed only 58 stolen bases on the season, while catching 41 opposing baserunners attempting to steal. Based on this, the Montreal Expos compiled a team-wide Component 1.1 (basestealing by runners on first base) winning percentage of 0.645. Of course, this means that Expos right fielders would have a combined Component 1.1 winning percentage of 0.645, not 0.500, not because Expos right fielders had some innate ability to prevent the other team from stealing bases, but because they had the good fortune to be teammates with Brian Schneider, who amassed an unadjusted Component 1.1 winning percentage of 0.660 at catcher.

On the other hand, the 2002 New York Mets allowed 151 stolen bases against only 53 caught stealing, leading to a team-wide context-neutral Component 1.1 winning percentage of 0.431, due, in part, to the notorious problems of their catcher, Mike Piazza, who allowed 125 stolen bases (which led the National League) against 27 caught stealing in 121 games caught, for a context-neutral Component 1.1 winning percentage of 0.320.

Unfortunately, this problem with attempting to measure “true-talent” Component 1 winning percentage is not limited to outfielders, where we know that no such talent exists. In fact, on average, the context-neutral Component 1.1 winning percentage for Montreal Expos pitchers in 2004 was 0.645, not necessarily because Expos pitchers were particularly adept at holding runners on base, but, in large part, because Brian Schneider was their catcher. Yet, pitchers do have some ability here. The key is to separate the ability of Montreal Expos pitchers from the ability of Montreal Expos catchers.

The first step before one can accurately assess “true-talent” Component 1 winning percentages is to adjust player winning percentages for the context in which these percentages were amassed. Specifically, pitchers’ Component 1 winning percentages are adjusted to control for the Component 1 winning percentages of their catchers, and catchers’ Component 1 winning percentages are adjusted to control for the Component 1 winning percentages of their pitchers. Similar adjustments are done for all Components for which Player Game Points are to be shared.

This is done iteratively. First, pitchers’ Component 1 winning percentages are adjusted to control for the Component 1 winning percentages of their catchers. This is done using the Matchup Formula.

After pitchers’ winning percentages are adjusted based on catcher winning percentages, catcher winning percentages are then adjusted based on these newly adjusted pitcher winning percentages. Ideally, one would continue the iterative process until no Component 1 winning percentage changes between iterations. For computational simplicity, I simply repeated this process three more times for both pitchers and catchers.
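The alternating adjustment can be sketched in Python. The Matchup Formula itself is not spelled out in this article, so the sketch swaps in a log5-style odds-ratio relationship as a stand-in: it assumes odds(observed) = odds(own) × odds(partner), which will not reproduce the adjusted figures quoted in this article. All function names and the toy battery data are hypothetical, and winning percentages are assumed to lie strictly between 0 and 1.

```python
def odds(p):
    return p / (1.0 - p)


def from_odds(o):
    return o / (1.0 + o)


def remove_partner(raw, partner_avg):
    """Back out the partner's contribution under the assumed odds-ratio stand-in."""
    return from_odds(odds(raw) / odds(partner_avg))


def raw_and_plays(batteries, index):
    """Raw winning percentage and plays-by-partner for pitchers (index 0) or catchers (index 1)."""
    wins, total, plays = {}, {}, {}
    for pair, (w, l) in batteries.items():
        me, partner = pair[index], pair[1 - index]
        wins[me] = wins.get(me, 0) + w
        total[me] = total.get(me, 0) + w + l
        plays.setdefault(me, {})[partner] = w + l
    raw = {name: wins[name] / total[name] for name in wins}
    return raw, plays


def partner_average(ratings, plays):
    """Average partner rating, weighted by plays shared with each partner."""
    n = sum(plays.values())
    return sum(ratings[name] * count for name, count in plays.items()) / n


def adjust(batteries, rounds=4):
    """Alternate pitcher and catcher adjustments: one pass plus three repeats by default."""
    raw_p, plays_p = raw_and_plays(batteries, 0)
    raw_c, plays_c = raw_and_plays(batteries, 1)
    adj_p, adj_c = dict(raw_p), dict(raw_c)
    for _ in range(rounds):
        # Pitchers first, using the catchers' current adjusted percentages...
        adj_p = {p: remove_partner(raw_p[p], partner_average(adj_c, plays_p[p])) for p in raw_p}
        # ...then catchers, using the newly adjusted pitcher percentages.
        adj_c = {c: remove_partner(raw_c[c], partner_average(adj_p, plays_c[c])) for c in raw_c}
    return adj_p, adj_c


# Toy data: (pitcher, catcher) -> (wins, losses). Catcher C1 is strong, C2 is average.
batteries = {
    ("P1", "C1"): (60, 40), ("P1", "C2"): (50, 50),
    ("P2", "C1"): (60, 40), ("P2", "C2"): (50, 50),
}
pitchers, catchers = adjust(batteries)
print(pitchers)
print(catchers)
```

In this toy data both pitchers show a raw 0.550, but once the strong catcher is controlled for they come out at about 0.500, while C1 stays at about 0.600. A fixed number of rounds mirrors the shortcut described above; a convergence check on the largest change between rounds would be the natural alternative.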

Returning to the earlier examples, the adjusted Component 1.1 winning percentage for Montreal Expos pitchers was 0.539 in 2004 (versus 0.645 unadjusted), while Montreal Expos catchers put up a combined adjusted Component 1.1 winning percentage of 0.641 (versus 0.645 unadjusted). Here, because Expos pitchers and catchers were both above-average in this component in 2004, their combined winning percentage ends up being greater than either of their individual winning percentages. The whole is greater than the sum of the parts.

For the 2002 New York Mets, their pitchers’ adjusted winning percentage was 0.519 (versus 0.431 unadjusted) while Mets’ catchers had an adjusted winning percentage of 0.415 (0.320 for Mike Piazza and 0.708 for other Mets’ catchers). Mets pitchers weren’t bad at preventing stolen bases in 2002; they simply had the misfortune of pitching to one of the worst catchers in modern times at stopping an opponent’s running game.

The Persistence Equations by which Shared Player Wins and Losses are calculated are estimated using component winning percentages which have been adjusted in this way for the winning percentages of players’ teammates.

4.    Example Persistence Equations
Persistence equations are estimated using all of the seasons for which I have estimated Player won-lost records. Each equation models player winning percentage for the Component of interest on even-numbered plays as a function of player winning percentage for the same Component on odd-numbered plays:

(Component Win Pct)Even = b•(Component Win Pct)Odd + (1-b)•(WinPct)Baseline

where (WinPct)Baseline represents a baseline winning percentage toward which Component winning percentages regress over time.

The results for Component 1.1 (basestealing by runners on first base) are shown below.

Persistence of Component 1 Winning Percentage: Baserunner on First Base

Pitchers:  n = 36,236, R² = 0.0551
WinPctEven = (26.89%)•WinPctOdd + (73.11%)•0.5000   (53.53)

Catchers:  n = 7,612, R² = -0.0018
WinPctEven = (23.50%)•WinPctOdd + (76.50%)•0.5000   (20.85)

The number n is the number of players over whom the equation was estimated, that is, who accumulated any Player wins and/or losses on both odd- and even-numbered plays. The value R² measures the percentage of variation in the dependent variable (WinPctEven) explained by the equation (i.e., explained by WinPctOdd).

The baseline, toward which WinPctEven regresses - (Win %)Baseline in the persistence equation - is set equal to 0.500. This is done for all of the persistence equations which I use to allocate shared credit. I did this based on empirical experimentation with alternatives, including freely estimating (Win %)Baseline. I thought the results when (Win %)Baseline was constrained to 0.500 worked best.

The numbers in parentheses are t-statistics. T-statistics measure the significance of b, that is, the confidence we have that b is greater than zero. The greater the t-statistic, the more confident we are that the true value of b is greater than zero. Roughly speaking, if a t-statistic is greater than 2, then we can be at least 95% certain that the true value of b is greater than zero (assuming that certain statistical assumptions regarding our model hold).

For baserunners on first base, Component 1 win percentage is significantly persistent for both pitchers and catchers with t-statistics far greater than two for both sets of players. The persistence is somewhat weaker for catchers (23.5%) than for pitchers (26.9%), although the two numbers are very close. The percentage of Component 1 Player decisions with a runner on first base (Component 1.1) which are attributed to pitchers is set equal to the pitcher persistence coefficient (26.9%) divided by the sum of the persistence coefficients for pitchers and catchers (26.9% + 23.5%). This leads to 53.4% of Component 1.1 decisions being allocated to pitchers and 46.6% of Component 1.1 decisions allocated to catchers.
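The share arithmetic in the paragraph above can be checked with a few lines of Python. The function name and the dictionary layout are hypothetical; the coefficients are the ones reported for Component 1.1.

```python
def credit_shares(persistence):
    """Split credit in proportion to each role's persistence coefficient."""
    total = sum(persistence.values())
    return {role: b / total for role, b in persistence.items()}


# Persistence coefficients for Component 1.1 as reported in this article.
shares = credit_shares({"pitcher": 0.2689, "catcher": 0.2350})

for role, share in shares.items():
    print(f"{role}: {share:.1%}")
# pitcher: 53.4%
# catcher: 46.6%
```

Writing the split this way makes it clear that only the ratio of the two coefficients matters, not their absolute size.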

5.    Changes in Component Splits over Time
There is no reason to believe that the split of credit between positions should be constant over time. On the other hand, if a distinct persistence equation is estimated every year, this could well produce significant year-to-year shifts because of statistical quirks from small sample sizes. Ideally, what we would like to do is allow for gradual changes in component splits over time, but do so in a way that reduces the likelihood of flukish year-to-year changes.

To accomplish this, I estimate unique Persistence Equations for every season, but I use all of my data in all of these equations. I simply weight the data based on how close to the season of interest it is. Each observation is multiplied by a YearWeight, which is equal to the following:

YearWeight = 1 - abs(Year - YearTarget) / 100

where "Year" is the year in which the observation occurred, and YearTarget is the year for which shares are being estimated. So observations in the target year get a YearWeight of 1.0, observations one year before or after the target year get a YearWeight of 0.99, observations two years removed from the target year get a YearWeight of 0.98, etc.
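A minimal Python sketch of the weighting follows. The function name is hypothetical, and the sketch applies the formula exactly as written, so an observation 100 or more years from the target year would receive a zero or negative weight; how such cases are handled is not stated here.

```python
def year_weight(year, year_target):
    """YearWeight = 1 - abs(Year - YearTarget) / 100, applied as written (no clamping)."""
    return 1 - abs(year - year_target) / 100


# Weights for observations around a target season of 2004.
for year in (2004, 2003, 2005, 2002, 1954):
    print(year, year_weight(year, 2004))
```

Each observation's regression weight would then be multiplied by this value before the target year's persistence equation is fit.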

The result is a set of share weights that vary by year but do so fairly gradually. For example, the share of credit for Component 1.1 (basestealing by runners on first base) attributed to pitchers varies by season within a range of 51.4% to 56.0%.

6.    Final Proportions of Shared Player Game Points
Separate persistence equations and, hence, separate share weights are calculated for specific fielders and for specific baserunners, so that, for example, Component 5 shares for first basemen and third basemen will differ. Also, as noted above, these share weights vary by season. Splits by season are presented on the pages for specific leagues (e.g., 2010 National League).

Average breakdowns of shared components over the full Retrosheet Era are summarized in the table below. The numbers below are averages across all fielders/baserunners and across all seasons, so do not necessarily apply precisely for any specific players or seasons.

Shared Components based on Persistence Equations

Component      Pitcher    Fielder
Component 1    52.2%      47.8%
Component 2    76.3%      23.7%
Component 5    31.4%      68.6%
Component 6    25.9%      74.1%
Component 7    36.2%      63.8%

Component      Batter     Baserunner
Component 7    79.6%      20.4%
Component 8    53.3%      46.7%
Component 9    48.9%      51.1%

All articles are written so that they pull data directly from the most recent version of the Player won-lost database. Hence, any numbers cited within these articles should automatically incorporate the most recent update to Player won-lost records. In some cases, however, the accompanying text may have been written based on previous versions of Player won-lost records. I apologize if this results in nonsensical text in any cases.
