Components of Player Won-Lost Records



General Overview of Nine Components of Player Won-Lost Records

Player Wins and Losses are calculated using a nine-step process, each step of which assumes average performance in all subsequent steps. Each step of the process is associated with a Component of Player Wins and Losses (Player Decisions). The purpose of this article is to describe these nine components in detail.

There are four basic positions from which a player can contribute toward his baseball team’s probability of winning: Batter, Baserunner, Pitcher, and Fielder. Player Decisions are allocated to each of these four positions, as appropriate, within each of the following nine Components.

Component 1: Basestealing
Player Decisions are assessed to baserunners, pitchers, and catchers for stolen bases, caught stealing, pickoffs, and balks.

Component 2: Wild Pitches and Passed Balls
Player Decisions are assessed to baserunners, pitchers, and catchers for wild pitches and passed balls.

Component 3: Balls not in Play
Player Decisions are assessed to batters and pitchers for plate appearances that do not involve the batter putting the ball in play: i.e., strikeouts, walks, and hit-by-pitches.

Component 4: Balls in Play
Player Decisions are assessed to batters and pitchers on balls that are put in play, including home runs, based on how and where the ball is hit.

Component 5: Hits versus Outs on Balls in Play
Player Decisions are assessed to batters, pitchers, and fielders on balls in play, based on whether they are converted into outs or not.

Component 6: Singles versus Doubles versus Triples
Player Decisions are assessed to batters, pitchers, and fielders on hits in play, on the basis of whether the hit becomes a single, a double, or a triple.

Component 7: Double Plays
Player Decisions are assessed to batters, baserunners, pitchers, and fielders on ground-ball outs in double-play situations, based on whether or not the batter grounds into a double play.

Component 8: Baserunner Outs
Player Decisions are assessed to batters, baserunners, and fielders on the basis of baserunner outs.

Component 9: Baserunner Advancements
Player Decisions are assessed to batters, baserunners, and fielders on the basis of how many bases, if any, baserunners advance on balls in play.

The distribution of Player Wins and Losses by Component varies across seasons and across leagues, depending on the exact distribution of plays. The average distribution of Player decisions by Component across all seasons of the Retrosheet Era (1934 - 2013 for now) is as follows.

Breakdowns of Player Game Points by Component: 1934 - 2013
Distribution of Player Decisions

                                         Percent of    Percent of Offensive/Defensive Component
                                         Total Player  Decisions Allocated to
Component                                Decisions     Batters  Baserunners  Pitchers  Fielders
Component 1: Stolen Bases, etc.               2.2%        0.0%       100.0%     52.7%     47.3%
Component 2: Wild Pitches, Passed Balls       1.3%        0.0%       100.0%     76.7%     23.3%
Component 3: Balls Not in Play               15.5%      100.0%         0.0%    100.0%      0.0%
Component 4: Balls in Play                   32.9%      100.0%         0.0%    100.0%      0.0%
Component 5: Hit vs. Out                     34.5%      100.0%         0.0%     37.0%     63.0%
Component 6: Single v. Double v. Triple       3.4%      100.0%         0.0%     24.0%     76.0%
Component 7: Double Plays                     1.6%       78.5%        21.5%     35.3%     64.7%
Component 8: Baserunner Outs                  2.3%       41.3%        58.7%      0.0%    100.0%
Component 9: Baserunner Advancements          6.2%       46.6%        53.4%      0.0%    100.0%
Total Offensive/Defensive Decisions                      91.4%         8.6%     64.8%     35.2%
Total Player Decisions                                   45.7%         4.3%     32.4%     17.6%



Offensive Wins and Losses are divided between batters and baserunners. The batter/baserunner breakdown is approximately 91% batters, 9% baserunners. Defensive Wins and Losses are divided between pitchers and fielders. In general, pitchers are credited with just under two-thirds (64.8 percent) of total Defensive Wins and Losses, including their role in preventing or allowing stolen bases as well as pitcher fielding. Fielders other than pitchers account for the other 35.2 percent of Defensive Game Points.

The breakdown of Player decisions by component has changed somewhat over time. Results from the most recent decade, 2003 – 2012, are shown below.

Breakdowns of Player Game Points by Component: 2003 – 2012
Distribution of Player Decisions

                                         Percent of    Percent of Offensive/Defensive Component
                                         Total Player  Decisions Allocated to
Component                                Decisions     Batters  Baserunners  Pitchers  Fielders
Component 1: Stolen Bases, etc.               1.8%        0.0%       100.0%     52.4%     47.6%
Component 2: Wild Pitches, Passed Balls       1.3%        0.0%       100.0%     75.7%     24.3%
Component 3: Balls Not in Play               16.5%      100.0%         0.0%    100.0%      0.0%
Component 4: Balls in Play                   35.7%      100.0%         0.0%    100.0%      0.0%
Component 5: Hit vs. Out                     32.3%      100.0%         0.0%     34.4%     65.6%
Component 6: Single v. Double v. Triple       3.1%      100.0%         0.0%     22.4%     77.6%
Component 7: Double Plays                     2.1%       82.2%        17.8%     27.7%     72.3%
Component 8: Baserunner Outs                  1.7%       40.0%        60.0%      0.0%    100.0%
Component 9: Baserunner Advancements          5.5%       45.6%        54.4%      0.0%    100.0%
Total Offensive/Defensive Decisions                      92.5%         7.5%     66.5%     33.5%
Total Player Decisions                                   46.2%         3.8%     33.3%     16.7%



On the offensive side, baserunning has become somewhat less important recently, falling to 7.5% of total offensive decisions. On the defensive side, strikeouts and walks (Component 3) are higher in recent years, which has reduced the importance of fielding to 16.7% of total Player decisions. Overall, though, the results are generally quite similar over the entire “Retrosheet Era”.

Overall, the breakdown of batting / baserunning / pitching / fielding (counting pitcher fielding as "pitching") is 45.7% / 4.3% / 32.4% / 17.6%. This creates a breakdown between pitchers (excluding pitcher hitting and baserunning, but including pitcher fielding) and non-pitchers of 32.4% v. 67.6%, or slightly over 2-to-1 for position players vis-à-vis pitchers.

The breakdown of Fielding Player Wins and Losses (on balls in play) by Component and by Fielding Position is summarized below.

Breakdown of Fielding Decisions by Position: 1934 - 2013

Percent of Component Decisions by Fielder
Component             P      C     1B     2B     3B     SS     LF     CF     RF
Component 5        5.7%   1.1%   7.0%  14.6%  13.9%  17.6%  13.7%  12.9%  13.6%
Component 6        1.3%   0.0%   2.7%   1.0%   3.2%   0.7%  34.7%  26.9%  29.5%
Component 7        6.9%   2.1%   3.6%  39.4%   4.5%  43.5%   0.0%   0.0%   0.0%
Component 8        3.0%   0.9%   5.8%   8.4%   5.4%   7.4%  23.5%  21.5%  24.1%
Component 9        5.9%   0.8%   5.9%   8.8%   8.8%  10.8%  18.8%  20.9%  19.2%
Total Fielding     5.3%   1.0%   6.3%  12.8%  11.3%  15.2%  16.5%  15.6%  16.1%

Note: Pitcher numbers here represent only the "fielding" portion of the pitcher's credit, not the "pitching" portion of the credit.

Overall, outfielders accumulate just under 50% of Fielding Player decisions, excluding pitchers and catchers (48.2%). In contrast, Bill James’s Win Shares credit only 36% of non-catcher fielding Win Shares to outfielders. As such, my allocation of Player fielding decisions to outfielders may seem excessive. It is important to remember, however, that while Bill James’s fielding allocation is imposed, the fielding allocation here is derived from the actual results.

In 2006, for example, in games played in American League ballparks, 37.7% of all batting outs recorded by somebody other than the pitcher or catcher were recorded by outfielders. This is similar to Bill James’s Win Shares allocation (and is not terribly different from the 40.2% of Component 5 fielding decisions recorded by outfielders). Of all balls in play that were fielded by somebody other than the pitcher or catcher, however, outfielders were the first fielder to handle 53.5% of all such plays. This is reflected in my work by the large amount of Component 6, 8, and 9 Player decisions accumulated by outfielders. As such, I believe that the distribution of fielding games coming out of my work is reasonable.

Further, it is important to understand that I am not saying that outfield defense is therefore more valuable than infield defense. Outfielders accumulate more fielding wins but they also accumulate more fielding losses. An outstanding defensive outfielder may accumulate more value than a similarly outstanding defensive infielder but only to the extent that the outfielder would be expected to field more balls in play. An outfielder will not accumulate any more value than an infielder simply by virtue of his being an outfielder.

Event Probabilities

The key to properly assessing Player Wins and Losses is to credit each player with the change in Win Probability attributable to him assuming average performances by all other players. In order to do this, it is necessary to determine what the probability is of certain events occurring, such as the probability of a baserunner advancing from 1st to 3rd on a single, the probability of a particular ball-in-play becoming an out versus a single versus an extra-base hit, and many other similar probabilities.

Wherever possible, the probabilities of relevant events are initially calculated by direct observation – that is, the probability of a baserunner being caught trying to steal second is calculated by simply summing up all of the runners caught stealing second and dividing by the total number of possible runners.

In some cases, however, because of differences in the underlying context in which events occur (e.g., caught stealings may occur in a higher average context than not-caught stealings), direct probabilities may produce more Player Wins than Player Losses (or vice versa) for a particular component or sub-component of Player Game Points. To help maintain the underlying assumption that the overall winning percentage within a particular season should be 0.500 for every component and sub-component of interest, my results are therefore refined by scaling them back to 0.500 at the aggregate level.

1.    Stolen Bases, Caught Stealing, and Wild Pitches
In the case of stolen bases, caught stealings, wild pitches, and the like, unique probabilities are calculated for each of the 24 base-out states. The table below presents, by base-out state over the entire Retrosheet Era, the probabilities of three kinds of events: positive stolen base events (identified below as “stolen bases”, but including advancements on errant pickoff attempts, errors on caught stealings, defensive indifference, and balks); caught stealing events (including pickoffs); and wild pitches / passed balls (I make no distinction between wild pitches and passed balls).

The specific probabilities actually used in calculating Player Won-Lost records are uniquely determined for each league and each season.

Outs  Baserunners     SB     CS   WP/PB
0     0             0.0%   0.0%    0.0%
0     1             5.8%   3.3%    1.9%
0     2             1.2%   0.5%    2.2%
0     3             0.2%   0.2%    1.7%
0     1-2           1.5%   1.4%    2.2%
0     1-3           4.2%   1.1%    2.1%
0     2-3           0.2%   0.1%    1.6%
0     1-2-3         0.2%   0.1%    1.5%
1     0             0.0%   0.0%    0.1%
1     1             6.1%   3.7%    1.9%
1     2             1.9%   0.9%    2.4%
1     3             0.3%   0.7%    1.7%
1     1-2           1.8%   1.4%    2.2%
1     1-3           4.7%   1.9%    2.1%
1     2-3           0.2%   0.3%    1.4%
1     1-2-3         0.2%   0.3%    1.5%
2     0             0.0%   0.0%    0.1%
2     1             6.7%   3.3%    1.9%
2     2             0.9%   0.2%    2.2%
2     3             0.4%   0.2%    1.6%
2     1-2           0.9%   0.4%    2.0%
2     1-3           5.9%   1.4%    2.1%
2     2-3           0.3%   0.1%    1.5%
2     1-2-3         0.3%   0.2%    1.4%

Except for the case where the bases are initially empty*, wild pitches and passed balls are somewhat uniform across base-out states, occurring between 1.5 and 2.5 percent of the time.
*Wild pitches and/or passed balls with the bases empty represent cases where the batter reached first base safely on a dropped third strike.

In general, wild pitches and passed balls are least common when there is a runner on third base who must score for any runners to advance (i.e., excluding runners on first and third), occurring 1.5% of the time, versus 2.0% of the time otherwise.

Stolen bases and caught stealings are much more dependent on both the position of the baserunners as well as on the number of outs than are wild pitches and passed balls.

Stolen base attempts of second base increase in frequency with the number of outs. With only a runner on first base, stolen base attempts (Stolen Bases plus Caught Stealings above) are somewhat less common with nobody out (9.1%) than with one or two outs (9.9%). Stolen base attempts of third base are far more frequent with one out (2.8% for a runner on second base only, 3.2% for runners on first and second) than with either zero (1.7% for a runner on second base only, 2.9% for runners on first and second) or two outs (1.1%, 1.3%). Apparently, baseball players and managers take seriously the old adage, “Don’t make the first or last out at third base.”

Stolen base success rates are much greater with two outs (e.g., 67.5% with a runner on first base only, 80.4% with a runner on second base) than with zero or one outs. Overall, over the time period studied here, stolen base success rates were 69.8% with two outs, 62.5% with one out, and 64.2% with no outs.
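
As a rough illustration, the attempt rates quoted above can be reproduced from the table with a few lines of Python. The dictionary simply copies rounded table values for the states discussed in the text; the names RATES, attempt_rate, and success_rate are made up for this sketch and are not part of the actual Player won-lost calculations. Because the inputs are rounded, success rates computed this way only approximate the figures quoted in the text.

```python
# Rounded SB / CS rates (in percent) copied from the base-out table above.
# Only the base-out states discussed in the text are included.
RATES = {
    # (outs, runners): (stolen-base %, caught-stealing %)
    (0, "1"): (5.8, 3.3),
    (1, "1"): (6.1, 3.7),
    (2, "1"): (6.7, 3.3),
    (0, "2"): (1.2, 0.5),
    (1, "2"): (1.9, 0.9),
    (2, "2"): (0.9, 0.2),
    (0, "1-2"): (1.5, 1.4),
    (1, "1-2"): (1.8, 1.4),
    (2, "1-2"): (0.9, 0.4),
}


def attempt_rate(outs, runners):
    """Stolen-base attempts = stolen bases plus caught stealings (percent)."""
    sb, cs = RATES[(outs, runners)]
    return round(sb + cs, 1)


def success_rate(outs, runners):
    """Share of attempts that succeed, computed from the rounded table values."""
    sb, cs = RATES[(outs, runners)]
    return sb / (sb + cs)


# Attempts to steal third, by number of outs: runner on second only,
# then runners on first and second.
for outs in (0, 1, 2):
    print(outs, attempt_rate(outs, "2"), attempt_rate(outs, "1-2"))
```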

2.    Balls in Play
In the case of balls in play, the probabilities of many things are dependent on the exact location of the ball and how it was hit. For example, the probability of driving in a runner from third is vastly different on a ground out to the pitcher (17.1% over the full Retrosheet Era) versus a fly out to center field (83.7% over the same time period). Hence, in theory, ball-in-play probabilities should be calculated for each unique location/hit type combination.

My data source is Retrosheet event files. Some Retrosheet event files provide some data on the location of balls in play and hit types. Retrosheet event locations are described here. For my purposes, I consider there to be 17 locations, defined by the fielder(s) nearest to the play (1, 13, 15, 2, 23, 25, 3, 34, 4, 5, 56, 6, 7, 78, 8, 89, and 9); three possible depths of the hit: shallow (S), medium (M), and deep (D); and four hit types: bunts (B), ground balls (G), fly balls (F or P), and line drives (L).

The exact level of detail available from Retrosheet with respect to locations and hit types for balls in play varies a great deal over time. As of my writing this, complete (or very nearly complete) location data is only available from Retrosheet from 1989 through 1999. For most balls in play, more recent event files only provide very limited location data. For example, an event file may identify a single as being a single to left field (S7). But there’s a big difference between a ground ball through the hole at shortstop and a line drive off the left field wall that the left fielder manages to play well enough to hold the batter at first base.

Hit type data – i.e., bunts vs. ground balls vs. fly balls vs. line drives – are available for non-outs (i.e., singles, doubles, and triples) from 2003 onward. For other years (except for 1989 – 1999, where complete hit type data are available), hit type data are mostly limited to outs (i.e., ground outs are distinguished from fly outs). Unfortunately, as we go farther back in time, Retrosheet play-by-play data tell us less and less. Prior to 1988, it is common (but not universal) for play-by-play data not to identify even the fielder on base hits (i.e., singles to left field are not distinguished from singles to right field). Going further back, there are even some outs for which the fielder is not identified.

     a.    Batted Ball Results
My solution to the lack of location data for most years is to use the 1989 – 1999 event files to generate weights by location and to impute estimated locations based upon complete location data from 1989 – 1999 and what information is available in other event files. I discuss and attempt to justify my use of location data in this way later in this article.

For calculating Player wins and losses, I divide balls in play based on the end result of the play – out, single, double, triple (batters reaching on fielding errors are treated as “singles” here) – and the first fielder to touch the ball. For outs, that is the player who recorded the first assist/unassisted putout. For hits, it is the fielder occupying the field to which the ball was hit – e.g., the left fielder for a single to left field (S7). If the hit type of the play is identified (bunt, ground ball, fly ball, line drive), this information is also used in determining the appropriate probabilities.

From the 1989 – 1999 data, I then calculate the expected location of the event. That is, for, say, a single to left field, I look at all singles to left field from 1989 – 1999 and calculate the probability that such a play was in each of the relevant locations. For example, the table below shows the distribution by location of line drive singles to left field. From the expected location, I then calculate the expected probabilities of all relevant events – probability of a base hit, probability of particular players making the out, probability of a runner scoring from second, etc. Data on hit type (ground ball, fly ball, etc.) is used for recent seasons for which hit-type data is available for all, or virtually all, plays (2003 onward). For earlier years (2000 – 2002 and pre-1989 seasons), data on hit type is generally not available, especially on base hits. Hence, this data is not used in these cases.

Line Drive Single to Left Field, 1989 - 1999
Number, Percent: Total Balls in Play
Out, Single, Double, Triple: Probability of a Line Drive at the given Location being an Out, Single, Double, or Triple
Location Depth Number Percent Out Single Double Triple
Unknown Unknown 245 0.86%
1 Unknown 5 0.02% 70.74% 28.32% 0.93% 0.00%
34 Unknown 2 0.01% 62.94% 36.46% 0.60% 0.00%
34 Deep 4 0.01% 5.30% 92.83% 1.82% 0.06%
4 Medium 1 0.00% 41.76% 56.16% 0.91% 1.18%
4 Deep 5 0.02% 10.30% 88.05% 1.65% 0.00%
5 Unknown 131 0.46% 78.31% 8.40% 13.29% 0.00%
5 Deep 576 2.01% 0.79% 40.19% 59.02% 0.00%
56 Unknown 702 2.45% 64.94% 34.83% 0.24% 0.00%
56 Shallow 9 0.03% 94.85% 5.15% 0.00% 0.00%
56 Deep 2,732 9.55% 3.55% 95.02% 1.37% 0.06%
6 Unknown 100 0.35% 89.49% 10.20% 0.31% 0.00%
6 Shallow 1 0.00% 39.05% 60.32% 0.63% 0.00%
6 Medium 3 0.01% 42.15% 56.87% 0.55% 0.44%
6 Deep 1,591 5.56% 5.86% 91.64% 1.91% 0.59%
7 Unknown 2,618 9.15% 29.08% 42.51% 27.79% 0.61%
7 Shallow 10,355 36.20% 9.69% 73.75% 16.15% 0.41%
7 Medium 4,008 14.01% 42.13% 34.65% 22.50% 0.73%
7 Deep 450 1.57% 42.12% 7.13% 49.65% 1.09%
78 Unknown 621 2.17% 21.56% 46.93% 30.63% 0.88%
78 Shallow 2,877 10.06% 7.01% 85.07% 6.63% 1.29%
78 Medium 1,344 4.70% 31.98% 46.12% 17.21% 4.69%
78 Deep 186 0.65% 32.28% 7.10% 59.96% 0.66%
8 Unknown 4 0.01% 36.88% 58.31% 3.88% 0.93%
8 Shallow 16 0.06% 8.92% 89.35% 1.07% 0.66%
8 Medium 4 0.01% 43.72% 49.98% 3.11% 3.18%
8 Deep 1 0.00% 73.96% 7.63% 16.51% 1.90%
89 Unknown 1 0.00% 22.45% 50.71% 24.85% 1.99%
89 Shallow 4 0.01% 6.73% 85.26% 4.81% 3.20%
89 Medium 2 0.01% 28.30% 45.32% 13.59% 12.79%
9 Unknown 3 0.01% 32.59% 41.61% 20.48% 5.33%
9 Shallow 2 0.01% 9.70% 73.90% 14.25% 2.14%
9 Deep 1 0.00% 48.43% 8.62% 42.95% 0.00%
TOTALS 28,604 100.00% 18.81% 63.94% 16.50% 0.74%

As the above example shows, a line drive single to left field had an a priori probability of becoming an out of 18.81%. For fly balls, the a priori probability of becoming an out was 83.34%. For ground balls, the a priori probability of becoming an out was 53.68%. Overall, the a priori probability of a single to left field becoming an out, regardless of hit type, was 36.88%.
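
The TOTALS row of the table is, in effect, a weighted average of the location-specific probabilities, weighted by the number of plays at each location. A minimal Python sketch of that calculation follows. It uses only four of the most common locations from the table, so it is an illustrative subset and will not reproduce the 18.81% figure; the function name is hypothetical.

```python
def expected_probability(rows):
    """Weighted average of location-specific probabilities.

    rows: iterable of (number of plays at the location, probability) pairs.
    """
    total = sum(count for count, _ in rows)
    return sum(count * prob for count, prob in rows) / total


# (Number column, probability of an out) for four rows of the table above.
# Illustrative subset only: the full calculation would use every row.
subset = [
    (10355, 0.0969),  # location 7, shallow
    (4008, 0.4213),   # location 7, medium
    (2877, 0.0701),   # location 78, shallow
    (2732, 0.0355),   # location 56, deep
]

print(f"{expected_probability(subset):.4f}")
```

The same function applies unchanged to the single, double, and triple columns, or to any other event whose probability is known by location.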

The individual probabilities of fielders having made an out on a play which becomes a generic single to left field were 47.6% for the left fielder, 26.0% for the shortstop, 22.6% for the third baseman, 3.3% for the center fielder, and 0.5% for other fielders. For ground-ball singles, the relevant probabilities are 53.5% for the third baseman and 45.7% for the shortstop. For fly ball singles, on the other hand, left fielders would be expected to convert 75.2% of all outs (76.6% for line drives) with center fielders expected to convert 10.2% (4.7% for line drives).

For the probabilities of the play becoming an out, single, double, or triple, one further adjustment is made for recent years. For recent years, while these various weights are not known for specific locations, they are, of course, known for the league as a whole. For example, in the 2005 American League, 25.90% of all line drives (excluding home runs) were converted into outs. From 1989 – 1998, however, 33.49% of line drives were converted into outs. The weights used for other leagues are adjusted to tie the aggregate weights to the actual league-wide percentages. This is done through the use of the Matchup Formula, which is described later in this article.

     b.    Baserunner Outs / Advancements
In addition to the probability of a ball-in-play becoming an out, single, double, or triple, and the probability the ball is played by each of the various fielders, the other events for which weights are needed are the probability of baserunner advancements and/or baserunning outs. Such probabilities are, of course, a function of the specific baserunner – batter, runner on first, runner on second, runner on third; the batting event – out, single, double, triple; and, where available, the hit type (bunt, ground ball, fly ball, line drive) and location of the ball. For these events, probabilities are simply calculated by direct observation for the baserunner/bat event/hit type combinations for which data are available. For earlier years, probabilities are calculated by hit type for outs where possible (i.e., unique probabilities are calculated for a baserunner scoring from third base on a ground out versus a fly out), but not for hits (when hit type data are not recorded for hits).

In the case of baserunner advancements on singles and doubles, separate probabilities are also calculated based on the number of outs (two versus less than two).

Event probabilities for a specific league are linked through the league stats page for the league. An example of the event probabilities for the 2011 American League can be found here.

Dividing Credit between Batters and Baserunners and between Pitchers and Fielders

In many cases, it is not clear exactly who should get credit for a particular play. For example, pitchers and catchers share responsibility for Component 1 (basestealing) Player decisions. The allocation of Player decisions in these cases is based on the relative skill levels exhibited by the relevant players.

The technique outlined here is used to divide responsibility between pitchers and catchers for Component 1 (basestealing) and Component 2 (wild pitches and passed balls) Player decisions, between pitchers and fielders for Components 5 (hits vs. outs), 6 (single vs. double vs. triple), and 7 (double plays), and between batters and baserunners for Components 7, 8 (baserunner outs), and 9 (baserunner advancements).

The division of Component 1 Player decisions between pitchers and catchers is used here as an illustration of the general technique.

1.    Basic Theory
How does one determine how to divide credit between pitchers and catchers for Component 1 (basestealing) Player decisions?

Let’s begin by asking, what if somebody deserved no credit for a particular component of Player decisions but we allocated Player decisions to them anyway? For example, what if we assigned Component 1 Player decisions to the defensive team’s right fielder? What would we expect Component 1 Player decisions to look like in that case? Essentially, we would expect every right fielder to have a Component 1 winning percentage of 0.500 plus or minus some random variation.

Suppose we were to try to predict a right fielder’s Component 1 winning percentage over some time period based on his Component 1 winning percentage over some other time period. We would expect, in such a persistence equation, for there to be no predictive ability of this component.

Alternately, what would we expect Component 1 Player decisions to look like if we assigned them to players who had different levels of talent in terms of affecting the opponents’ basestealing? In such a case, we would expect a player’s Component 1 winning percentage to be equal to his “true” winning percentage (his “true-talent”) plus or minus some random variation and for a player’s Component 1 winning percentage over some time period to have significant predictive capacity over other time periods.

In other words, the extent to which a player’s winning percentage at some point in time is predictive of his winning percentage at some other point is suggestive of the extent to which there is a true skill involved in a particular component. Based on this, Player wins and losses are allocated in proportion to the extent to which a player’s winning percentage has predictive power.

2.    Mathematics
The basis for dividing shared Player decisions is Persistence Equations.

Player won-lost records are not constructed from aggregated year-end data, but are, instead, constructed from play-by-play data. Rather than comparing results across years, therefore, it is possible for me to compare results across plays. As a general rule, players’ “true talent” should be much more stable from play to play than from year to year.

For the purpose of developing what I call “Persistence Equations”, I divide the plays that took place in a particular season into two pools: odd and even. That is, the first plate appearance of the season is identified as play number 1, the next plate appearance is play number 2, etc. Even-numbered plays (2, 4, …) go into one pool; odd-numbered plays go into the other one.
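
The odd/even split can be sketched in a few lines of Python. This is only an illustration: the function name and the representation of a season as an ordered list of plays are assumptions, not a description of the actual code behind Player won-lost records.

```python
def split_odd_even(plays):
    """Number plays 1, 2, 3, ... in order and split them into two pools."""
    odd_pool, even_pool = [], []
    for play_number, play in enumerate(plays, start=1):
        if play_number % 2 == 0:
            even_pool.append(play)
        else:
            odd_pool.append(play)
    return odd_pool, even_pool


# Hypothetical season consisting of five plate appearances.
odd, even = split_odd_even(["PA1", "PA2", "PA3", "PA4", "PA5"])
print(odd)   # ['PA1', 'PA3', 'PA5']
print(even)  # ['PA2', 'PA4']
```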

To evaluate the persistence of skills, I can then fit a simple equation which attempts to explain the relevant factor (e.g., component) on even plays as a function of the same factor for odd plays. That is,

(Factor A)Even = a + b•(Factor A)Odd

         Interpretation of Persistence Equation
The coefficient b in the Persistence Equation measures the persistence of Factor A between the two samples (even plays v. odd plays). The value of Factor A in the odd and the even period here are both samples of Factor A’s true value. Sample statistics have a tendency to trend toward their long-run value as the sample size increases. Statisticians call this “regression to the mean”.

The constant term, a, can be thought of as a measure of the extent to which Factor A regresses toward the mean. That is, one could re-write the Persistence Equation as follows:

(Factor A)Even = b•(Factor A)Odd + (1-b)•(Factor A)Baseline

where (Factor A)Baseline represents a baseline toward which Factor A regresses over time.
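
Comparing the two forms of the Persistence Equation term by term shows how the constant term and the baseline are related. The short derivation below is written in LaTeX notation and assumes b is not equal to 1.

```latex
a + b\,(\text{Factor A})_{\text{Odd}}
  = b\,(\text{Factor A})_{\text{Odd}} + (1 - b)\,(\text{Factor A})_{\text{Baseline}}
\;\Longrightarrow\;
a = (1 - b)\,(\text{Factor A})_{\text{Baseline}}
\;\Longrightarrow\;
(\text{Factor A})_{\text{Baseline}} = \frac{a}{1 - b}, \qquad b \neq 1
```

For instance, with purely illustrative values of b = 0.4 and a baseline of 0.500, the constant term would be a = 0.6 × 0.500 = 0.300.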

There are two relevant results in interpreting the extent to which Factor A persists. The one most commonly used by sabermetricians is the correlation coefficient, r (or r²). The value r² measures the percentage of variation in the left-hand side variable – (Factor A)Even – that can be explained by the right-hand side variable(s) – i.e., (Factor A)Odd. This provides some indication of the magnitude of the persistence of Factor A.

To assess the significance of the persistence, however, one must look at the significance of the persistence coefficient, b. The estimated value of b will have a standard error associated with it. If one divides b by this standard error, the resulting variable is called a t-statistic. The larger the t-statistic (in absolute value), the less likely that the true persistence coefficient is zero. As a (somewhat crude) rule of thumb, if the t-statistic is greater than 2, then we can be 95% certain that the true value of b is greater than zero (given that certain statistical assumptions about our equation are true).
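
To make the t-statistic concrete, here is a minimal sketch in Python using NumPy and the standard textbook OLS formulas. The function name and the five winning percentages are made up for illustration and are not actual Player won-lost data.

```python
import numpy as np


def fit_persistence(odd, even):
    """OLS fit of even = a + b * odd; returns a, b, and the t-statistic of b."""
    x = np.asarray(odd, dtype=float)
    y = np.asarray(even, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # columns: constant, odd-play factor
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]                  # observations minus parameters
    sigma2 = resid @ resid / dof               # estimated residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance matrix of (a, b)
    a, b = coef
    t_b = b / np.sqrt(cov[1, 1])               # b divided by its standard error
    return a, b, t_b


# Made-up winning percentages for five hypothetical players.
odd = [0.42, 0.48, 0.50, 0.55, 0.61]
even = [0.45, 0.47, 0.52, 0.53, 0.58]
a, b, t_b = fit_persistence(odd, even)
print(f"a={a:.3f}  b={b:.3f}  t={t_b:.2f}")
```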

         Mathematical Complications Estimating Persistence Equations
The basic Persistence Equation above:

(Factor A)Even = a + b•(Factor A)Odd

can be solved by Ordinary Least Squares (OLS), which is one of the most basic statistical regression procedures out there. There are, however, two additional complications associated with estimating Persistence Equations.

The first issue is that, in order to ensure that the estimated value b is not biased, the persistence equation should be fully specified. That is, if there are other variables that can be expected to affect (Factor A)Even, these variables should be included on the right-hand side of the persistence equation along with (Factor A)Odd. In general, this is not a big deal for most of the Persistence Equations that I estimate here, but it can be an issue in general regression analysis and is always worth keeping in mind.

The second issue is much more of an issue with the Persistence Equations that I estimate. The validity of OLS as an estimation technique is dependent on several assumptions about the distribution of the residual, or error, term in the persistence equation*. One of these assumptions is that the variance of the error term is constant across all observations. That is, for example, OLS is only valid if the unexplained variation in player winning percentage is equal for all players. In this case, however, not only do we not want to assume this, but we actually know that it’s wrong. Unexplained variation declines as the number of player games increases. Fortunately, there is a very easy way to adjust for this. Instead of OLS, I use Weighted Least Squares (WLS). This weights each observation by the number of player games over which the Factor has been compiled**, squared***. In this way, the results for players with more games played are weighted more heavily than players with fewer games.

* To be technically correct, the persistence equation should be written as follows:

(Factor A)Even = a + b•(Factor A)Odd + e

where e is the “error” or “residual” term that measures unexplained variation in (Factor A)Even. The appropriateness of OLS is then dependent on a set of assumptions regarding the distribution of e.


** The number of games is defined as the harmonic mean of the games over which (Factor A)Odd and (Factor A)Even are compiled.

***The decision to square the number of games in the weighting matrix was determined by empirical experimentation, which considered several alternative weighting schemes, based on the number of games (total games, the log of games, games squared, et al.).
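
The weighting scheme described above can be sketched as follows in Python with NumPy. The sketch solves the weighted normal equations directly, with each observation weighted by the square of the harmonic mean of its odd-play and even-play games. The function names and the sample data are hypothetical; this illustrates the technique and is not the code actually used to estimate the Persistence Equations.

```python
import numpy as np


def harmonic_mean(g_odd, g_even):
    """Harmonic mean of the games behind the odd-play and even-play samples."""
    return 2.0 * g_odd * g_even / (g_odd + g_even)


def fit_persistence_wls(odd, even, games_odd, games_even):
    """Weighted least squares fit of even = a + b * odd.

    Each observation is weighted by (harmonic mean of games) squared.
    """
    x = np.asarray(odd, dtype=float)
    y = np.asarray(even, dtype=float)
    games = harmonic_mean(np.asarray(games_odd, dtype=float),
                          np.asarray(games_even, dtype=float))
    w = games ** 2
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w                                  # same as X.T @ diag(w)
    a, b = np.linalg.solve(XtW @ X, XtW @ y)       # weighted normal equations
    return a, b


# Two hypothetical full-time players and one player with almost no games.
a, b = fit_persistence_wls(
    odd=[0.40, 0.60, 0.50],
    even=[0.42, 0.58, 0.90],
    games_odd=[100, 100, 1],
    games_even=[100, 100, 1],
)
print(f"a={a:.3f}  b={b:.3f}")
```

In this made-up example the third player's extreme even-play result barely moves the fitted line, because his weight is tiny relative to the two players with many games.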

3.    Theoretical Complications Estimating Persistence Equations
         Complication 1: Controlling for the Talent of the Other Players Involved
Earlier, I identified a defensive team’s right fielder as an example of a player for whom we would expect his Component 1 winning percentage to simply be randomly distributed. In fact, however, some of you might have seen a flaw in my example.

In 2004, the Montreal Expos allowed only 58 stolen bases on the season, while catching 41 opposing baserunners attempting to steal. Based on this, the Montreal Expos compiled a team-wide Component 1.1 (basestealing by runners on first base) winning percentage of 0.645. Of course, this means that Expos right fielders would have a combined Component 1.1 winning percentage of 0.645, not 0.500, not because Expos right fielders had some innate ability to prevent the other team from stealing bases, but because they had the good fortune to be teammates with Brian Schneider, who amassed an unadjusted Component 1.1 winning percentage of 0.660 at catcher.

On the other hand, the 2002 New York Mets allowed 151 stolen bases against only 53 caught stealing, leading to a team-wide context-neutral Component 1.1 winning percentage of 0.432, due, in part, to the notorious problems of their catcher, Mike Piazza, who allowed 125 stolen bases (which led the National League) against 27 caught stealing in 121 games caught, for a context-neutral Component 1.1 winning percentage of 0.320.

Unfortunately, this problem with attempting to measure “true-talent” Component 1 winning percentage is not limited to outfielders, where we know that no such talent exists. In fact, on average, the context-neutral Component 1.1 winning percentage for Montreal Expos pitchers in 2004 was 0.645, not necessarily because Expos pitchers were particularly adept at holding runners on base, but, in large part, because Brian Schneider was their catcher. Yet, pitchers do have some ability here. The key is to separate the ability of Montreal Expos pitchers from the ability of Montreal Expos catchers.

The first step before one can accurately assess “true-talent” Component 1 winning percentages is to adjust player winning percentages for the context in which these percentages were amassed. Specifically, pitchers’ Component 1 winning percentages are adjusted to control for the Component 1 winning percentages of their catchers, and catchers’ Component 1 winning percentages are adjusted to control for the Component 1 winning percentages of their pitchers. Similar adjustments are done for all Components for which Player Game Points are to be shared.

This is done iteratively. First, pitchers’ Component 1 winning percentages are adjusted to control for the Component 1 winning percentages of their catchers. This is done using the Matchup Formula.

         Sidebar: the Matchup Formula
One of the coolest formulas I’ve come across in sabermetrics is the Matchup Formula, sometimes called the Log5 Matchup Formula, which I learned from Bill James.

If a team with a 0.667 winning percentage faces a team with a 0.450 winning percentage, how often would you expect the 0.667 team to win?

If a 0.300 hitter faces a pitcher with a 0.290 batting average against, in a league with a 0.250 batting average, how well do we expect the batter to hit?

If 49.0% of a particular type of ball were turned into outs in a league when 70.2% of all balls-in-play were turned into outs, what percentage of these would be outs in a league where only 69.4% of all balls-in-play were converted into outs?

The answers to all of these questions can be solved with the Matchup Formula.

I’ll begin with the simplest version of the Matchup Formula. Let W1 be the winning percentage of Team 1 and W2 be the winning percentage of Team 2. The probability of Team 1 beating Team 2 is given by the following formula:

Probability of Team 1 Winning = W1•(1 – W2) / [W1•(1 – W2) + W2•(1 – W1)]

So, for example, a team with a 0.667 winning percentage will beat a team with a 0.450 winning percentage approximately 71.0% of the time.
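
The basic formula is short enough to check by hand or in a few lines of code. Below is a minimal Python sketch; the function name `matchup` is an illustrative choice and not part of the original calculations.

```python
def matchup(w1: float, w2: float) -> float:
    """Basic Matchup (Log5) formula: probability that side 1 beats side 2.

    Both inputs are winning percentages assumed to have been compiled
    against average (0.500) opposition.
    """
    numerator = w1 * (1.0 - w2)
    return numerator / (numerator + w2 * (1.0 - w1))


# The example from the text: a 0.667 team against a 0.450 team.
print(f"{matchup(0.667, 0.450):.3f}")  # prints 0.710
```

Two equal teams come out at exactly 0.500 under this formula, which is a quick sanity check on any implementation.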

The formula implicitly assumes that both of these teams faced average (or at least equivalent) opposition in compiling those winning percentages. So, what if Team 1 has a 0.667 winning percentage, but the average record of their opponents was only 0.440, while Team 2’s 0.450 winning percentage was amassed against opponents with an average winning percentage of 0.520?

Let W1 be Team 1’s winning percentage, O1 be the average record of Team 1’s opponents, W2 be Team 2’s winning percentage, and O2 be the average record of Team 2’s opponents. In this case, the probability of Team 1 beating Team 2 follows the same basic formula, but with a twist:

Probability of Team 1 Winning = W’1•(1 – W’2) / [W’1•(1 – W’2) + W’2•(1 – W’1)]

where

W’1 = W1•O1 / [W1•O1 + (1- W1)•(1- O1)]

W’2 = W2•O2 / [W2•O2 + (1- W2)•(1- O2)]

Plugging in the numbers from above (0.667 against 0.440 opponents versus 0.450 against 0.520 opponents), we find that Team 1 has a 64.0% chance of defeating Team 2.

There is still one more piece of information to add. The formula so far assumes that all of the numbers within the formula are relative to a 0.500 context. What if we return to the batting average example from the beginning of this sidebar? If a 0.300 hitter faces a pitcher with a 0.290 batting average against in a league with a 0.250 batting average, how well do we expect the batter to hit? Relating this to our earlier formulae, the 0.300 corresponds to W1, and the 0.290 equals (1 – W2), since W2 is the pitcher’s success rate, which is 0.710 in this case. Let’s complicate this further and assume that the 0.300 hitter has faced pitchers with an average batting average against of 0.265, so that O1 equals 0.735 here (1 – 0.265), and that the pitcher has faced opponents with an average batting average of 0.270 (which will equal O2). Now, we have one more new piece of information, which we’ll call L: the league batting average, which is 0.250 in this example.

Let P1 equal the probability that Batter 1 gets a hit against Pitcher 2. Here, the Matchup Formula, in its entirety, becomes the following:

P0 = W’1•(1 – W’2) / [W’1•(1 – W’2) + W’2•(1 – W’1)]

where

W’1 = W1•O1 / [W1•O1 + (1- W1)•(1- O1)]

W’2 = W2•O2 / [W2•O2 + (1- W2)•(1- O2)]

and

P1 = P0•L / [P0•L + (1 – P0)•(1 – L)]

And, in our example, our 0.300 hitter would be expected to bat 0.304 against this particular pitcher in this particular league.
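
For readers who want to reproduce the numbers in this sidebar, here is a hedged Python sketch of the full calculation. The function and parameter names are illustrative only. One assumption is worth flagging: the league step is written as P0•L / [P0•L + (1 – P0)•(1 – L)], which is the form that returns the 0.304 quoted above and that collapses to P0 when L is 0.500.

```python
def strength_adjust(w: float, o: float) -> float:
    """W' = W*O / [W*O + (1-W)*(1-O)]: re-express W relative to 0.500 opposition."""
    return w * o / (w * o + (1.0 - w) * (1.0 - o))


def matchup(w1: float, w2: float) -> float:
    """Basic Matchup formula for two rates already on a 0.500 footing."""
    numerator = w1 * (1.0 - w2)
    return numerator / (numerator + w2 * (1.0 - w1))


def full_matchup(w1, o1, w2, o2, league=0.5):
    """Opponent-adjusted matchup, rescaled to a league rate (assumed form)."""
    p0 = matchup(strength_adjust(w1, o1), strength_adjust(w2, o2))
    return p0 * league / (p0 * league + (1.0 - p0) * (1.0 - league))


# Team example: 0.667 vs. 0.440 opponents against 0.450 vs. 0.520 opponents.
print(f"{full_matchup(0.667, 0.440, 0.450, 0.520):.3f}")  # prints 0.640

# Batting example: W1 = 0.300, O1 = 0.735, W2 = 0.710, O2 = 0.270, L = 0.250.
print(f"{full_matchup(0.300, 0.735, 0.710, 0.270, league=0.250):.3f}")  # prints 0.304
```

Leaving `league` at its default of 0.500 recovers the opponent-adjusted team version, so one function covers both worked examples.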

Suppose, for example, that a pitcher compiled a Component 1 (basestealing) winning percentage of 0.515 but that the catchers with whom he shared that Component 1 credit compiled an average winning percentage (weighted by the number of Component 1 decisions which they shared with this particular pitcher) of 0.535.

In such a case, the Matchup Formula can be used to adjust the pitcher’s Component 1 winning percentage. Here, the pitcher’s winning percentage (0.515) would correspond to W1 in the Matchup Formula above. The average winning percentage of his catchers (0.535) would correspond to O1, the context in which the pitcher performed. Plugging these values into the Matchup Formula would produce an adjusted Component 1 winning percentage for this pitcher of 0.480.
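
As a check on the 0.480 figure, here is a short Python sketch of one reading of this adjustment. It rests on an assumption inferred from the stated result rather than spelled out above: because catchers are teammates, their 0.535 is treated as help the pitcher received, so it enters the way W2 does in the basic formula. The function name is hypothetical.

```python
def adjust_for_teammates(w: float, teammates: float) -> float:
    """Strip the teammates' contribution out of a shared winning percentage.

    Assumed form: W*(1-T) / [W*(1-T) + (1-W)*T], i.e. the basic Matchup
    formula with the teammates' winning percentage T in the role of W2.
    """
    numerator = w * (1.0 - teammates)
    return numerator / (numerator + (1.0 - w) * teammates)


# Pitcher at 0.515 who shared credit with catchers averaging 0.535.
print(f"{adjust_for_teammates(0.515, 0.535):.3f}")  # prints 0.480
```

Under this reading, a pitcher whose catchers were exactly average (0.500) keeps his raw winning percentage unchanged.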

Back to our persistence equations.

After pitchers’ winning percentages are adjusted based on catcher winning percentages, catcher winning percentages are then adjusted based on these newly-adjusted pitcher winning percentages. Ideally, one would probably prefer to continue the iterative process until all Component 1 winning percentages do not change between iterations. For computational simplicity, I simply repeated this process three more times for both pitchers and catchers.

Returning to the earlier examples, the adjusted Component 1.1 winning percentage for Montreal Expos pitchers was 0.539 in 2004 (versus 0.645 unadjusted), while Montreal Expos catchers put up a combined adjusted Component 1.1 winning percentage of 0.641 (versus 0.645 unadjusted). Here, because Expos pitchers and catchers were both above average in this component in 2004, their combined winning percentage ends up being greater than either of their individual winning percentages. The whole is greater than the sum of the parts.

For the 2002 New York Mets, pitchers’ adjusted winning percentage was 0.519 (versus 0.432 unadjusted), while Mets catchers had an adjusted winning percentage of 0.415 (0.320 for Mike Piazza and 0.708 for other Mets catchers). Mets pitchers weren’t bad at preventing stolen bases in 2002; they simply had the misfortune of pitching to one of the worst catchers in modern times at stopping an opponent’s running game.

The Persistence Equations by which Shared Player Wins and Losses are calculated are estimated using component winning percentages which have been adjusted in this way for the winning percentages of players’ teammates.

4.    Example Persistence Equations
Persistence equations are estimated using all of the seasons for which I have estimated Player won-lost records. Each equation models player winning percentage for the Component of interest on even-numbered plays as a function of player winning percentage for the same Component on odd-numbered plays:

(Component Win Pct)Even = b•(Component Win Pct)Odd + (1-b)•(WinPct)Baseline

where (WinPct)Baseline represents a baseline winning percentage toward which Component winning percentages regress over time.
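
To make the estimation concrete, here is a minimal Python sketch of fitting this equation with the baseline held fixed. With the baseline fixed, the equation is the same as (Even – Baseline) = b•(Odd – Baseline), so b can be estimated by least squares through the origin. The function name, the optional weights, and the sample numbers are all illustrative assumptions; the actual estimation may weight observations differently.

```python
import math


def fit_persistence(odd, even, baseline=0.5, weights=None):
    """Fit even = b*odd + (1-b)*baseline by (weighted) least squares.

    Returns the persistence coefficient b and its t-statistic.
    """
    if weights is None:
        weights = [1.0] * len(odd)
    x = [o - baseline for o in odd]    # deviations from baseline, odd plays
    y = [e - baseline for e in even]   # deviations from baseline, even plays
    sxx = sum(w * xi * xi for w, xi in zip(weights, x))
    sxy = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    b = sxy / sxx
    rss = sum(w * (yi - b * xi) ** 2 for w, xi, yi in zip(weights, x, y))
    se = math.sqrt(rss / (len(odd) - 1) / sxx)  # one estimated parameter
    t_stat = b / se if se > 0 else math.inf
    return b, t_stat


# Made-up split-half winning percentages for four hypothetical players.
odd = [0.40, 0.55, 0.62, 0.48]
even = [0.47, 0.52, 0.53, 0.50]
b, t_stat = fit_persistence(odd, even)
print(f"b = {b:.3f}, t = {t_stat:.2f}")
```

A coefficient near 1 would mean that odd-play performance carries over almost fully to even plays; a coefficient near 0 would mean almost complete regression to the baseline.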

The results for Component 1.1, Component 1 (basestealing) for the baserunner on first base, are shown below.

Persistence of Component 1 Winning Percentage: Baserunner on First Base

 
Pitchers:  n = 32,398, R2 = 0.0582
WinPctEven = (27.35%)•WinPctOdd + (72.65%)•0.5000 (51.88)

 
Catchers:  n = 6,872, R2 = -0.0046
WinPctEven = (23.35%)•WinPctOdd + (76.65%)•0.5000 (19.64)

The number n is the number of players over whom the equation was estimated, that is, who accumulated any Player wins and/or losses on both odd- and even-numbered plays. The value R2 measures the percentage of variation in the dependent variable (WinPctEven) explained by the equation (i.e., explained by WinPctOdd).

The baseline toward which WinPctEven regresses, (WinPct)Baseline in the persistence equation, is set equal to 0.500. This is done for all of the persistence equations which I use to allocate shared credit. I chose this based on empirical experimentation with alternatives, including freely estimating (WinPct)Baseline; in my judgment, the results when (WinPct)Baseline was constrained to 0.500 worked best.

The numbers in parentheses are t-statistics. T-statistics measure the significance of b, that is, the confidence we have that b is greater than zero. The greater the t-statistic, the more confident we are that the true value of b is greater than zero. Roughly speaking, if a t-statistic is greater than 2, then we can be at least 95% certain that the true value of b is greater than zero (assuming that certain statistical assumptions regarding our model hold).

For baserunners on first base, Component 1 win percentage is significantly persistent for both pitchers and catchers, with t-statistics far greater than two for both sets of players. The persistence is somewhat stronger for pitchers (27.4%) than for catchers (23.4%). The percentage of Component 1 Player decisions with a runner on first base (Component 1.1) which are attributed to pitchers is set equal to the pitcher persistence coefficient (27.4%) divided by the sum of the persistence coefficients for pitchers and catchers (27.4% + 23.4%). This leads to 53.9% of Component 1.1 decisions being allocated to pitchers and 46.1% of Component 1.1 decisions allocated to catchers.
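
The split itself is simple arithmetic. The Python sketch below reproduces it from the two coefficients; the function name is hypothetical, and the handling of non-positive coefficients (no share for that position) is an assumption based on how steals of home are treated later in this article.

```python
def pitcher_share(b_pitcher: float, b_catcher: float) -> float:
    """Share of decisions given to pitchers, from two persistence coefficients."""
    bp = max(b_pitcher, 0.0)  # assumed: no credit for non-positive persistence
    bc = max(b_catcher, 0.0)
    if bp + bc == 0.0:
        raise ValueError("neither position shows positive persistence")
    return bp / (bp + bc)


# Component 1.1 coefficients: 27.35% for pitchers, 23.35% for catchers.
share = pitcher_share(0.2735, 0.2335)
print(f"pitchers {share:.1%}, catchers {1.0 - share:.1%}")  # pitchers 53.9%, catchers 46.1%
```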

5.    Changes in Component Splits over Time
There is no reason to believe that the split of credit between positions should be constant over time. On the other hand, if a distinct persistence equation is estimated every year, this could well produce significant year-to-year shifts because of statistical quirks from small sample sizes. Ideally, what we would like to do is allow for gradual changes in component splits over time, but do so in a way that reduces the likelihood of flukish year-to-year changes.

To accomplish this, I estimate unique Persistence Equations for every season, but I use all of my data in all of these equations. I simply weight the data based on how close to the season of interest it is. Each observation is multiplied by a YearWeight, which is equal to the following:

YearWeight = 1 - abs(Year - YearTarget) / 100


where "Year" is the year in which the observation occurred, and YearTarget is the year for which shares are being estimated. So observations in the target year get a YearWeight of 1.0, observations one year before or after the target year get a YearWeight of 0.99, observations two years removed from the target year get a YearWeight of 0.98, etc.
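
The weighting scheme can be written as a one-line function. This Python sketch uses illustrative names; it assumes no two seasons in the data are 100 or more years apart, so weights stay positive.

```python
def year_weight(year: int, target_year: int) -> float:
    """Weight on an observation from `year` when estimating for `target_year`."""
    return 1.0 - abs(year - target_year) / 100.0


# Weights for a few seasons when 2004 is the season of interest.
for season in (2004, 2003, 2006, 1984):
    print(season, round(year_weight(season, 2004), 2))
```

Because the weights fall off by only one percentage point per year, neighboring seasons use nearly the same data, which is what keeps the year-to-year changes in share weights gradual.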

The result is a set of share weights that vary by year but do so fairly gradually. For example, the share of credit for Component 1.1 (basestealing by runners on first base) attributed to pitchers varies by season within a range of 53.6% to 55.2%.

6.    Final Proportions of Shared Player Game Points
The specific Persistence Equations used to separate shared responsibilities are summarized in the sections of this article related to specific components below.

Separate persistence equations and, hence, separate share weights, are calculated for specific fielders and by specific baserunners, so that, for example, Component 5 shares for first basemen and third basemen will differ. Also, as noted above, these share weights vary by season. Splits by season are presented on the pages for specific leagues (e.g., 2012 National League).

Average breakdowns of shared components over the full Retrosheet Era are summarized in the table below. The numbers below are averages across all fielders/baserunners and across all seasons, so do not necessarily apply precisely for any specific players or seasons.

Shared Components based on Persistence Equations

Component       Pitcher   Fielder
Component 1     52.7%     47.3%
Component 2     76.7%     23.3%
Component 5     38.4%     61.6%
Component 6     24.2%     75.8%
Component 7     36.9%     63.1%

Component       Batter    Baserunner
Component 7     78.5%     21.5%
Component 8     52.4%     47.6%
Component 9     50.7%     49.3%

Component 1: Basestealing

In the first step of calculating Player Wins and Losses, baserunners, pitchers, and catchers are given credit and blame for either advancing (allowing) or failing to advance by (preventing) stolen bases (or defensive indifference) and for being caught stealing or picked off and failing to be caught stealing or picked off.

Overall, 2.2% of raw Player Decisions were accrued in this step over the entire Retrosheet era. Because stolen bases are an elective strategy, however, the share of total Player decisions earned in this Component has tended to vary more over time than other components. In the 1950s, for example, basestealing was fairly rare (until the pennant-winning 1959 Go-Go Sox); hence, Component 1 made up only 1.9% of total Player decisions during this decade. In the 1960s, spurred by the basestealing of players such as Luis Aparicio, Maury Wills, and Lou Brock, as well as the increased importance of one-run strategies due to the lower offensive levels, Component 1 grew to 2.0% of all Player decisions. Basestealing grew still more in importance in the 1970s (2.6% of total Player decisions) and peaked in the 1980s at 2.9%, thanks to a new generation of basestealers including Rickey Henderson, Tim Raines, Willie Wilson, and others. Basestealing lessened somewhat in importance in the 1990s (2.4% of total player decisions for the decade), especially the late 1990s, as offensive levels increased, making one-run strategies relatively less important. The lessening of the importance of basestealing continued into the 21st century, with Component 1 accounting for 1.9% of total Player decisions since 2000.

1.    Calculation of Component 1 Player Game Points
Credits for actual stolen bases, caught stealings, and the like are calculated simply as the change in Win Probability resulting from the change in the base/out situation (and the score, if appropriate).

The probability of a stolen base is calculated based on the league-wide percentage of times a base was stolen given this particular baserunner/out state – that is, 24 probabilities are calculated, one for each base-out state (of course, the three bases-empty scenarios have no chance of a stolen base). Unique probabilities are calculated for each league-season. As an example, average probabilities over the entire Retrosheet Era (1934 - 2013) are shown below.

Outs   Baserunners   SB      CS      Success Rate
0      1             5.8%    3.3%    63.9%
0      2             1.2%    0.5%    70.3%
0      3             0.2%    0.2%    58.0%
0      1-2           1.5%    1.4%    52.9%
0      1-3           4.2%    1.1%    78.4%
0      2-3           0.2%    0.1%    62.6%
0      1-2-3         0.2%    0.1%    65.5%
1      1             6.1%    3.7%    62.2%
1      2             1.9%    0.9%    69.2%
1      3             0.3%    0.7%    29.9%
1      1-2           1.8%    1.4%    56.7%
1      1-3           4.7%    1.9%    70.8%
1      2-3           0.2%    0.3%    38.7%
1      1-2-3         0.2%    0.3%    45.9%
2      1             6.7%    3.3%    67.5%
2      2             0.9%    0.2%    80.4%
2      3             0.4%    0.2%    63.2%
2      1-2           0.9%    0.4%    69.1%
2      1-3           5.9%    1.4%    81.3%
2      2-3           0.3%    0.1%    68.7%
2      1-2-3         0.3%    0.2%    65.6%

In addition to wins and losses for actually stealing bases, baserunners, pitchers, and catchers are also credited or debited with their failure to steal or be caught stealing. The win probability at the beginning of a play is calculated based on the probability of each possible event which could subsequently occur. This includes, of course, some possibility that a baserunner may steal one or more bases as well as the possibility that some baserunner may be picked off or caught stealing.

The net win probability in the absence of any stolen bases is calculated as follows. The overall win probability is equal to the weighted average of the win probability with and without base-stealing, i.e.,

WinProb = Prob(SB)•WinProbSB + (1-Prob(SB)) •WinProbnoSB

where Prob(SB) is the probability of a stolen base, which, as noted above, is base-out dependent. If no stolen base occurs, then the resulting Win Probability will be WinProbnoSB above, which can be calculated as follows:

WinProbnoSB = [1/(1-Prob(SB))]•(WinProb – Prob(SB)•WinProbSB)

The net effect on Win Probability, then, of not stealing a base will simply be the difference: WinProbnoSB – WinProb.
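
The two equations above translate directly into code. In this Python sketch the function names are illustrative, and the win probabilities used in the example are made-up values chosen only to show the mechanics; the 5.8% stolen base probability is the Retrosheet Era average for a runner on first with no outs from the table above.

```python
def win_prob_no_sb(win_prob: float, prob_sb: float, win_prob_sb: float) -> float:
    """Solve WinProb = P*WinProbSB + (1-P)*WinProbnoSB for WinProbnoSB."""
    return (win_prob - prob_sb * win_prob_sb) / (1.0 - prob_sb)


def no_steal_effect(win_prob: float, prob_sb: float, win_prob_sb: float) -> float:
    """Change in win probability credited when no stolen base occurs."""
    return win_prob_no_sb(win_prob, prob_sb, win_prob_sb) - win_prob


# Hypothetical state: 0.550 overall, 0.580 if the runner steals.
effect = no_steal_effect(win_prob=0.550, prob_sb=0.058, win_prob_sb=0.580)
print(f"{effect:+.5f}")
```

The effect is small and negative whenever a steal would have raised the win probability, which is why so few Player Decisions are earned on plays with no attempt.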

Balks are included in Component 1 under the assumption that balks tend to be the result of pitchers worrying about possible stolen bases. Of course, balks are relatively rare, so it makes little difference whether they are lumped together with stolen bases or with wild pitches and passed balls in Component 2.

One way in which stolen bases are unique among the nine components of Player Decisions is that stolen base attempts are purely elective. That is, the offensive team chooses whether or not to attempt to steal a base, unlike, say, balls in play or wild pitches, which just happen. Because of this, the value of a stolen base is intrinsically dependent on the context in which it takes place. To acknowledge this, I do not calculate a “context-neutral” version of stolen base Player Decisions. All stolen base Player Decisions are tied to the context in which they occurred, so that “context-neutral” Component 1 Player Decisions are exactly equal to “context-dependent” Component 1 Player Decisions. An example of how context affects the value of a player’s stolen bases is discussed later in this article, where I compare Ichiro Suzuki’s and Mike Cameron’s Component 1 Player Games for the 2002 Seattle Mariners.

Offensively, stolen bases, caught stealings, and the lack thereof, are credited to baserunners. Defensively, the credit for these things is shared by pitchers and catchers. This is one of several cases where credit may be shared by different players. The basic process whereby this credit is divided was described earlier in this article. The specific division of defensive Component 1 Player Decisions is presented next.

2.    Division of Component 1 Game Points Between Pitchers and Catchers
As explained above, Component 1 Player Games are divided between pitchers and catchers based on the extent to which player winning percentages persist across different sample periods.

One measure of the extent to which a particular factor is a skill is the extent to which a player’s winning percentage persists over time. To evaluate the persistence of skills, I fit a simple persistence equation which modeled Component 1 winning percentage on even-numbered plays as a function of Component 1 winning percentage on odd-numbered plays:

(Component 1 Win Pct)Even = b•(Component 1 Win Pct)Odd + (1-b)•(WinPct)Baseline

where (WinPct)Baseline represents a baseline winning percentage toward which Component 1 winning percentages regress over time.

Equations of this type were fit for Component 1 Player Games for pitchers and catchers. Separate equations were estimated for each base. The results for these equations are shown below. A brief explanation of these variables follows.

The number n is the number of players over whom the equation was estimated, that is, who accumulated any Player wins and/or losses on both odd- and even-numbered plays. The value R2 measures the percentage of variation in the dependent variable (WinPctEven) explained by the equation (i.e., explained by WinPctOdd). The numbers in parentheses are t-statistics. T-statistics measure the significance of b, that is, the confidence we have that b is greater than zero. The greater the t-statistic, the more confident we are that the true value of b is greater than zero. Roughly speaking, if the t-statistic is greater than 2, then we can be at least 95% certain that the true value of b is greater than zero (assuming that certain statistical assumptions regarding our model hold). The value of (WinPct)Baseline, the baseline winning percentage toward which winning percentages regress over time, is set equal to 0.500 by construction.
Note: To be precise, I estimate a unique Persistence Equation for every season, each using all of my data but weighting observations by how close they are to the season of interest. The equations shown here weight each season equally.

Persistence of Component 1 Winning Percentage: Baserunner on First Base

 
Pitchers:  n = 32,398, R2 = 0.0582
WinPctEven = (27.35%)•WinPctOdd + (72.65%)•0.5000 (51.88)

 
Catchers:  n = 6,872, R2 = -0.0046
WinPctEven = (23.35%)•WinPctOdd + (76.65%)•0.5000 (19.64)

For baserunners on first base, Component 1 win percentage is significantly persistent for both pitchers and catchers, with t-statistics far greater than two for both sets of players. The persistence is somewhat stronger for pitchers (27.4%) than for catchers (23.4%). The percentage of Component 1 Player decisions with a runner on first base (Component 1.1) which are attributed to pitchers is set equal to the pitcher persistence coefficient (27.4%) divided by the sum of the persistence coefficients for pitchers and catchers (27.4% + 23.4%). This leads to 53.9% of Component 1.1 decisions being allocated to pitchers and 46.1% of Component 1.1 decisions allocated to catchers.

Persistence of Component 1 Winning Percentage: Baserunner on Second Base

 
Pitchers:  n = 31,982, R2 = -0.0210
WinPctEven = (15.24%)•WinPctOdd + (84.76%)•0.5000 (26.51)

 
Catchers:  n = 6,791, R2 = -0.0389
WinPctEven = (4.76%)•WinPctOdd + (95.24%)•0.5000 (4.122)

Based on these results, Component 1.2 decisions are split 76.2% to pitchers (15.2% / (15.2% + 4.8%)) and 23.8% to catchers.

Persistence of Component 1 Winning Percentage: Baserunner on Third Base

 
Pitchers:  n = 30,549, R2 = -0.0252
WinPctEven = (11.58%)•WinPctOdd + (88.42%)•0.5000 (19.87)

 
Catchers:  n = 6,604, R2 = -0.0570
WinPctEven = (12.58%)•WinPctOdd + (87.42%)•0.5000 (9.912)

Finally, in this case, there is no positive persistence (in fact, there is significant negative persistence) for pitchers in preventing steals of home. There is, however, significant persistence for catchers. Because of this, Component 1.3 decisions are allocated 100% to catchers.

3.    Level of Credit for Not Attempting Stolen Bases
Player Decisions are awarded not only for stolen bases and caught stealing, but also for a lack of stolen bases and caught stealings when given the opportunity. For most of the seasons for which I have calculated Player won-lost records, the failure to attempt a stolen base was actually a net positive for a base runner. That is, the expected gain in win percentage to an offensive team from a stolen base times the number of actual stolen bases (including defensive indifferences and balks) was less than the expected loss in win percentage to an offensive team from being caught stealing times the number of actual caught stealings (including pickoffs). This has not been true every season, and, in fact, this tendency has reversed itself for seasons since 2007.

Regardless of whether a net positive or net negative, it is worth noting that very few Player Wins are actually earned by failing to attempt a stolen base. With a runner on first base and second base open, the failure to steal second base cost an average of 0.000292 losses per plate appearance in 2009, for example. Avoiding being caught stealing (or picked off), on the other hand, earned an average of 0.000266 wins per plate appearance.

An interesting contrast can be made between the most prolific basestealer in the Major Leagues from 2000 – 2006, Juan Pierre, who stole 325 bases and was caught stealing 116 times, and perhaps the least prolific basestealer of that period, Tony Clark, who reached base approximately 580 times (excluding home runs) and was credited with no stolen bases and a single caught stealing.

Juan Pierre, from 2000 – 2006, earned a total of 6.43 stolen base wins, the most stolen base wins earned by any baserunner over this time period. He also led all players in stolen base losses, however, with 6.28, for a Component 1 winning percentage of 0.506 and 0.15 net wins.

Tony Clark, on the other hand, because he never ran, amassed a mere 0.24 stolen base wins, but, because he was only caught stealing once, he also amassed only 0.15 stolen base losses, for a Component 1 winning percentage of 0.614 and 0.09 net wins.

In other words, Juan Pierre’s 441 stolen base attempts generated 0.06 more net wins for his teams than Tony Clark’s one (unsuccessful) stolen base attempt did over these seven years.*
*To be fair, Juan Pierre has earned a total of 1.91 net Component 1 wins over his entire career.
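
The winning percentages and net wins quoted for these two players follow from simple arithmetic on their wins and losses. The Python sketch below reproduces them from the rounded figures given above; the helper names are illustrative.

```python
def win_pct(wins: float, losses: float) -> float:
    """Winning percentage: wins divided by total decisions."""
    return wins / (wins + losses)


def net_wins(wins: float, losses: float) -> float:
    """Net wins: wins minus losses."""
    return wins - losses


pierre = (6.43, 6.28)  # Component 1 wins and losses, 2000 - 2006
clark = (0.24, 0.15)

print(f"Pierre: {win_pct(*pierre):.3f}, {net_wins(*pierre):+.2f}")  # Pierre: 0.506, +0.15
print(f"Gap in net wins: {net_wins(*pierre) - net_wins(*clark):.2f}")  # Gap in net wins: 0.06
```

Run on Clark's rounded 0.24 and 0.15, the same function gives about 0.615 rather than the 0.614 quoted above, presumably because the published figure is based on unrounded wins and losses.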

4.    Baserunners versus Pitchers versus Catchers
Overall, Component 1 Player Decisions account for 2.2% of total Player Decisions. The relative importance of basestealing as a component of total player value is quite different, however, for baserunners, pitchers, and catchers.

Because of the perfect symmetry between offensive and defensive Player Game Points, basestealing accounted for a total of 2.2% of total offensive Player Decisions. The importance of basestealing varied considerably, however, across players. Returning to the above examples, Juan Pierre has accumulated 7.1% of his total Player decisions in Component 1 over the course of his career, while Tony Clark’s basestealing only accounted for 0.6% of his total Player decisions.

The highest percentage of total offensive Games within Component 1 for a single season for a player that played regularly* was probably that of Otis Nixon of the 1990 Montreal Expos, who had 50 stolen bases and 13 caught stealing in 119 games (263 plate appearances and 26 pinch-running appearances), for a Component 1 Won-Lost record of 1.45 - 1.12. Basestealing accounted for a total of 20.7% of Nixon’s total offensive Player Decisions and 16.0% of his total Player decisions that year.
*min. 100 games played, 10 player decisions

In contrast, basestealing is a more minor aspect of overall pitching, with Component 1 Player Decisions accounting for only 1.8% of total pitching Player Decisions (not including fielding, batting, and baserunning decisions earned by pitchers). For catchers, on the other hand, Component 1 Player Decisions are a huge percentage of overall catcher fielding, accounting for 62.5% of total fielding value for catchers.

Component 1 leaders can be found here.

Basestealing Context
Unlike most other events in baseball, stolen bases are purely elective plays by the offensive team (or player). To recognize this, I do not believe that it makes sense to talk about “context-neutral” basestealing, as one factor that can distinguish between “good” and “bad” basestealers is their ability to evaluate context in making the decision whether or not to attempt a stolen base.

Because of context, then, not all stolen bases are created equally, nor are all caught stealings. A terrific example of this was two teammates on the 2002 Seattle Mariners: Mike Cameron and Ichiro Suzuki. In 2002, Cameron and Ichiro each stole 31 bases for the Seattle Mariners; Cameron was caught stealing 8 times while Ichiro was caught stealing 15 times. Based on this, a casual observer would likely conclude that Mike Cameron had a better season than Ichiro in terms of basestealing.

In fact, however, Mike Cameron’s Component 1 record in 2002 was 0.5 wins and 0.7 losses (0.409 winning percentage, -0.2 net wins, i.e., wins minus losses) while Ichiro’s record was 0.9 - 0.9 (0.498, -0.0). Why was this? Basically, because Ichiro was smarter about when he attempted to steal bases, so that his successful steals were worth more and his caught stealings cost less than Mike Cameron’s.

Here is how many wins and losses Cameron and Ichiro’s stolen bases and caught stealings were worth, on average.

Component 1 Decisions per SB Attempt

              Ichiro Suzuki        Mike Cameron
Runner on     SB       CS/PO       SB       CS/PO
1st base      0.020    -0.051      0.012    -0.050
2nd base      0.026    -0.029      0.009    -0.047

Ichiro’s stolen bases were worth roughly twice as much as Cameron’s, and his caught stealings cost about the same with a runner on first and considerably less with a runner on second. Also working in Ichiro’s favor was the fact that while Cameron was only charged with 8 official caught stealings, he was also picked off 3 times (Ichiro was picked off once). Put it all together, along with the Component 1 decisions earned by the players for neither stealing nor being caught stealing, and the end result is that Ichiro Suzuki was about 0.2 net wins better (or, perhaps, “less bad” given that both players were sub-0.500 Component 1 players in 2002) than Mike Cameron (relative to 0.500) as a basestealer in 2002.

While 2002 was a bad Component 1 season for both players, the fact that Ichiro produced a better Component 1 record relative to raw stolen base and caught stealing than Mike Cameron is consistent with their careers. Ichiro Suzuki, in his career (2001 - 2012), has compiled a Component 1 record of 9.4 - 6.4 (0.596 winning percentage, 3.0 net wins) on the strength of 452 stolen bases against 102 caught stealings. Overall, Ichiro’s rank in Component 1 net wins for the entire Retrosheet Era is 26th. This works out to about 0.0208 Component 1 wins per stolen base versus 0.0623 Component 1 losses per caught stealing for his career.*

Mike Cameron’s career stolen base numbers (1995 - 2011) are somewhat worse, at 297 SB and 83 CS, though not dramatically so. His Component 1 record, however, is more notably worse than Ichiro’s: 5.2 - 4.7 (0.523, 0.5, 300th-best in the Retrosheet Era). This works out to only 0.0174 wins per stolen base versus 0.0568 losses per caught stealing.*
* These win/SB and loss/CS numbers are just done by dividing Component 1 decisions by official SB/CS data. Component 1 decisions are also accumulated on plays in which baserunners do not pick up an SB or CS.

Some players are much better basestealers than others, and Component 1 Player wins and losses provide a deeper insight into which players are the best basestealers than raw stolen base and caught stealing totals do.

Component 2: Wild Pitches and Passed Balls

In the second step of calculating Player Wins and Losses, baserunners, pitchers, and catchers are given credit and blame for either advancing (allowing) or failing to advance by (preventing) wild pitches or passed balls.

1.    Calculation of Component 2 Player Game Points
Credits/debits for wild pitches and passed balls (and the occasional case of a baserunner being thrown out trying to advance on a wild pitch or passed ball) are calculated simply as the change in Win Probability resulting from the change in the base/out situation (and the score, if appropriate). Component 2 is also where credit is given to batters for successfully reaching base on a dropped third strike.

The probability of a wild pitch or passed ball is calculated based on the league-wide percentage of times such an event occurred given a particular baserunner/out state – that is, 24 probabilities are calculated, one for each base-out state (note, in this case, non-zero probabilities in bases-empty situations refer to incidences of the batter reaching base safely on a dropped third strike). Unique probabilities are calculated for each league-season. As an example, average probabilities for the entire Retrosheet Era (1934 - 2013) are shown below.

Outs   Baserunners   WP/PB
0      0             0.0%
0      1             1.9%
0      2             2.2%
0      3             1.7%
0      1-2           2.2%
0      1-3           2.1%
0      2-3           1.6%
0      1-2-3         1.5%
1      0             0.1%
1      1             1.9%
1      2             2.4%
1      3             1.7%
1      1-2           2.2%
1      1-3           2.1%
1      2-3           1.4%
1      1-2-3         1.5%
2      0             0.1%
2      1             1.9%
2      2             2.2%
2      3             1.6%
2      1-2           2.0%
2      1-3           2.1%
2      2-3           1.5%
2      1-2-3         1.4%

As with stolen bases, credit is also given for not throwing wild pitches or committing passed balls. The net win probability in the absence of any wild pitch or passed ball is calculated as follows. The overall win probability is equal to the weighted average of the win probability with and without wild pitches, i.e.,

WinProb = Prob(WP)•WinProbWP + (1-Prob(WP)) •WinProbnoWP

where Prob(WP) is the probability of a wild pitch (or passed ball), which, as noted above, is base-out dependent. The Win Probability in the absence of a wild pitch (or passed ball), WinProbnoWP, can then be calculated as follows:

WinProbnoWP = [1/(1-Prob(WP))]•(WinProb – Prob(WP)•WinProbWP)

The net effect on Win Probability, then, of no wild pitch or passed ball during a plate appearance will simply be the difference: WinProbnoWP – WinProb.

Offensively, wild pitches, passed balls, and the lack thereof, are credited to baserunners. Defensively, the credit for these things is shared by pitchers and catchers. This is one of several cases where credit may be shared by different players. The basic process whereby this credit is divided was described earlier in this article. The specific division of defensive Component 2 Player Game Points is presented next.

2.    Division of Component 2 Game Points Between Pitchers and Catchers
As explained earlier in this article, Component 2 Player Games are divided between pitchers and catchers based on the extent to which player winning percentages persist across different sample periods.

One measure of the extent to which a particular factor is a skill is the extent to which a player’s winning percentage persists over time. To evaluate the persistence of skills, I fit a simple persistence equation which modeled Component 2 winning percentage on even-numbered plays as a function of Component 2 winning percentage on odd-numbered plays:

(Component 2 Win Pct)Even = b•(Component 2 Win Pct)Odd + (1-b)•(WinPct)Baseline

where (WinPct)Baseline represents a baseline winning percentage toward which Component 2 winning percentages regress over time.

Equations of this type were fit for Component 2 Player Game Points for pitchers and catchers. Separate equations were estimated for each base. The results for these equations are shown below. A brief explanation of these variables follows.

The number n is the number of players over whom the equation was estimated, that is, who accumulated any Player wins and/or losses on both odd- and even-numbered plays. The value R2 measures the percentage of variation in the dependent variable (WinPctEven) explained by the equation (i.e., explained by WinPctOdd). The numbers in parentheses are t-statistics. T-statistics measure the significance of b, that is, the confidence we have that b is greater than zero. The greater the t-statistic, the more confident we are that the true value of b is greater than zero. Roughly, if the t-statistic is greater than 2, then we can be at least 95% certain that the true value of b is greater than zero (given that certain statistical assumptions underlying our model hold). The value of (WinPct)Baseline, the baseline winning percentage toward which winning percentages regress over time, is set equal to 0.500 by construction.
note: To be precise, I estimate unique Persistence Equations for every season, which use all of my data in all of these equations, but weight the data based on how close to the season of interest it is. The equations shown here weight each season equally.
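
A minimal sketch of fitting a persistence equation of this form follows. Since the baseline is fixed at 0.500, the equation reduces to a one-parameter regression through the origin on deviations from 0.500. Unweighted least squares and the sample data are assumptions made for illustration; the estimation described above weights seasons, and its exact weighting scheme is not reproduced here.

```python
def fit_persistence(odd, even, baseline=0.5):
    """Least-squares estimate of b in: even = b * odd + (1 - b) * baseline.

    Equivalent to regressing (even - baseline) on (odd - baseline) with no
    intercept. Unweighted fitting is an assumption of this sketch.
    """
    xs = [o - baseline for o in odd]
    ys = [e - baseline for e in even]
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx

# Made-up winning percentages on odd- and even-numbered plays for four players.
odd = [0.60, 0.40, 0.55, 0.45]
even = [0.55, 0.45, 0.525, 0.475]

b = fit_persistence(odd, even)
print(f"b = {b:.3f}")  # each even-play deviation is half the odd-play one
```

A value of b near 1 means winning percentages carry over almost entirely from one sample to the other, while a value near 0 means they regress almost fully to the baseline.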

Persistence of Component 2 Winning Percentage: Batter as Baserunner (ability to reach on dropped third strike)

 
Pitchers:  n = 29,167, R2 = -0.0117
WinPctEven = (97.46%)•WinPctOdd + (2.54%)•0.5000 (523.0)

 
Catchers:  n = 6,566, R2 = -0.0470
WinPctEven = (42.82%)•WinPctOdd + (57.18%)•0.5000 (37.36)

In terms of preventing batters from reaching first base on a dropped third strike, Component 2 win percentage is highly significantly persistent for both pitchers and catchers, albeit far more so for pitchers. The percentage of Component 2 Player decisions associated with the batter, i.e., with dropped third strikes (Component 2.0), that is attributed to pitchers is set equal to the pitcher persistence coefficient (97.5%) divided by the sum of the persistence coefficients for pitchers and catchers (97.5% + 42.8%). This leads to 69.5% of Component 2.0 decisions being allocated to pitchers and 30.5% to catchers.

Persistence of Component 2 Winning Percentage: Baserunner on First Base

 
Pitchers:  n = 32,449, R2 = 0.1334
WinPctEven = (61.79%)•WinPctOdd + (38.21%)•0.5000 (140.3)

 
Catchers:  n = 6,880, R2 = 0.0517
WinPctEven = (26.39%)•WinPctOdd + (73.61%)•0.5000 (22.32)

For baserunners on first base, Component 2 win percentage is also significantly persistent for both pitchers and catchers, with 70.1% of Component 2.1 decisions allocated to pitchers and 29.9% of Component 2.1 decisions allocated to catchers.

Persistence of Component 2 Winning Percentage: Baserunner on Second Base

 
Pitchers:  n = 32,096, R2 = 0.0677
WinPctEven = (71.00%)•WinPctOdd + (29.00%)•0.5000 (175.5)

 
Catchers:  n = 6,808, R2 = 0.0218
WinPctEven = (21.80%)•WinPctOdd + (78.20%)•0.5000 (18.06)

Persistence in wild pitches and passed balls is significant here for both pitchers and catchers. This persistence is much stronger, as well as much more significant, for pitchers, to whom 76.5% of Component 2.2 decisions are allocated, than for catchers, to whom 23.5% of Component 2.2 decisions are allocated. Among the sub-components of Component 2, only Component 2.3 (baserunner on third base, 80.5%) allocates a larger share to pitchers.

Persistence of Component 2 Winning Percentage: Baserunner on Third Base

 
Pitchers:  n = 30,924, R2 = -0.0680
WinPctEven = (74.97%)•WinPctOdd + (25.03%)•0.5000 (211.4)

 
Catchers:  n = 6,658, R2 = -0.0080
WinPctEven = (18.15%)•WinPctOdd + (81.85%)•0.5000 (14.56)

The results for baserunners on third base are similar to the earlier results. Component 2.3 decisions are allocated 80.5% to pitchers and 19.5% to catchers.
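
The pitcher/catcher splits quoted in this section follow directly from the persistence coefficients shown in the equations. The sketch below reproduces that arithmetic; the function and dictionary names are illustrative only.

```python
def pitcher_share(b_pitcher, b_catcher):
    """Pitcher's share: own persistence coefficient over the sum of both."""
    return b_pitcher / (b_pitcher + b_catcher)

# (pitcher coefficient, catcher coefficient) from the equations above.
COMPONENT_2 = {
    "2.0 (batter)": (0.9746, 0.4282),
    "2.1 (first base)": (0.6179, 0.2639),
    "2.2 (second base)": (0.7100, 0.2180),
    "2.3 (third base)": (0.7497, 0.1815),
}

shares = {name: pitcher_share(p, c) for name, (p, c) in COMPONENT_2.items()}
for name, share in shares.items():
    print(f"Component {name}: pitchers {share:.1%}, catchers {1 - share:.1%}")
```

Rounded to one decimal place, these match the 69.5%, 70.1%, 76.5%, and 80.5% pitcher shares given in the text.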

Overall, Component 2 Player Game Points account for 1.3% of total Player Decisions, a percentage that has remained fairly stable over the years for which I have estimated Player won-lost records so far.

Component 2 leaders can be found here. An example of how teammate performance can affect a player's won-lost records, using Component 2 as the example, is presented next.

Example of Teammate Effects on Shared Components: Doug Mirabelli
In 2000, Doug Mirabelli committed 5 passed balls in 80 games for the San Francisco Giants, good for a (teammate-unadjusted) context-neutral Component 2 winning percentage of 0.604.

In 2003, Doug Mirabelli committed 14 passed balls in only 55 games for the Boston Red Sox, posting a (teammate-unadjusted) context-neutral Component 2 winning percentage of 0.482.

Did Doug Mirabelli really get that much worse in just three years? Well, he did age from 29 in 2000 to 32 in 2003, so some of that could be age-related decline. But, more significantly for Mirabelli, in 2003, he was the personal catcher for knuckleballer Tim Wakefield, who had a career (context-neutral, teammate-adjusted) Component 2 winning percentage of 0.254.

As explained above, in order to make Player Won-Lost records meaningful as measures of player talent, it is necessary to control for the ability of one’s teammates. The case of Doug Mirabelli, sometime personal catcher for knuckleballer Tim Wakefield, is instructive in this regard.

Doug Mirabelli’s teammate-unadjusted context-neutral Component 2 won-lost records over his career are as follows:

Year  Team  Wins  Losses  Win Pct
1996  SFN   0.02  0.04    0.336
1997  SFN   0.00  0.00    1.000
1998  SFN   0.01  0.01    0.530
1999  SFN   0.06  0.02    0.800
2000  SFN   0.16  0.10    0.604
2001  TEX   0.05  0.05    0.483
2001  BOS   0.11  0.13    0.446
2002  BOS   0.10  0.10    0.505
2003  BOS   0.10  0.10    0.482
2004  BOS   0.10  0.17    0.359
2005  BOS   0.06  0.07    0.474
2006  SDN   0.01  0.01    0.602
2006  BOS   0.11  0.17    0.395
2007  BOS   0.07  0.14    0.339
CAREER      0.96  1.11    0.463

Outside of Boston over these years, Mirabelli’s Component 2 winning percentage was over 0.500 every year except for his first and (barely) his last year, with an overall winning percentage of 0.582. In contrast, Mirabelli’s Component 2 winning percentage was below 0.500 in all but one of his 7 seasons in Boston, with an overall Component 2 winning percentage in Boston of 0.421. Overall, Mirabelli rates as a fairly poor catcher at preventing wild pitches and passed balls, with an overall Component 2 winning percentage of 0.463.
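
The Boston versus non-Boston comparison can be approximated by summing wins and losses from the table above. This is only a sketch: the table is rounded to two decimal places, so the results will not exactly match figures computed from unrounded data.

```python
# (year, team, wins, losses) from the teammate-unadjusted table above.
UNADJUSTED = [
    (1996, "SFN", 0.02, 0.04), (1997, "SFN", 0.00, 0.00),
    (1998, "SFN", 0.01, 0.01), (1999, "SFN", 0.06, 0.02),
    (2000, "SFN", 0.16, 0.10), (2001, "TEX", 0.05, 0.05),
    (2001, "BOS", 0.11, 0.13), (2002, "BOS", 0.10, 0.10),
    (2003, "BOS", 0.10, 0.10), (2004, "BOS", 0.10, 0.17),
    (2005, "BOS", 0.06, 0.07), (2006, "SDN", 0.01, 0.01),
    (2006, "BOS", 0.11, 0.17), (2007, "BOS", 0.07, 0.14),
]

def win_pct(rows):
    """Aggregate winning percentage: total wins over total decisions."""
    wins = sum(row[2] for row in rows)
    losses = sum(row[3] for row in rows)
    return wins / (wins + losses)

boston = win_pct([row for row in UNADJUSTED if row[1] == "BOS"])
elsewhere = win_pct([row for row in UNADJUSTED if row[1] != "BOS"])
print(f"Boston: {boston:.3f}, elsewhere: {elsewhere:.3f}")
```

With the rounded inputs this gives roughly 0.425 in Boston and 0.574 elsewhere, close to the 0.421 and 0.582 quoted above; the small differences are consistent with rounding in the table.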

When Mirabelli’s Component 2 won-lost record is adjusted to control for the pitchers who Mirabelli caught, however, the results are the following:

Year  Team  Wins  Losses  Win Pct
1996  SFN   0.02  0.04    0.325
1997  SFN   0.00  -0.00   1.014
1998  SFN   0.01  0.01    0.529
1999  SFN   0.06  0.02    0.788
2000  SFN   0.16  0.10    0.601
2001  TEX   0.05  0.05    0.484
2001  BOS   0.11  0.13    0.477
2002  BOS   0.11  0.09    0.540
2003  BOS   0.10  0.09    0.524
2004  BOS   0.12  0.14    0.460
2005  BOS   0.08  0.05    0.597
2006  SDN   0.01  0.01    0.586
2006  BOS   0.13  0.15    0.460
2007  BOS   0.10  0.12    0.454
CAREER      1.07  1.01    0.514

Adjusting for the pitchers he caught, Doug Mirabelli turns out to have been slightly above average at preventing wild pitches and passed balls through his career. Outside of Boston over these years, Mirabelli’s Component 2 winning percentage remains fairly consistent after adjusting for his teammates, at 0.577. With Boston, on the other hand, Mirabelli’s combined Component 2 winning percentage improves dramatically from 0.421 unadjusted to 0.492 adjusted.

In words, adjusting for Mirabelli’s teammates brings his Component 2 winning percentages closer together over time. Mathematically, the standard deviation of Mirabelli’s winning percentages falls from 0.109 unadjusted – i.e., Mirabelli’s Component 2 winning percentages are approximately in a range of 0.463 +/- 0.109 (0.354 - 0.572) – to 0.084 adjusted – i.e., Mirabelli’s Component 2 winning percentages range from 0.430 to 0.599 (0.514 +/- 0.084).

Mirabelli was still a bit worse in Boston than elsewhere. Of course, outside of one month in 2006 in San Diego, his career outside of Boston came at ages 25 – 30, while his Boston career was from ages 30 – 36. So based on age alone, we would have expected him to probably be a little less agile at blocking would-be wild pitches in Boston than in San Francisco and Texas.

It seems clear to me that the latter set of numbers more accurately reflects Doug Mirabelli’s ability to prevent wild pitches and passed balls.

Component 3: Balls not in Play

In the third step of calculating Player Wins and Losses, batters and pitchers are given credit and blame for plate appearances that do not result in the ball being put into play: i.e., strikeouts, walks, and hit batsmen. Unlike Components 1, 2, 5, 6, 7, 8, and 9, Components 3 and 4 are not individually constrained to be 0.500 Components by construction. Instead, the combined winning percentage of Components 3 and 4 is equal to 0.500.

1.    Calculation of Component 3 Player Game Points
Components 3 and 4 are calculated together. After Components 1 (stolen bases) and 2 (wild pitches) are accounted for, Components 3 and 4 are evaluated by calculating the expected value of the plate appearance, based purely on the basic result – walk, strikeout, or ball in play – assuming average results following the play. If the batter does not put the ball in play, the results are credited to Component 3. If the batter does put the ball in play (including hitting a home run), the results are credited to Component 4.

For strikeouts and walks, any baserunner advancement beyond normal – e.g., a batter reaching base on a dropped third strike or a baserunner going from first to third on a walk and a wild pitch – is attributed to Component 2. Like Component 1, where baserunners earn Component 1 decisions for not stealing as well as for stealing, batters are credited/debited with Component 2 decisions on strikeouts both for successfully reaching first base on a dropped third strike as well as for failing to reach first base safely. Component 3 decisions for strikeouts are calculated given an average probability of the batter successfully reaching first base on strike three.

Intentional walks are issued at the discretion of the pitching team. To acknowledge this, I do not calculate a “context-neutral” version of Component 3 decisions for intentional walks. Instead, intentional walks are tied to the context in which they occurred, so that “context-neutral” Component 3 values for intentional walks are exactly equal to their “context-dependent” Component 3 values.

2.    Relative Values of Strikeouts and Walks
On average, in the Retrosheet era, a strikeout has had a net win value (for the batter) of -0.0207, while unintentional walks and hit batsmen have had an average net win value of 0.0350.

These values have varied somewhat over time due to differences in the run-scoring environment, although not by as much as most other offensive events. A pitcher with a strikeout-to-walk ratio (excluding intentional walks) greater than 1.69 will have a Component 3 winning percentage over 0.500, as will a batter with a strikeout-to-walk ratio less than 1.69. Overall, from 1934 - 2013, the actual ratio of strikeouts to (unintentional) walks* was 1.72, so that, on average, pitchers’ Component 3 winning percentage is somewhat greater than 0.500 (0.509) while batters’ Component 3 winning percentage is somewhat less than 0.500 (0.491).
*Hit-by-Pitches are included here as well.
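
The 1.69 break-even ratio follows from the two average net win values quoted above: it is the number of strikeouts needed to offset one walk. The sketch below shows that arithmetic, treating every strikeout and every walk as worth its era-average value, which is a simplification made for this example.

```python
STRIKEOUT_VALUE = -0.0207  # average net win value to the batter
WALK_VALUE = 0.0350        # unintentional walks and hit batsmen

# Strikeouts per walk at which the batter's net win value is exactly zero.
break_even = WALK_VALUE / -STRIKEOUT_VALUE

def batter_net_wins(strikeouts, walks):
    """Batter's net Component 3 win value at era-average event values."""
    return strikeouts * STRIKEOUT_VALUE + walks * WALK_VALUE

print(f"break-even K/BB ratio: {break_even:.2f}")
# At the 1934 - 2013 average of 1.72 strikeouts per walk, batters come out
# slightly behind (and pitchers slightly ahead).
print(f"net wins for 172 K, 100 BB: {batter_net_wins(172, 100):+.4f}")
```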

The strikeout-to-walk ratio in Major League Baseball has increased in recent years, as the number of strikeouts has increased. Since 2000, the strikeout-to-(unintentional)walk ratio in Major League Baseball was 2.01 with pitchers having a Component 3 winning percentage of 0.580 over this time period.

Over the entirety of the Retrosheet era, Component 3 accounted for 15.5% of total Player decisions. That share has grown over time as strikeouts have increased. Since 2000, Component 3 has accounted for 16.6% of total Player decisions. Component 3 Player decisions are assigned entirely to batters and pitchers.

Component 3 leaders are shown here.

Component 4: Balls in Play

In the fourth step of calculating Player Wins and Losses, batters and pitchers are given credit and blame for plate appearances that result in the ball being put into play. Wins and losses are assigned in this step based on the expected value of the balls in play. Expected values are calculated based on the location and type of ball hit. The level of location and hit-type information available varies considerably by year. The most complete location data are available from 1989 through 1999. More recent seasons have more limited location and hit-type data, while older seasons have still less complete location data. The calculations of the probabilities of various events used in this step of calculations were described earlier in this article.

1.    Calculation of Component 4 Player Game Points
Components 3 and 4 are calculated together. After Components 1 (stolen bases) and 2 (wild pitches) are accounted for, Components 3 and 4 are evaluated by calculating the expected value of the plate appearance, based purely on the basic result – walk, strikeout, or ball in play – assuming average results following the play. If the batter does not put the ball in play, the results are credited to Component 3. If the batter does put the ball in play (including hitting a home run), the results are credited to Component 4. Component 4 decisions are calculated assuming average results based on where and how the ball is hit. Whether the ball becomes a hit or an out is allocated in Component 5.

As explained in the description of Component 3, Components 3 and 4 are not individually constrained to 0.500 winning percentages. Instead, the combined winning percentage of Components 3 and 4 is equal to 0.500. Overall, putting the ball in play is a net positive event for the offense. From 1934 - 2013, the overall Component 4 winning percentage for hitters was 0.504. More recently, as home runs have increased, this has improved to 0.538 since 2000.

2.    Home Runs: Component 3 or Component 4?
In sabermetric circles, it is fairly common to distinguish between home runs and other balls hit into play. In fact, frequently, the term “balls-in-play” or BIP is used to denote those balls that are hit into play excluding home runs. The logic for this distinction is most apparent in the study of Defense-Independent Pitching Statistics, or DIPS.

The basic idea behind DIPS is that pitchers have more control over those events which do not involve fielders and, of course, home runs (not including inside-the-park home runs) do not involve fielders. In a way, however, this breakdown doesn’t really make intuitive sense and has, I think, contributed to a lot of the misunderstanding and understatement of the effect pitchers have on balls in play.

Do pitchers have any control over balls in play? Several years ago, Voros McCracken developed a theory (called DIPS) that said that the ability to prevent hits on balls in play (excluding home runs) was the same, or virtually the same, for all major-league pitchers. His key conclusion was that one could predict a pitcher’s earned run average (ERA) for the next season looking only at that pitcher’s strikeouts, walks, and home runs allowed and that this predicted (or DIPS) ERA was, on average, a better predictor of future ERA than actual ERA. In other words, pitchers who had actual ERAs better than their DIPS ERAs would expect to see their ERAs get worse the next season (move toward their DIPS ERAs), while pitchers who had actual ERAs worse than their DIPS ERAs would expect to see their ERAs get better the next season (again, moving them toward their DIPS ERAs). McCracken’s conclusion that DIPS ERA was a better predictor than actual ERA was, and almost certainly still is, fundamentally true.*
*A nice write-up of the issue of how much control pitchers have over balls that batters hit is here. The linked article (by Mike Fast) is particularly useful because it provides links to several other studies of the issue in its concluding section (References and Resources).

Some baseball fans have taken this argument a step further (some might say they’ve taken it to its natural conclusion) and argued that pitchers have no effect on balls in play. With all due respect, anybody who has watched baseball for any length of time knows that this argument is simply not true as I have just stated it.

Some pitchers are “groundball pitchers”. That is, the balls which are hit off of them tend to be ground balls, not fly balls. In the recent past, well-known groundball pitchers have included, for example, Brandon Webb, whose ground-ball percentages (percentage of total balls-in-play allowed that were ground balls) ranged from 61 – 66% from 2004 – 2007; and Chien-Ming Wang, whose ground-ball percentages ranged from 59 – 64% over the same time period (actually 2005-07; he didn’t pitch in the majors in 2004). On the other hand, Barry Zito’s ground-ball percentages over this same time period ranged from 37 – 42% and Curt Schilling’s ranged from 34 – 42%. Suffice it to say that I am not aware of anybody who would seriously argue that Webb, Wang, Zito, and Schilling’s ground-ball percentages are entirely the product of luck over this time period. So pitchers clearly have some impact over balls-in-play, right?

This is where the treatment of home runs becomes critical. Since 2003, an average ground ball was worth 0.0096 wins to the defensive team (i.e., to the pitcher). Over the same time period, excluding home runs, an average fly ball (including infield popups, excluding line drives) was worth 0.0132 wins to the defensive team. These results are relatively similar and, moreover, are relatively small. Compare these, for example, to line drives, which have an average net win value of -0.0259 wins to the defensive team or strikeouts, with an average defensive net win value of 0.0215.

But this is where home runs come in. As even McCracken noted in his original DIPS formulation, pitchers have some control over whether batters hit home runs against them. But a home run is just a fly ball (or line drive) that goes farther than the fly balls that stay in the park. As McCracken himself said, “Aside from walks, there are two basic outcomes for a pitcher: batter hits the ball or batter strikes out. With the latter, the result is almost always an out. With the former, all sorts of things can happen, including a base hit.” Of course, one of those “sorts of things” that “can happen” when the batter hits the ball is that he could hit a home run.

If we add home runs, which have an average net win value (to the batter) of 0.1378 wins, to the fly balls allowed by pitchers, we see that, all of a sudden, a fly ball allowed isn’t a slightly better outcome than a ground ball allowed (0.0132 wins to 0.0096 wins above) but, in fact, is a net negative outcome for a pitcher: -0.0032 wins.
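
One way to see how these three figures fit together is to treat the value of all fly balls as a weighted average of in-park fly balls and home runs, and to solve for the implied home run share. This is a back-of-the-envelope sketch under several assumptions: the home run's value to the defense is taken as exactly the negative of its value to the batter, all three figures are treated as covering the same period, and all home runs are counted as fly balls.

```python
def implied_hr_share(value_in_park, value_hr, value_all):
    """Solve value_all = (1 - p) * value_in_park + p * value_hr for p."""
    return (value_in_park - value_all) / (value_in_park - value_hr)

FLY_IN_PARK = 0.0132   # average fly ball excluding home runs, to the defense
HOME_RUN = -0.1378     # home run, to the defense (assumed: negative of batter value)
FLY_ALL = -0.0032      # average fly ball including home runs, to the defense

share = implied_hr_share(FLY_IN_PARK, HOME_RUN, FLY_ALL)
recombined = (1 - share) * FLY_IN_PARK + share * HOME_RUN

print(f"implied share of fly balls that are home runs: {share:.1%}")
```

Under these assumptions, roughly one fly ball in nine leaving the park is enough to turn the fly ball from a modestly good outcome for the pitcher into a slightly bad one.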

Based on this, I allocate wins and losses attributable to home runs to Component 4 rather than Component 3. Since Components 3 and 4 are calculated simultaneously, however, this is, in fact, a purely semantic decision. If one were inclined to include home runs as part of Component 3, one could do so by simply re-defining all Component 4 decisions resulting from home runs as Component 3 decisions instead.

Overall, from 1934 - 2013, Component 4 decisions account for 32.9% of total Player decisions, 36.0% of all Batting decisions, and 50.8% of total Pitcher decisions.

Component 4 leaders are shown here.

Component 5: Hits vs. Outs

In the fifth step of calculating Player Wins and Losses, batters, pitchers, and fielders are given credit and blame for balls in play becoming hits or outs.

1.    Calculation of Component 5 Player Game Points
Credits/debits for whether balls in play become hits or outs are assigned in Component 5. The average value of the ball in play, based on the location and hit type of the ball, was assigned to batters and pitchers in Component 4. Component 5 player wins and losses are calculated based on an average result of the play, given that it is either a hit or an out. For hits, debits and credits based on the type of hit – single, double, or triple – are assigned in Component 6. Credits for double plays, and baserunner outs and advancements are assigned in Components 7, 8, and 9, respectively. For fielders, Component 5 Player Game Points are the portion of Player won-lost records that are most comparable to other play-by-play measures of fielding, such as UZR, PMR, +/-, and TotalZone.

2.    Division of Component 5 Game Points Between Pitchers and Fielders
Component 5 Player Games are shared between pitchers and fielders based on the extent to which player winning percentages persist across different sample periods. The mathematics underlying this division were described earlier in this article.

To summarize, one measure of the extent to which a particular factor is a skill is the extent to which a player’s winning percentage persists over time. To evaluate the persistence of skills, I fit a simple persistence equation which modeled Component 5 winning percentage on even-numbered plays as a function of Component 5 winning percentage on odd-numbered plays:

(Component 5 Win Pct)Even = b•(Component 5 Win Pct)Odd + (1-b)•(WinPct)Baseline

where (WinPct)Baseline represents a baseline winning percentage toward which Component 5 winning percentages regress over time.

Equations of this type were fit for Component 5 Player Game Points for pitchers and fielders. Separate equations were estimated for each fielding position (except for pitcher, obviously). The results for these equations are shown below. A brief explanation of these variables follows.

The number n is the number of players over whom the equation was estimated, that is, who accumulated any Player wins and/or losses on both odd- and even-numbered plays. The value R2 measures the percentage of variation in the dependent variable (WinPctEven) explained by the equation (i.e., explained by WinPctOdd). The numbers in parentheses are t-statistics. T-statistics measure the significance of b, that is, the confidence we have that b is greater than zero. The greater the t-statistic, the more confident we are that the true value of b is greater than zero. Roughly speaking, if the t-statistic is greater than 2, then we can be at least 95% certain that the true value of b is greater than zero (assuming that certain statistical assumptions regarding our model hold). The value of (WinPct)Baseline, the baseline winning percentage toward which winning percentages regress over time, is set equal to 0.500 by construction.
note: To be precise, I estimate unique Persistence Equations for every season, which use all of my data in all of these equations, but weight the data based on how close to the season of interest it is. The equations shown here weight each season equally.

Persistence of Component 5 Winning Percentage: Catcher

 
Pitchers:  n = 31,452, R2 = -0.0226
WinPctEven = (4.66%)•WinPctOdd + (95.34%)•0.5000 (8.313)

 
Catchers:  n = 6,750, R2 = 0.0123
WinPctEven = (12.58%)•WinPctOdd + (87.42%)•0.5000 (10.21)

The percentage of Component 5.2 Player decisions which are attributed to pitchers is set equal to the pitcher persistence coefficient (4.7%) divided by the sum of the persistence coefficients for pitchers and catchers (4.7% + 12.6%). This leads to 27.0% of Component 5.2 decisions being allocated to pitchers and 73.0% of Component 5.2 decisions allocated to catchers.

Persistence of Component 5 Winning Percentage: First Baseman

 
Pitchers:  n = 31,945, R2 = 0.0254
WinPctEven = (21.98%)•WinPctOdd + (78.02%)•0.5000 (39.77)

 
First Basemen:  n = 8,622, R2 = 0.0714
WinPctEven = (30.06%)•WinPctOdd + (69.94%)•0.5000 (28.08)

The breakdown of Component 5.3 Player decisions is 42.2% for pitchers versus 57.8% for first basemen.

Persistence of Component 5 Winning Percentage: Second Baseman

 
Pitchers:  n = 32,116, R2 = 0.0091
WinPctEven = (26.11%)•WinPctOdd + (73.89%)•0.5000 (47.69)

 
Second Basemen:  n = 7,896, R2 = 0.0996
WinPctEven = (35.86%)•WinPctOdd + (64.14%)•0.5000 (31.80)

Based on these results, Component 5.4 Player decisions are allocated 42.1% to pitchers and 57.9% to second basemen.

Persistence of Component 5 Winning Percentage: Third Baseman

 
Pitchers:  n = 32,046, R2 = 0.0067
WinPctEven = (28.14%)•WinPctOdd + (71.86%)•0.5000 (52.60)

 
Third Basemen:  n = 8,828, R2 = 0.1249
WinPctEven = (40.92%)•WinPctOdd + (59.08%)•0.5000 (40.32)

For Component 5.5, Player decisions are split 40.7% for pitchers, 59.3% for third basemen.

Persistence of Component 5 Winning Percentage: Shortstop

 
Pitchers:  n = 32,089, R2 = -0.0230
WinPctEven = (20.84%)•WinPctOdd + (79.16%)•0.5000 (38.35)

 
Shortstop:  n = 7,028, R2 = 0.0711
WinPctEven = (35.59%)•WinPctOdd + (64.41%)•0.5000 (29.85)

Pitchers receive 36.9% of the credit for Component 5.6 Player decisions, while shortstops earn 63.1%. Shortstops and third basemen earn the largest share of Component 5 decisions of any infield positions. This makes a certain amount of sense to me. These positions are, perhaps, more difficult to field than first or second base, thereby allowing for a greater spread in the observed talent of fielders on this side of the infield.

Persistence of Component 5 Winning Percentage: Left Fielder

 
Pitchers:  n = 31,989, R2 = 0.0058
WinPctEven = (25.96%)•WinPctOdd + (74.04%)•0.5000 (48.73)

 
Left Fielders:  n = 12,241, R2 = 0.1894
WinPctEven = (45.27%)•WinPctOdd + (54.73%)•0.5000 (55.12)

Pitchers receive 36.4% of Component 5.7 Player decisions, while left fielders earn 63.6% of Component 5.7 decisions.

Persistence of Component 5 Winning Percentage: Center Fielder

 
Pitchers:  n = 32,041, R2 = 0.0004
WinPctEven = (22.02%)•WinPctOdd + (77.98%)•0.5000 (39.88)

 
Center Fielders:  n = 8,403, R2 = 0.1589
WinPctEven = (40.31%)•WinPctOdd + (59.69%)•0.5000 (43.62)

Based on these results, Component 5.8 Player decisions are allocated 35.3% to pitchers and 64.7% to center fielders. Center fielders therefore earn the highest percentage of Component 5 player decisions of the three outfield positions. As with shortstops and third basemen, the relative difficulty of fielding center field (as compared to left field or right field) may allow for more observable separation in the fielding quality of center fielders, leading to higher and more significant persistence of center fielders' Component 5 winning percentage.

Persistence of Component 5 Winning Percentage: Right Fielder

 
Pitchers:  n = 31,960, R2 = 0.0041
WinPctEven = (27.10%)•WinPctOdd + (72.90%)•0.5000 (49.35)

 
Right Fielders:  n = 11,055, R2 = 0.1799
WinPctEven = (48.40%)•WinPctOdd + (51.60%)•0.5000 (52.60)

Component 5.9 Player decisions are allocated 35.9% to pitchers and 64.1% to right fielders.
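
As a check on the position-by-position splits above, the sketch below recomputes each pitcher share from the persistence coefficients shown in the equations, using the same ratio as in the catcher example (pitcher coefficient divided by the sum of the two coefficients). The names used are illustrative only.

```python
# position: (pitcher coefficient %, fielder coefficient %, quoted pitcher share %)
COMPONENT_5 = {
    "Catcher": (4.66, 12.58, 27.0),
    "First Base": (21.98, 30.06, 42.2),
    "Second Base": (26.11, 35.86, 42.1),
    "Third Base": (28.14, 40.92, 40.7),
    "Shortstop": (20.84, 35.59, 36.9),
    "Left Field": (25.96, 45.27, 36.4),
    "Center Field": (22.02, 40.31, 35.3),
    "Right Field": (27.10, 48.40, 35.9),
}

def pitcher_share_pct(b_pitcher, b_fielder):
    """Pitcher's share of shared decisions, as a percentage."""
    return 100.0 * b_pitcher / (b_pitcher + b_fielder)

computed = {
    pos: round(pitcher_share_pct(bp, bf), 1)
    for pos, (bp, bf, _quoted) in COMPONENT_5.items()
}
for pos, (_bp, _bf, quoted) in COMPONENT_5.items():
    print(f"{pos:<13} pitchers {computed[pos]:.1f}% (quoted {quoted:.1f}%)")
```

Each recomputed share agrees with the quoted figure to one decimal place.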

Overall, pitchers earn about 40.6% of defensive Component 5 Player decisions, including 3.6% of Component 5 player decisions which are compiled on balls-in-play that are fielded by the pitcher. Excluding pitchers and catchers, infielders are allocated about 62.1% of Component 5 Player decisions involving them, while outfielders are allocated 67.2% of relevant Component 5 Player decisions.

3.    Component 5 Player Decisions by Fielding Position
The breakdown of Component 5 Player decisions by fielding position is as follows:

Position        1934 - 2013   Since 2000
Pitcher         5.7%          4.9%
Catcher         1.1%          0.9%
First Base      7.0%          7.5%
Second Base     14.6%         14.3%
Third Base      13.9%         14.3%
Shortstop       17.6%         16.1%
Left Field      13.7%         13.8%
Center Field    12.9%         13.8%
Right Field     13.6%         14.4%

Infielders, especially pitchers and shortstops, field fewer plays in more recent years, while outfielders, particularly right fielders and center fielders, field more. This is consistent with the higher run-scoring context in recent years, which leads to fewer bunts and more hits to the outfield. It also suggests that a higher percentage of balls are hit to the right side - first base, second base, right field - now than was the case 50 years ago. Perhaps there are more left-handed batters today?

For older years (typically, pre-1974), there is less information on the first fielder to field hits and, in some cases, there is even uncertainty on the fielder(s) involved in some outs on balls-in-play. These will lead to less reliable fielding records in general, although it is not clear to me that this would affect the distribution of fielding plays for those plays which can be identified.

Excluding pitchers and catchers, the numbers look like this:

Position        1934 - 2013   Since 2000
First Base      7.5%          7.9%
Second Base     15.6%         15.2%
Third Base      14.9%         15.2%
Shortstop       18.9%         17.1%
Left Field      14.7%         14.6%
Center Field    13.8%         14.6%
Right Field     14.5%         15.3%
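
The second table is the first one rescaled so that the seven non-battery positions sum to 100%. A minimal sketch of that rescaling for the 1934 - 2013 column follows; because it starts from shares already rounded to one decimal place, its results can differ from the table by about 0.1 percentage points.

```python
# 1934 - 2013 shares of Component 5 decisions, from the first table above.
ALL_POSITIONS = {
    "Pitcher": 5.7, "Catcher": 1.1, "First Base": 7.0, "Second Base": 14.6,
    "Third Base": 13.9, "Shortstop": 17.6, "Left Field": 13.7,
    "Center Field": 12.9, "Right Field": 13.6,
}

def renormalize(shares, excluded):
    """Drop the excluded positions and rescale the rest to sum to 100."""
    kept = {pos: v for pos, v in shares.items() if pos not in excluded}
    total = sum(kept.values())
    return {pos: 100.0 * v / total for pos, v in kept.items()}

fielders_only = renormalize(ALL_POSITIONS, {"Pitcher", "Catcher"})
for pos, share in fielders_only.items():
    print(f"{pos:<13} {share:.1f}%")
```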

Most of these results look reasonable: catchers handle very few balls in play (and most of those that they do handle require relatively little skill on the catcher's part), and middle infielders handle more plays than corner infielders. One result does strike me as a little bit curious: corner outfielders earn somewhat more Component 5 decisions than center fielders. This is explored in a bit more detail below.

Component 5 leaders are shown here.

The next section takes a brief sidebar to look more closely at my use of location data in calculating Player won-lost records. The issue of center fielders vs. corner outfielders is considered after that.

Use of Location Data in Calculating Player Won-Lost Records
For balls in play, there are three pieces of information that are potentially of value in determining the value of particular plays and to whom that value should be credited (or debited): (i) the first fielder to make a play on the ball, (ii) the type of hit (ground ball, fly ball, line drive), and (iii) the location of the ball. The extent to which these three pieces of information are available in Retrosheet play-by-play data varies considerably through the years.

   (i)    First Fielder
The first fielder to touch the ball is the most important consideration for determining credit. The first fielder to touch the ball is identified for virtually all plays for the last 20 – 25 years of Retrosheet data. For earlier years, the first fielder to touch most base hits is unknown. As data goes back even further, there are even some outs for which the fielder of record is unknown.

For my work, the identity of the first fielder is used for assigning credit whenever this information is available. When this information is not available, credit is allocated across all fielders in the proportion which fielders get credit across similar plays for which the fielder is known.
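
The proportional allocation described above can be sketched as follows. The credit amount and the distribution of credit on similar plays are hypothetical numbers, and the definition of "similar plays" is left outside this example.

```python
def allocate_unknown_fielder(credit, known_credit_by_position):
    """Split `credit` across positions in proportion to the credit those
    positions received on similar plays where the fielder is known."""
    total = sum(known_credit_by_position.values())
    return {
        position: credit * value / total
        for position, value in known_credit_by_position.items()
    }

# Hypothetical: on similar plays with a known first fielder, left fielders
# received 50% of the credit, center fielders 30%, and right fielders 20%.
similar_plays = {"LF": 50.0, "CF": 30.0, "RF": 20.0}

allocation = allocate_unknown_fielder(0.020, similar_plays)
for position, value in allocation.items():
    print(f"{position}: {value:.4f}")
```

The allocated amounts always sum to the original credit, so no wins or losses are created or lost when the fielder is unknown.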

   (ii)    Hit Type
The second level of detail on balls in play is the type of hit: ground ball, fly ball, line drive. This information is available from Retrosheet for all balls in play for the years 1989 - 1999 and for seasons since 2003. For other years, hit types are generally only available on outs-in-play, not hits.

As with first fielder information, hit type information is used in calculating Player won-lost records whenever this information is available. When this information is not available, credit is allocated based on the expected distribution of hit type based on the final play result.

   (iii)    Location
For the years 1989 - 1999, the location of every ball-in-play is identified in Retrosheet's play-by-play data. I do not use this location data directly in calculating Player won-lost records, however. Instead, I use location data for these seasons to calculate expected ex ante probabilities for ball-in-play events. That is, based on 1989 - 1999 location data, I calculate what the probability of an out would have been on a play that ended up as, say, a line drive double to the left fielder.
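One way to picture this indirect use of location data is sketched below. It is an assumption about the mechanics, not a description of the actual calculation: out rates are first computed for each (location zone, hit type) pair, and the ex ante out probability for an outcome category is then the average of those zone-level rates over all plays that ended in that outcome. The zone labels ("7D", "7L") and the six plays are made up for illustration.

```python
from collections import defaultdict

# Hypothetical located plays: (zone, hit_type, result, first_fielder).
plays = [
    ("7D", "L", "out", "LF"),
    ("7D", "L", "double", "LF"),
    ("7D", "L", "double", "LF"),
    ("7L", "L", "out", "LF"),
    ("7L", "L", "out", "LF"),
    ("7L", "L", "double", "LF"),
]

def zone_out_rates(plays):
    """Out rate for each (zone, hit_type) pair across all located plays."""
    totals, outs = defaultdict(int), defaultdict(int)
    for zone, hit_type, result, _fielder in plays:
        totals[(zone, hit_type)] += 1
        outs[(zone, hit_type)] += result == "out"
    return {key: outs[key] / totals[key] for key in totals}

def ex_ante_out_prob(plays, hit_type, result, fielder):
    """Average zone-level out rate over plays that ended in the given outcome."""
    rates = zone_out_rates(plays)
    matched = [
        rates[(z, h)]
        for z, h, r, f in plays
        if (h, r, f) == (hit_type, result, fielder)
    ]
    return sum(matched) / len(matched)

# Ex ante out probability for a line drive double fielded by the left fielder.
p_out = ex_ante_out_prob(plays, "L", "double", "LF")
```

In this toy data the out rate is 1/3 in zone 7D and 2/3 in zone 7L; two of the three line drive doubles landed in 7D, so the ex ante out probability for that outcome is 4/9.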

After a great deal of research and consideration, I decided to use location data only in this indirect way even for those seasons for which Retrosheet provides location data (i.e., 1989 - 1999). I made this decision for several reasons. For one thing, using location data only indirectly leads to a more consistent methodology across all seasons for which I estimate Player won-lost records. But also, it was not clear to me, in looking at results from those years for which location data are available, that the location data actually improved the results.

Location data is fundamentally subjective, by its very nature. Relying on individual pieces of subjective data will inevitably introduce errors and possible biases into the valuation of these individual plays. Relying on location data only indirectly, however, and using all of the location data (11 years' worth) in assessing every play, should allow these individual errors to offset one another in such a way as to vastly reduce any potential biases or errors.

Consider, for example, the impact of using STATS data versus BIS data for calculating UZR fielding statistics. Simply changing the data source leads to wildly different stories about some players' defense: was Andruw Jones the best fielder in baseball from 2003 - 2008 (+112 runs in UZR using BIS data) or a slightly below-average center fielder (-5 using STATS)? If the results are that unstable across different location measurements of the same plays, then it's hard to see exactly how much information location data is bringing to the party at all.

Beyond the question of whether the actual locations being reported are accurate, however, relying too heavily on location data builds on a fundamental assumption that I am not entirely sure is true: that balls hit to the same location are more similar than balls that end up with the same end result. That is, a fielding system based on location data treats two fly balls to medium right-center as equivalent, implicitly assuming that all fly balls to medium right-center are created equal. My fielding system here treats two fly-ball doubles fielded by the right fielder as equivalent, implicitly assuming that all fly-ball doubles fielded by the right fielder are equivalent.

I am not saying the latter of these implicit assumptions is necessarily right, so much as I wonder whether the former implicit assumption is actually more right. And if our focus is purely on player value rather than player talent (as it is in my system), then, in fact, in many ways it makes more sense to me to view one fly-ball double to right field as being equally valuable as any other fly-ball double to right field than to view a fly-ball double to medium right-center field as equally valuable as a fly out to the same location.

Incidentally, my Player won-lost records rank Andruw Jones as the 7th-best fielder (measured by net fielding eWins) of the Retrosheet Era (career record of 81.5 eWins and 72.1 eLosses), although they also see him as being below average in 2007 and 2008, so that from 2003 - 2008, Andruw Jones has a fielding record of 32.0 eWins against 30.8 eLosses (0.510 winning percentage, 1.3 net wins).

   The Impact of Location Data on my Fielding System
I am not necessarily saying that my results are "better" by not using location data for individual plays. But neither do I think it is obvious that my results are worse. The problem with evaluating fielding systems, in general, is that we don't really know what the "right" answer is - after all, if we knew the right answer, we'd just use that.

The one comparison that I can make is between my results and what they would have been had I used location data for those years for which it is available, 1989 - 1999. For those seasons, I calculated Player won-lost records both ways. I then calculated a weighted correlation of winning percentages by fielding position between the two methods. The results were as follows:

Weighted Correlation, Fielding Winning Percentages: Location Data v. No Location Data
Catcher 77.56%
First Base 59.99%
Second Base 62.58%
Third Base 66.94%
Shortstop 69.16%
Left Field 45.52%
Center Field 62.74%
Right Field 47.88%

Note: Catcher figures exclude SB, WP
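For readers who want to reproduce a comparison of this kind, a weighted correlation can be sketched as follows. The choice of weights is an assumption (something like each player's total fielding decisions would be a natural candidate; the text above does not specify it), and the sample numbers are invented.

```python
import math

def weighted_correlation(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    return cov / math.sqrt(var_x * var_y)

# Invented fielding winning percentages for four players under the two methods,
# weighted by (assumed) fielding decisions.
with_location = [0.52, 0.48, 0.50, 0.55]
without_location = [0.51, 0.49, 0.51, 0.53]
decisions = [100, 200, 150, 50]

corr = weighted_correlation(with_location, without_location, decisions)
```

With equal weights this reduces to the ordinary Pearson correlation, so the weighting only matters to the extent that playing time differs across players.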

Generally speaking, the correlations here are fairly high. They are higher for infielders than outfielders, which makes a certain amount of sense: there's likely far less of a spread in the range of locations for which infielders can make plays. The fact that the wider range of possible outfield locations corresponds to lower correlations in fielder winning percentage in the outfield suggests that location data probably would add something. But, as I said, since we don't know for certain what the "right" answer is anyway, the mere fact that these two methodologies do or do not correlate well doesn't really tell us anything definitive about which of the two methodologies is "better", only the extent to which they're different.

To some extent, then, the proof of the pudding is in the eating. The top 25 defensive players, measured by net Fielding Wins (eWins minus eLosses), of the Retrosheet Era are as follows:

Career Fielding Won-Lost Records
Player                eWins  eLosses  Net Fielding Wins (eWins - eLosses)
Carl Furillo           75.5     61.3   14.2
Pee Wee Reese          75.2     62.6   12.6
Ozzie Smith           107.5     96.0   11.5
Ichiro Suzuki          82.5     71.5   11.0
Jesse Barfield         61.5     51.2   10.4
Willie Davis           90.6     80.9    9.7
Barry Bonds           115.0    105.3    9.7
Andruw Jones           81.5     72.1    9.3
Al Kaline              96.9     87.8    9.1
Mark Belanger          73.0     64.6    8.4
Brooks Robinson        81.2     73.3    7.9
Curt Flood             66.7     59.2    7.6
Jim Gilliam            51.4     44.0    7.4
Jackie Robinson        37.1     29.8    7.3
Roberto Clemente      100.1     92.9    7.2
Alexis Rios            58.5     51.5    7.0
Cal Ripken            102.5     95.5    7.0
Robin Ventura          52.8     45.9    6.9
Luis Aparicio         102.7     95.9    6.8
Brian Jordan           52.8     46.0    6.8
Jim Piersall           65.8     59.1    6.7
Darin Erstad           43.0     36.3    6.6
Tim Wallach            62.4     55.9    6.5
Reggie L. Sanders      63.8     57.4    6.4
Clete Boyer            51.4     45.1    6.3

The 25 worst defensive players by the same measure are as follows:

Career Fielding Won-Lost Records
Player                eWins  eLosses  Net Fielding Wins (eWins - eLosses)
Jeff Burroughs         45.6     54.5   -8.9
Gary Sheffield         75.3     83.6   -8.3
Greg Luzinski          42.1     50.0   -7.9
Gary Matthews Sr.      72.4     80.1   -7.7
Ralph Kiner            50.2     57.7   -7.5
Frank Howard           53.9     61.4   -7.4
Dante Bichette         59.3     66.7   -7.4
Keith Moreland         34.4     40.6   -6.1
Leon Wagner            36.1     42.1   -6.1
Derek Jeter            78.4     84.5   -6.0
Harvey Kuenn           57.9     63.9   -5.9
Bobby Bonilla          56.0     61.8   -5.9
Steve Sax              54.1     59.7   -5.6
Todd Zeile             48.7     54.3   -5.5
Al Martin              35.8     41.2   -5.5
Ted Williams           66.3     71.7   -5.4
Gus Bell               60.7     66.0   -5.3
Ken Griffey Sr.        64.8     70.0   -5.2
Roy Sievers            48.3     53.5   -5.2
Danny Tartabull        34.6     39.7   -5.1
Craig Biggio           75.6     80.6   -4.9
Dick Allen             38.9     43.7   -4.9
Adam Dunn              50.0     54.7   -4.7
Dean Palmer            29.6     34.4   -4.7
Juan Pierre            61.7     66.4   -4.6

Nothing necessarily jumps out of those two tables as especially unreasonable to me.

Center Fielders vs. Corner Outfielders
Over the Retrosheet Era, total Fielding Decisions (Player wins plus Player losses) for each of the three outfield positions were as follows:

Position        Component 5  Component 6  Component 8  Component 9  Total Fielding
Left Field           12,268        3,691        2,217        4,792          22,968
Center Field         11,528        2,861        2,025        5,327          21,741
Right Field          12,106        3,143        2,276        4,888          22,413


At first glance, this looks quite curious. Why do left fielders and right fielders accumulate more fielding decisions than center fielders? What does this mean, exactly? Is a good defensive right fielder more valuable than a good defensive center fielder? Arguably. Should teams play their best defensive outfielder in right field rather than center field? Probably not.

The reason for this apparent anomaly is not that corner outfielders are better, or even necessarily more valuable, than center fielders. Rather, this is the result of two issues that are worth thinking about with respect to Player Won-Lost records. First, there is a wider range of fielding talent across corner outfielders than across center fielders, and, second (and somewhat related), there is a greater range of possible outcomes on balls hit to left or right field than on balls hit to center field.

The table below shows the number of plays in the 2007 American League (i.e., games played at AL ballparks) for which the various outfielders are the fielder of record (i.e., are the first fielder to touch the ball):

Total Plays Total Outs Singles* Doubles Triples % Outs % XBH
Left Field 10,252 4,611 3,965 1,618 58 45.0% 29.8%
Center Field 11,667 5,996 4,421 1,060 190 51.4% 22.1%
Right Field 9,609 4,527 3,629 1,278 175 47.1% 28.7%

*“Singles” include batters reaching on error.

For simplicity, suppose that singles have a net fielding win value of -0.0364 and extra-base hits have a net fielding win value of -0.0625 (these are reasonably close to the average net win values for these plays in recent seasons).* Let’s also normalize the above numbers to be per 100 plays.

*Base hits will likely not have the exact same value to all fields because of differences in baserunner advancement. The numbers here should therefore be thought of as illustrative, not definitive.

Total Plays Outs Singles Extra-Base Hits
Left Field 100.00 44.98 38.68 16.35
Center Field 100.00 51.39 37.89 10.71
Right Field 100.00 47.11 37.77 15.12


So, for example, left fielders allow 38.68 singles per 100 plays. At -0.0364 wins per play that works out to -1.41 wins for left fielders on singles allowed. Full numbers are shown in the table below.



Net Wins on: Singles Extra-Base Hits Total Losses Total Wins Wins per Out
Left Field -1.4078 -1.0218 -2.4295 2.4295 0.0540
Center Field -1.3793 -0.6696 -2.0489 2.0489 0.0399
Right Field -1.3747 -0.9451 -2.3198 2.3198 0.0492


Let me walk through the numbers briefly. As noted above, left fielders allow 38.68 singles per 100 plays with a value of -0.0364 wins (0.0364 losses) per single, for a total of -1.4078 wins (1.4078 losses). Left fielders allow 16.35 extra-base hits per 100 plays with a value of -0.0625 wins per extra-base hit, for a total of -1.0218 wins on extra-base hits. Adding these together, left fielders accumulate approximately 2.43 losses per 100 plays. Since fielding wins and losses are set to be equal in the aggregate for every position by construction, this means that left fielders also accumulate 2.43 wins per 100 plays, which works out to 0.0540 wins per out by the left fielder.
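The arithmetic in this walk-through can be reproduced with a short script. The counts and the two net win values come from the tables and text above; the function and variable names are only illustrative.

```python
# Illustrative net fielding win values per play, as assumed in the text.
SINGLE_VALUE = -0.0364
XBH_VALUE = -0.0625

# 2007 American League plays by fielder of record (from the table above).
counts = {
    "Left Field":   {"plays": 10252, "outs": 4611, "singles": 3965, "doubles": 1618, "triples": 58},
    "Center Field": {"plays": 11667, "outs": 5996, "singles": 4421, "doubles": 1060, "triples": 190},
    "Right Field":  {"plays": 9609,  "outs": 4527, "singles": 3629, "doubles": 1278, "triples": 175},
}

def per_100_summary(c):
    """Normalize to 100 plays, then derive losses, wins, and wins per out."""
    scale = 100 / c["plays"]
    outs = c["outs"] * scale
    singles = c["singles"] * scale
    xbh = (c["doubles"] + c["triples"]) * scale
    losses = -(singles * SINGLE_VALUE + xbh * XBH_VALUE)
    wins = losses  # fielding wins equal fielding losses in aggregate, by construction
    return {
        "outs": outs,
        "singles": singles,
        "xbh": xbh,
        "losses": losses,
        "wins": wins,
        "wins_per_out": wins / outs,
    }

summary = {position: per_100_summary(c) for position, c in counts.items()}

for position, s in summary.items():
    print(f"{position:<13} losses={s['losses']:.4f} wins per out={s['wins_per_out']:.4f}")
```

Because wins are set equal to losses for each position, the wins-per-out figure is driven entirely by how costly the hits are and how common the outs are.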

Note what this shows. Plays made by left fielders are worth more player decisions on average (0.049 decisions per play*) than plays made by right fielders (0.046), which in turn are worth more than plays made by center fielders (0.041). This is true for two reasons. First, center fielders allow fewer extra-base hits than corner outfielders (10.7 per 100 plays vs. 15.8 per 100 plays for corner outfielders), and extra-base hits have the highest value in terms of total player decisions per play. Second, center fielders allow fewer hits than corner outfielders (48.6 per 100 plays vs. 54.0 per 100 plays), which makes outs to center field less valuable, because they are more common, than outs to the corners. The overall result is that an average play made by a left fielder is worth about 19% more player decisions than an average play made by a center fielder, which is more than enough difference to offset the fact that center fielders were involved in 14% more plays than left fielders (in the 2007 American League).
*2.4295 wins plus 2.4295 losses equals 4.86 total decisions per 100 plays, or 0.0486 decisions per play.
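The offset between value per play and number of plays can be checked directly. This sketch uses only the 2007 American League play counts and the per-100-play win totals from the tables above; the variable names are illustrative.

```python
# Plays by fielder of record and wins per 100 plays, from the tables above.
plays = {"Left Field": 10252, "Center Field": 11667}
wins_per_100 = {"Left Field": 2.4295, "Center Field": 2.0489}

def decisions_per_play(wins_per_100_plays):
    """Wins plus an equal number of losses, expressed per single play."""
    return 2 * wins_per_100_plays / 100

lf_rate = decisions_per_play(wins_per_100["Left Field"])
cf_rate = decisions_per_play(wins_per_100["Center Field"])

value_ratio = lf_rate / cf_rate                               # left-field play vs. center-field play
volume_ratio = plays["Center Field"] / plays["Left Field"]    # center-field plays vs. left-field plays

lf_total = plays["Left Field"] * lf_rate
cf_total = plays["Center Field"] * cf_rate

print(f"value ratio {value_ratio:.3f}, volume ratio {volume_ratio:.3f}")
print(f"total decisions: LF {lf_total:.1f}, CF {cf_total:.1f}")
```

Under these illustrative values, left fielders end up with more total decisions than center fielders even though they handle fewer plays, which is the pattern seen in the Retrosheet Era totals.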

The primary reason for this, I believe, is that there is a much wider range in the abilities of corner outfielders as compared to center fielders. Mathematically, this can be measured by looking at the standard deviation of winning percentages by corner outfielders. Over the Retrosheet Era, the standard deviation of season-level winning percentages for center fielders (fielding only) is 4.0%, versus 4.7% for right fielders and 4.9% for left fielders. In other words, the spread in winning percentages for corner outfielders (which can be taken as an approximation of the spread in the fielding talent of corner outfielders) is approximately 19% greater than the spread in center-fielder winning percentages (fielding talent).

In other words, virtually all center fielders are good fielders, whereas some corner outfielders are excellent fielders (e.g., Ichiro Suzuki) and others are notoriously bad fielders (e.g., Manny Ramirez). The result is that the relative value of a corner outfielder who is capable of converting balls in play into outs and of preventing extra-base hits is greater than the value of a center fielder who can do the same, because such a corner outfielder is rarer. Curious, but I think it’s true.

Component 6: Singles vs. Doubles vs. Triples

In the sixth step of calculating Player Wins and Losses, batters, pitchers, and fielders are given credit and blame for the number of bases gained on hits on balls-in-play, i.e., singles versus doubles versus triples.

1.    Calculation of Component 6 Player Game Points
Given that the batter reached base safely on a ball-in-play (via either hit or error), credits/debits are assigned in Component 6 for how many bases the batter gains. Basically, Component 6 credits batters for hitting doubles and triples as opposed to singles. Component 6 points are assigned assuming average baserunner advancement. Credits for baserunner outs and advancements are assigned in Components 8 and 9. Overall, Component 6 accounts for approximately 3.4% of total Player decisions from 1934 - 2013. This has declined slightly since 2000, to 3.2%.

2.    Division of Component 6 Game Points Between Pitchers and Fielders
Component 6 Player Games are shared between pitchers and fielders based on the extent to which player winning percentages persist across different sample periods. The mathematics underlying this division was described earlier in this article.

To summarize, one measure of the extent to which a particular factor is a skill is the extent to which a player’s winning percentage persists over time. To evaluate the persistence of skills, I fit a simple persistence equation which modeled Component 6 winning percentage on even-numbered plays as a function of Component 6 winning percentage on odd-numbered plays:

(Component 6 Win Pct)Even = b•(Component 6 Win Pct)Odd + (1-b)•(WinPct)Baseline

where (WinPct)Baseline represents a baseline winning percentage toward which Component 6 winning percentages regress over time. Equations of this type were fit for Component 6 Player Game Points for pitchers and fielders. Separate equations were estimated for each fielding position (except for pitcher, obviously). The results for these equations are shown below. A brief explanation of these variables follows.

The number n is the number of players over whom the equation was estimated, that is, who accumulated any Player wins and/or losses on both odd- and even-numbered plays. The value R2 measures the percentage of variation in the dependent variable (WinPctEven) explained by the equation (i.e., explained by WinPctOdd). The numbers in parentheses are t-statistics. T-statistics measure the significance of b, that is, the confidence we have that b is greater than zero. The greater the t-statistic, the more confident we are that the true value of b is greater than zero. Roughly speaking, if the t-statistic is greater than 2, then we can be at least 95% certain that the true value of b is greater than zero (assuming that certain statistical assumptions regarding our model hold). The value of (WinPct)Baseline, the baseline winning percentage toward which winning percentages regress over time, is set equal to 0.500 by construction.
Note: To be precise, I estimate unique Persistence Equations for every season. Each of these uses all of my data, but weights the data based on how close it is to the season of interest. The equations shown here weight each season equally.
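A minimal sketch of fitting a persistence equation of this form is shown below. It assumes ordinary (unweighted) least squares with the baseline fixed at 0.500, which turns the model into a regression through the origin on deviations from the baseline; the actual estimation may weight players or seasons differently, and the five data points are invented.

```python
import math

def fit_persistence(odd, even, baseline=0.5):
    """Fit even = b*odd + (1-b)*baseline by least squares, with baseline fixed.

    Returns (b, t_statistic, r_squared).
    """
    dx = [x - baseline for x in odd]
    dy = [y - baseline for y in even]
    sxx = sum(d * d for d in dx)
    b = sum(a * c for a, c in zip(dx, dy)) / sxx

    # Residual and total sums of squares for the dependent variable.
    ssr = sum((c - b * a) ** 2 for a, c in zip(dx, dy))
    mean_even = sum(even) / len(even)
    sst = sum((y - mean_even) ** 2 for y in even)
    r_squared = 1 - ssr / sst

    # Standard error of b with one estimated parameter.
    se = math.sqrt(ssr / (len(odd) - 1) / sxx)
    t_stat = b / se if se > 0 else math.inf
    return b, t_stat, r_squared

# Invented winning percentages on odd- and even-numbered plays for five players.
win_pct_odd = [0.40, 0.45, 0.50, 0.55, 0.60]
win_pct_even = [0.48, 0.47, 0.51, 0.52, 0.52]

b, t_stat, r_squared = fit_persistence(win_pct_odd, win_pct_even)
```

Note that with the baseline fixed rather than estimated, an R2 computed this way is not guaranteed to be non-negative, since the fitted line is not free to pass through the mean of the data.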

Persistence of Component 6 Winning Percentage: Catcher

 
Pitchers:  n = 23,573, R2 = 0.1078
WinPctEven = (47.56%)•WinPctOdd + (52.44%)•0.5000 (54.61)

 
Catchers:  n = 6,095, R2 = -0.0645
WinPctEven = (23.36%)•WinPctOdd + (76.64%)•0.5000 (19.22)

There is no significant persistence in Component 6 winning percentages for catchers. Based on this, Component 6.2 decisions are allocated 100% to pitchers.

Persistence of Component 6 Winning Percentage: First Basemen

 
Pitchers:  n = 24,866, R2 = 0.0003
WinPctEven = (-10.49%)•WinPctOdd + (110.49%)•0.5000 (-16.85)

 
First Basemen:  n = 7,018, R2 = -0.0523
WinPctEven = (23.79%)•WinPctOdd + (76.21%)•0.5000 (15.14)

The share of Component 6.3 decisions allocated to pitchers is set equal to the persistence coefficient from the Pitcher equation (-10.5%) divided by the sum of the two persistence coefficients (-10.5% + 23.8%). This leaves Component 6.3 decisions split -78.8% to pitchers versus 178.8% to first basemen.

Persistence of Component 6 Winning Percentage: Second Basemen

 
Pitchers:  n = 25,870, R2 = 0.5009
WinPctEven = (78.73%)•WinPctOdd + (21.27%)•0.5000 (209.3)

 
Second Basemen:  n = 6,791, R2 = 0.2660
WinPctEven = (63.35%)•WinPctOdd + (36.65%)•0.5000 (54.90)

Component 6.4 decisions are split 55.4% to pitchers versus 44.6% to fielders.

Persistence of Component 6 Winning Percentage: Third Basemen

 
Pitchers:  n = 26,456, R2 = 0.1009
WinPctEven = (33.68%)•WinPctOdd + (66.32%)•0.5000 (62.17)

 
Third Basemen:  n = 7,696, R2 = 0.0820
WinPctEven = (34.89%)•WinPctOdd + (65.11%)•0.5000 (33.73)

Component 6.5 decisions are split 49.1% to pitchers versus 50.9% to fielders.

Persistence of Component 6 Winning Percentage: Shortstop

 
Pitchers:  n = 26,636, R2 = 0.6512
WinPctEven = (86.78%)•WinPctOdd + (13.22%)•0.5000 (276.9)

 
Shortstops:  n = 6,224, R2 = 0.1055
WinPctEven = (29.54%)•WinPctOdd + (70.46%)•0.5000 (30.51)

Component 6.6 decisions are split 74.6% to pitchers and 25.4% to fielders.

Outside of catchers, shortstops receive the least credit for Component 6 decisions of any fielders in the equations shown here (25.4%, versus 44.6% for second basemen). This makes a certain amount of sense, I think, as most extra-base hits are either hit to the outfield or are hit down the line in the infield.

Persistence of Component 6 Winning Percentage: Left Fielder

 
Pitchers:  n = 30,256, R2 = 0.0014
WinPctEven = (3.89%)•WinPctOdd + (96.11%)•0.5000 (6.567)

 
Left Fielders:  n = 11,543, R2 = 0.0524
WinPctEven = (22.47%)•WinPctOdd + (77.53%)•0.5000 (25.70)

Left fielders receive the largest percentage of Component 6 credit of any fielder: 85.2%.

This reflects two things, I believe. First, most extra-base hits are to the outfield, which is reflected in outfielders receiving more credit in general than infielders. Second, and specific to left fielders, the range in fielding talent is probably greatest (at least among the outfield positions) in left field. Many teams try to hide some of their worst fielders there (Frank Howard, Kevin Reimer, Manny Ramirez), while other teams put players there who are very fast if nothing else (Lou Brock, Rickey Henderson, Carl Crawford), which likely helps to cut off would-be extra-base hits to the gaps.

Persistence of Component 6 Winning Percentage: Center Fielder

 
Pitchers:  n = 30,303, R2 = 0.0013
WinPctEven = (9.67%)•WinPctOdd + (90.33%)•0.5000 (16.44)

 
Center Fielders:  n = 7,958, R2 = 0.1461
WinPctEven = (38.25%)•WinPctOdd + (61.75%)•0.5000 (37.37)

Pitchers receive 20.2% of Component 6.8 player decisions, while center fielders receive 79.8%.

Persistence of Component 6 Winning Percentage: Right Fielder

 
Pitchers:  n = 30,054, R2 = 0.0057
WinPctEven = (12.58%)•WinPctOdd + (87.42%)•0.5000 (21.17)

 
Right Fielders:  n = 10,269, R2 = 0.0835
WinPctEven = (30.57%)•WinPctOdd + (69.43%)•0.5000 (31.70)

Component 6 player decisions to right field are divided 29.2% to pitchers and 70.8% to right fielders.
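The pitcher/fielder splits quoted above for second base through right field follow from dividing the pitcher's persistence coefficient by the sum of the pitcher and fielder coefficients. A short check, using the coefficients from the equations shown (the function name is illustrative):

```python
def pitcher_share(b_pitcher, b_fielder):
    """Pitcher's share of decisions: b_pitcher / (b_pitcher + b_fielder)."""
    return b_pitcher / (b_pitcher + b_fielder)

# (pitcher coefficient, fielder coefficient) from the persistence equations above.
coefficients = {
    "Second Base":  (0.7873, 0.6335),
    "Third Base":   (0.3368, 0.3489),
    "Shortstop":    (0.8678, 0.2954),
    "Left Field":   (0.0389, 0.2247),
    "Center Field": (0.0967, 0.3825),
    "Right Field":  (0.1258, 0.3057),
}

splits = {
    position: pitcher_share(b_p, b_f)
    for position, (b_p, b_f) in coefficients.items()
}

for position, share in splits.items():
    print(f"{position:<13} pitchers {share:6.1%}  fielders {1 - share:6.1%}")
```

Catchers are left out of this check because their Component 6 decisions are assigned entirely to pitchers rather than split by this ratio.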

3.    Further Thoughts on Component 6 Player Game Points
On offense, Component 6 is, of course, allocated to batters. This is obvious and, really, there is no other reasonable alternative. It is, however, open to debate whether the ability to stretch singles into doubles and doubles into triples is properly viewed as “Batting” as opposed to “Baserunning”. In general, I classify Component 6 as “batting” to be consistent with most other general batting measures, both traditional (e.g., total bases, slugging percentage) and sabermetric (e.g., runs created, batting wins), which distinguish between extra-base hits and singles.

I explore the extent to which Component 6 (and 7) might be more reasonably viewed as a “baserunning” skill later in this article.

Overall, Component 6 makes up about 14.4% of total fielding decisions for outfielders. In contrast, most defensive metrics focus exclusively on an outfielder’s ability to convert balls-in-play to outs (e.g., UZR, PMR, +/-, TotalZone). Attention is also generally paid to outfielders’ throwing arms and their ability to throw out baserunners and/or limit baserunner advancement (Components 8 and 9 of my Player won-lost records) – e.g., John Walsh at The Hardball Times. The ability of an outfielder to hold batters to singles, preventing extra-base hits, on the other hand, is a bit of a forgotten, but nevertheless important, defensive skill.

Component 6 leaders can be found here.

Batting vs. Baserunning: The Impact of Speed on Player Won-Lost Records