Why 1 Year?
The use of one-year ballpark factors is fairly widely disdained in sabermetric circles as being generally inappropriate because of the large degree of noise which is inherent in a single year’s worth of data. If one’s primary purpose is prospective, then I think that this is probably true. Even if one’s primary purpose is explanatory, if one is only considering a single-value run factor along the lines of “Ballpark A increases runs scored by 5%,” then the noise inherent in a single season of data might well be sufficiently large that one would be better off using a multi-year park factor.
(1) the league’s run-scoring environment may change,
(2) the efficiency of run-scoring may change (i.e., the expected runs (XR, RC, whatever) may not change, but the actual runs do, perhaps because teams hit better or worse than expected with runners in scoring position, for example),
(3) the conditions of the ballpark may change (wind, temperature, change in field dimensions), or
(4) hitters (or pitchers) may simply perform somewhat differently from one year to another.
For many purposes, it may be desirable to try to remove some of these reasons, particularly numbers 2 and 4. For my purposes, however, ALL of these reasons are valid reasons and will legitimately affect the win probabilities.
Constructing a Ballpark-Specific Base-Out Transition Matrix
The first step in constructing a ballpark-specific base-out transition matrix is to construct a league-wide base-out transition matrix. Call this BOL.
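As a rough illustration of this first step, the sketch below tallies plate appearances into a matrix of transition counts. It assumes the matrix holds raw counts (the later re-sizing by number of events suggests this), that the 24 rows are the usual 8 base configurations times 3 out counts, and that the 28 columns are those 24 states plus 4 inning-ending states. The integer state indices, the `transition_counts` helper, and the sample events are all hypothetical.

```python
import numpy as np

N_START = 24  # 8 base configurations x 3 out counts
N_END = 28    # assumed: the 24 in-inning states plus 4 inning-ending states


def transition_counts(events):
    """Tally (start_state, end_state) index pairs into a count matrix."""
    counts = np.zeros((N_START, N_END))
    for start, end in events:
        counts[start, end] += 1
    return counts


# Made-up plate appearances, each a (start index, end index) pair.
league_events = [(0, 0), (0, 1), (0, 8), (1, 9), (0, 8)]
BO_L = transition_counts(league_events)
```

Under these assumptions, the "size" of BOL is simply `BO_L.sum()`, the number of plate appearances tallied.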
For a particular ballpark, call it ballpark p, find all team combos that met in this ballpark as well as in at least one other ballpark. Note that teams that only played each other in one ballpark are not used in this calculation. In addition, inter-league games are not used here, since games played at American League ballparks use the designated hitter rule while games played at National League ballparks do not, which affects the relative run-scoring environments of the two leagues.*
For each team combination within the same league which met in ballpark p and at least one other ballpark, calculate a base-out transition matrix for all of their games against each other in ballpark p. For teams j and k, call this BOpjk. Construct a second base-out transition matrix, then, for all games between teams j and k that did not take place in ballpark p. Call this matrix (BO’)pjk.
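One way to picture this split is sketched below. It assumes each plate appearance is recorded as a tuple of venue, the two teams, and start and end state indices, and that inter-league games have already been filtered out. The record layout and the `matchup_matrices` helper are hypothetical, not taken from the original data.

```python
from collections import defaultdict

import numpy as np

SHAPE = (24, 28)


def matchup_matrices(events, park):
    """Return (BO_pjk, BO'_pjk) count matrices keyed by team pair.

    `events` holds (venue, team_a, team_b, start_state, end_state) tuples.
    Only pairs that met both in `park` and somewhere else are kept.
    """
    in_park = defaultdict(lambda: np.zeros(SHAPE))
    elsewhere = defaultdict(lambda: np.zeros(SHAPE))
    for venue, team_a, team_b, start, end in events:
        pair = frozenset((team_a, team_b))  # order of the two teams is ignored
        target = in_park if venue == park else elsewhere
        target[pair][start, end] += 1
    usable = in_park.keys() & elsewhere.keys()
    return ({p: in_park[p] for p in usable},
            {p: elsewhere[p] for p in usable})


# Made-up events: A and B met in parks P and Q; A and C met only in P.
events = [("P", "A", "B", 0, 1), ("Q", "B", "A", 0, 8), ("P", "A", "C", 0, 0)]
bo, bo_prime = matchup_matrices(events, "P")
```

In this toy input the A-C pairing is dropped because those teams never met outside ballpark P.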
Re-size each of these base-out transition matrices so that all of the BOpjk and (BO’)pjk, for all teams j and k, are the same size (by “size” I mean the number of events, i.e., plate appearances, that they include). That is, multiply each element of BOpjk by the ratio of the desired number of events (call it E) to the raw number of events in BOpjk. For example, suppose that BOpjk were a 3-by-3 matrix as shown below (in reality, of course, BOpjk will be a 24-by-28 matrix):
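The re-sizing rule can be sketched with a small 3-by-3 matrix. The counts below are invented for illustration and are not the author's original example; `resize` and the value chosen for E are likewise hypothetical.

```python
import numpy as np


def resize(matrix, target_events):
    """Scale a count matrix so that its entries sum to `target_events`."""
    total = matrix.sum()
    if total == 0:
        raise ValueError("cannot re-size a matrix with no events")
    return matrix * (target_events / total)


# Made-up 3-by-3 count matrix containing 20 events in total.
raw = np.array([[4.0, 2.0, 0.0],
                [1.0, 5.0, 2.0],
                [0.0, 3.0, 3.0]])
E = 100  # desired number of events
resized = resize(raw, E)
```

Every cell is multiplied by the same factor (here 100 / 20 = 5), so the relative frequencies of the transitions are unchanged and only the total weight of the matrix moves.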
Sum all BOpjk for ballpark p (the ballpark of interest here). Call this BOp. This is, in effect, a home-game transition matrix for ballpark p.
Re-size this sum, BOp, to be the same size as each of the (BO’)pjk. Sum all of the (BO’)pjk and the re-sized BOp, and call this (BO’)p. The home-game transition matrix, BOp, is included here with the same weight as other ballparks. This creates, in effect, a league-wide transition matrix for the teams that played in ballpark p, (BO’)p.
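The two sums described above might look like the following sketch. It assumes the per-matchup matrices have already been re-sized to E events each, and it uses 2-by-2 matrices with invented counts purely to keep the arithmetic visible; the function names are hypothetical.

```python
import numpy as np


def resize(matrix, target_events):
    """Scale a count matrix so that its entries sum to `target_events`."""
    return matrix * (target_events / matrix.sum())


def park_and_comparison(bo_pairs, bo_prime_pairs, E):
    """Build BO_p and (BO')_p from matchup matrices of E events each."""
    BO_p = np.sum(bo_pairs, axis=0)  # home-game matrix for ballpark p
    # The home park enters the comparison matrix with the weight of one matchup.
    BO_prime_p = np.sum(bo_prime_pairs, axis=0) + resize(BO_p, E)
    return BO_p, BO_prime_p


E = 10
bo_pairs = [np.array([[6.0, 4.0], [0.0, 0.0]]),
            np.array([[2.0, 8.0], [0.0, 0.0]])]
bo_prime_pairs = [np.array([[5.0, 5.0], [0.0, 0.0]]),
                  np.array([[3.0, 7.0], [0.0, 0.0]])]
BO_p, BO_prime_p = park_and_comparison(bo_pairs, bo_prime_pairs, E)
```

Here BO_p carries 20 events (two matchups of 10), while (BO’)p carries 30: two away-from-p matchups plus the re-sized home-game matrix.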
Now, re-size BOp and (BO’)p so that they are both the same size as BOL. The initial estimate of the base-out transition matrix for ballpark p is then equal to the following:
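The formula itself does not survive in this copy of the text. The summary later in this section describes adjusting the league matrix "by the difference between" the ballpark matrix and its comparison matrix, which suggests an additive form: BOL + BOp - (BO’)p, with all three at the same size. The sketch below implements that reading. It is an assumption about the missing formula, not a confirmed reconstruction, and the function names and counts are hypothetical.

```python
import numpy as np


def resize(matrix, target_events):
    """Scale a count matrix so that its entries sum to `target_events`."""
    return matrix * (target_events / matrix.sum())


def initial_estimate(BO_L, BO_p, BO_prime_p):
    """Assumed additive form: league matrix plus (park minus comparison)."""
    size = BO_L.sum()
    return BO_L + resize(BO_p, size) - resize(BO_prime_p, size)


# Made-up 2-by-2 count matrices for illustration only.
BO_L = np.array([[50.0, 50.0], [0.0, 0.0]])
BO_p = np.array([[8.0, 12.0], [0.0, 0.0]])
BO_prime_p = np.array([[15.0, 15.0], [0.0, 0.0]])
estimate = initial_estimate(BO_L, BO_p, BO_prime_p)
```

Under this reading the adjustment adds and subtracts matrices of equal size, so the estimate keeps the same total number of events as BOL.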
Finally, after calculating values for BOp for every ballpark, all of the BOp matrices are summed, and re-sized, such that the size of the sum of the BOp matrices is equal to the size of BOL.
Let n equal the number of ballparks and let BOALL be the sum of the BOp. The final base-out transition matrix for ballpark p is then equal to the following:
Summary
In words, what I do here is to construct a normalized base-out transition matrix for a ballpark and a normalized base-out transition matrix for all games played by the same teams at all of the ballparks at which they played (in effect, a ballpark-specific league-transition matrix). I then adjust the league base-out transition matrix by the difference between the ballpark-specific transition matrix and this latter transition matrix (i.e., the ballpark-specific league-transition matrix).
Example
Let me clarify this explanation with an example. For simplicity, I will use U.S. Cellular Field in Chicago within the 2004 National League. In 2004, the Florida Marlins hosted the Montreal Expos in two games which were moved to U.S. Cellular Field in Chicago because of a hurricane in Florida. Hence, U.S. Cellular Field only hosted two games in the 2004 National League. The base-out transition matrix from these two games would be the BOpjk matrix described earlier.