can be solved by Ordinary Least Squares (OLS), one of the most basic statistical regression procedures. There are, however, two additional complications associated with estimating Persistence Equations.
The first issue is that, to ensure that the estimated value of b is not biased, the persistence equation should be fully specified. That is, if other variables can be expected to affect (Factor A)Even, these variables should be included on the right-hand side of the persistence equation along with (Factor A)Odd. This is not a big deal for most of the Persistence Equations that I estimate here, but it can be an issue in general regression analysis and is always worth keeping in mind.
The second issue matters much more for the Persistence Equations that I estimate. The validity of OLS as an estimation technique depends on several assumptions about the distribution of the residual, or error, term in the persistence equation*. One of these assumptions is that the variance of the error term is constant across all observations. For example, OLS is only valid if the unexplained variation in player winning percentage is equal for all players. In this case, however, not only do we not want to assume this, we actually know that it is wrong: unexplained variation declines as the number of player games increases. Fortunately, there is a very easy way to adjust for this. Instead of OLS, I use Weighted Least Squares (WLS). This weights each observation by the number of player games over which the Factor has been compiled**, squared***. In this way, the results for players with more games played are weighted more heavily than those for players with fewer games.
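To give a rough sense of why unexplained variation shrinks with games played, here is a minimal sketch under a deliberately simplified assumption: each game is treated as an independent win/loss trial with a fixed true winning percentage. That binomial setup, and the function name `winning_pct_std_error`, are illustrative assumptions only and are not how Player won-lost records are actually built.

```python
from math import sqrt


def winning_pct_std_error(true_pct, games):
    """Standard deviation of an observed winning percentage.

    Assumes (for illustration only) that each of `games` games is an
    independent win/loss trial with win probability `true_pct`.
    """
    return sqrt(true_pct * (1.0 - true_pct) / games)


# Hypothetical game counts: the spread halves each time games quadruple.
for games in (25, 100, 400, 1600):
    print(games, round(winning_pct_std_error(0.5, games), 4))
```

Under this simplification, a player with 25 games has four times the random spread of a player with 400 games, which is why treating every observation as equally reliable is hard to defend.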
* To be technically correct, the persistence equation should be written as follows:
(Factor A)Even = a + b*(Factor A)Odd + e
where e is the "error" or "residual" term that measures unexplained variation in (Factor A)Even. The appropriateness of OLS is then dependent on a set of assumptions regarding the distribution of e.
** The number of games is defined as the harmonic mean of the games over which (Factor A)Odd and (Factor A)Even are compiled.
*** The decision to square the number of games in the weighting matrix was based on empirical experimentation, which considered several alternative weighting schemes based on the number of games (total games, the log of games, games squared, etc.).
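The weighting described in the footnotes can be sketched in a few lines of Python. This is only an illustration, not the code behind the published estimates: the function names and the five-player data set are hypothetical, and the fit uses the standard closed-form solution for a weighted regression with one explanatory variable.

```python
from statistics import harmonic_mean


def persistence_weights(games_odd, games_even):
    """Weight per player: harmonic mean of the two game counts, squared."""
    return [harmonic_mean([g_odd, g_even]) ** 2
            for g_odd, g_even in zip(games_odd, games_even)]


def weighted_least_squares(x, y, w):
    """Fit y = a + b*x by minimizing sum(w_i * (y_i - a - b*x_i)**2)."""
    total_w = sum(w)
    # Weighted means of x and y.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Weighted covariance of (x, y) and weighted variance of x.
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data: (Factor A)Odd, (Factor A)Even, and games for five players.
factor_odd = [0.520, 0.480, 0.610, 0.450, 0.505]
factor_even = [0.510, 0.495, 0.540, 0.520, 0.500]
games_odd = [700, 650, 40, 25, 300]
games_even = [690, 660, 35, 30, 310]

weights = persistence_weights(games_odd, games_even)
a_wls, b_wls = weighted_least_squares(factor_odd, factor_even, weights)

# Equal weights reduce the same formula to ordinary least squares.
a_ols, b_ols = weighted_least_squares(factor_odd, factor_even, [1.0] * 5)
print("WLS:", a_wls, b_wls)
print("OLS:", a_ols, b_ols)
```

Because the weights are squared, the two hypothetical players with only a few dozen games have almost no influence on the WLS estimate of b, whereas under equal weights they count as much as the full-time players.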
Persistence Equations form the basis for dividing shared Player Game Points between batters and baserunners as well as between pitchers and fielders for several components. I also calculate and discuss specific Persistence Equations for Inter-Game Win Adjustments as a measure of "clutch".
Article last updated: July 15, 2019
All articles are written so that they pull data directly from the most recent version of the Player won-lost database. Hence, any numbers cited within these articles should automatically incorporate the most recent update to Player won-lost records. In some cases, however, the accompanying text may have been written based on previous versions of Player won-lost records. I apologize if this results in nonsensical text in any cases.