I have recently added pages that identify the players most similar to a particular player. These pages can be accessed from player pages under the link labeled "Most Similar Players to" (e.g., Most Similar Players to Eddie Murray).
The default for my similar-player pages is to show the 10 players with the most similar careers. I allow for several different options, however.
First, one can vary the number of players shown.
Second, one can compare Player won-lost records over a specific age range.
Third, one can choose to include pWins (which tie to team wins and hence incorporate context) or not. The default option is to include pWins (over positional average and replacement level) in the comparison (along with eWins).
Fourth, one can normalize season lengths (to 162 games) for all players and/or extrapolate missing player games before making comparisons. The default option is to not normalize season lengths or extrapolate missing games.
Finally, one can assign unique weights for the six factors that are used for comparison: Batting, Baserunning, Pitching, Fielding, eWins, and pWins. The default weights are one for each of the six factors. Note that if the "context" option is set to "n" (No), the weight on pWins will be zero, regardless of what is entered here.
All of these options can be selected on the "Most Similar Players" page by filling in the appropriate boxes and clicking the "Go" button.
For example, the five players most similar to Tim Raines from ages 21 - 27 (Raines's prime, 1981 - 1987), excluding context (i.e., only considering eWins), with seasons normalized to 162 games, are here.
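The options described above can be summarized in a small sketch. This is only an illustration of how the settings interact, not the code behind the pages: the class name `SimilarityOptions`, its field names, and the `effective_weights` helper are all hypothetical. The one rule it encodes comes directly from the text: when context is excluded, the weight on pWins is zero regardless of what was entered.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# The six weighted factors named in the article.
FACTORS = ("Batting", "Baserunning", "Pitching", "Fielding", "eWins", "pWins")


@dataclass
class SimilarityOptions:
    """Hypothetical container for the options on the similar-player page."""
    num_players: int = 10                         # default: 10 most similar
    age_range: Optional[Tuple[int, int]] = None   # None means full career
    include_context: bool = True                  # default: include pWins
    normalize_seasons: bool = False               # default: no 162-game scaling
    extrapolate_missing: bool = False             # default: no extrapolation
    weights: Dict[str, float] = field(
        default_factory=lambda: {f: 1.0 for f in FACTORS}
    )

    def effective_weights(self) -> Dict[str, float]:
        """Return the weights actually used, zeroing pWins if context is off."""
        w = dict(self.weights)
        if not self.include_context:
            w["pWins"] = 0.0
        return w


# The Tim Raines example: five players, ages 21 - 27, no context,
# seasons normalized to 162 games.
raines = SimilarityOptions(
    num_players=5,
    age_range=(21, 27),
    include_context=False,
    normalize_seasons=True,
)
```

Here `raines.effective_weights()` gives a pWins weight of zero even though the entered weight is still one, while the other five factors keep their default weight of one.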
Identifying Most Similar Players
Measures Being Compared
There are four basic factors for which players can accumulate Player wins and losses: Batting, Baserunning, Pitching, and Fielding. There are two dimensions across which player similarity can be measured: quality and quantity.
To identify a player's "most similars", I look at up to seven breakdowns of Player won-lost records: batting, baserunning, starting pitching, relief pitching, fielding, and measures of overall player value; and I look at two measures: total wins and wins over some benchmark.
For batting and baserunning, the benchmark I look at is average non-pitcher winning percentage. It is necessary to exclude pitchers in order to accurately compare players in DH leagues with players in non-DH leagues.
For pitching, the benchmark I look at is wins over positional average. I calculate separate positional averages for starting vs. relief pitching. I therefore include two pitching measures (relative to benchmark): starting pitching wins over positional average and relief pitching wins over positional average. For total wins, however, I use a single measure of total pitching wins.
For fielding, the benchmark I use is replacement level. I chose this to allow for comparisons across fielding positions that control for the relative difficulty of different positions. That is, an average (0.500) fielding shortstop is not comparable to an average (0.500) fielding left fielder; but a below-average fielding shortstop may be more comparable to an above-average fielding third baseman, for example (see my article on Starlin Castro).
For each of these four separate factors, I use expected (context-neutral) wins.
In addition to the four factors discussed above, I also look at overall player value. For overall Player won-lost record, I do not look at total wins, but instead look at wins over both positional average and replacement level. For all comparisons, I look at expected (context-neutral) wins. If desired, the comparison can also include pWins over positional average (pWOPA) and replacement level (pWORL), which tie to team wins and reflect the context in which they were earned.
Identifying Most Similar Players
For each of the (up to thirteen) variables identified above, I begin by calculating the totals for the player of interest over the age range of interest (e.g., Starlin Castro at ages 20 - 22). I then normalize all of these figures by dividing by the standard deviation for these figures across all player-seasons for which I have calculated Player won-lost records. This puts everything on the same scale so that, for example, baserunning is weighted the same as batting, and wins, wins over average, and wins over replacement level are all given equal weight. These figures serve as the baseline numbers against which all other players are then compared.
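The normalization step can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the function name `normalize`, the measure names, and the numbers are all made up, and the use of the population standard deviation (rather than the sample version) is an assumption, since the text does not say which is used.

```python
from statistics import pstdev
from typing import Dict, List


def normalize(totals: Dict[str, float],
              player_seasons: List[Dict[str, float]]) -> Dict[str, float]:
    """Divide each total by the standard deviation of that measure
    across all player-seasons, so every measure is on the same scale."""
    scaled = {}
    for name, value in totals.items():
        sd = pstdev([season[name] for season in player_seasons])
        # Guard against a measure that never varies (hypothetical edge case).
        scaled[name] = value / sd if sd > 0 else 0.0
    return scaled


# Made-up player-seasons: batting wins vary far more than baserunning wins.
seasons = [
    {"batting": 10.0, "baserunning": 1.0},
    {"batting": 14.0, "baserunning": 1.5},
    {"batting": 18.0, "baserunning": 2.0},
]

# Made-up totals for the player of interest.
baseline = normalize({"batting": 8.0, "baserunning": 1.0}, seasons)
```

In this toy data, 8 batting wins and 1 baserunning win come out at the same normalized value, because each is the same number of standard deviations on its own scale. That is the sense in which baserunning ends up weighted the same as batting.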
To find the "most similar" players, the same figures are calculated for every other player for whom I have calculated Player won-lost records over the age range of interest. For each such player and each measure, I calculate the difference between that player's value and the baseline value calculated above and square that difference. Squaring the difference has two effects. First, it treats a value slightly higher than the baseline value the same as a value slightly lower than the baseline value. Second, it spreads out the scale, increasing the penalty for being very different. As a result, being a little bit different at everything will produce greater similarity than being identical at some things but very different at others.
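A small worked example may make the effect of squaring concrete. The numbers below are invented and already assumed to be on the normalized scale; the helper names `sum_abs` and `sum_sq` are hypothetical. The comparison with absolute differences is included only as a contrast and is not something the article uses.

```python
from typing import List


def sum_abs(diffs: List[float]) -> float:
    """Sum of absolute differences (shown only for contrast)."""
    return sum(abs(d) for d in diffs)


def sum_sq(diffs: List[float]) -> float:
    """Sum of squared differences, as described in the text."""
    return sum(d * d for d in diffs)


# Differences from the baseline on four measures (made-up values).
a_bit_off_everywhere = [1.0, 1.0, 1.0, 1.0]
identical_but_one = [0.0, 0.0, 0.0, 4.0]
mixed_signs = [-1.0, 1.0, -1.0, 1.0]

# Absolute differences cannot tell the first two players apart...
abs_a, abs_b = sum_abs(a_bit_off_everywhere), sum_abs(identical_but_one)

# ...but squared differences rank the "a bit off everywhere" player as closer.
sq_a, sq_b = sum_sq(a_bit_off_everywhere), sum_sq(identical_but_one)

# Being above or below the baseline is treated the same way.
sq_c = sum_sq(mixed_signs)
```

Both players differ from the baseline by a total of 4 in absolute terms, but the squared totals are 4 and 16, so the player who is slightly different on every measure is treated as the more similar one. The mixed-sign player also scores 4, since squaring removes the direction of each difference.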
For every player, a weighted sum of these squared differences is then calculated, using the selected weights as described above. Players are sorted by this weighted sum of squared differences, and the n players with the smallest sums are presented as the "Most Similar" players, where n is chosen as described above.
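The ranking step can be sketched as below. This is an illustrative sketch, not the code that generates the pages: the function names, the placeholder player names, and the values are hypothetical. It also assumes one weight per measure, whereas the article assigns six factor weights across up to thirteen variables and does not spell out how they map onto each other.

```python
from typing import Dict, List, Tuple


def weighted_distance(baseline: Dict[str, float],
                      candidate: Dict[str, float],
                      weights: Dict[str, float]) -> float:
    """Weighted sum of squared differences between normalized measures."""
    return sum(
        weights[m] * (candidate[m] - baseline[m]) ** 2
        for m in baseline
    )


def most_similar(baseline: Dict[str, float],
                 candidates: Dict[str, Dict[str, float]],
                 weights: Dict[str, float],
                 n: int = 10) -> List[Tuple[str, float]]:
    """Return the n candidates with the smallest weighted distance."""
    scored = [
        (name, weighted_distance(baseline, values, weights))
        for name, values in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:n]


# Made-up, already-normalized values for a baseline and three candidates.
baseline = {"batting": 2.0, "fielding": 1.0}
candidates = {
    "Player A": {"batting": 2.5, "fielding": 1.5},
    "Player B": {"batting": 2.0, "fielding": 3.0},
    "Player C": {"batting": 1.0, "fielding": 1.0},
}

equal = most_similar(baseline, candidates, {"batting": 1.0, "fielding": 1.0}, n=2)
batting_only = most_similar(baseline, candidates, {"batting": 1.0, "fielding": 0.0}, n=1)
```

With equal weights, Player A (a little off on both measures) ranks ahead of Player C and Player B. Setting the fielding weight to zero moves Player B to the top, since that player matches the baseline exactly on batting.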
Example Sets of Most Similar Players
I'll conclude this article with a few example sets of most similar players.
Example 1: Top 10 Most Similar Players to Jim Rice, full career, all factors equally weighted, including pWins, seasons not normalized
Example 2: Top 10 Most Similar Players to Cal Ripken, full career, batting and baserunning only, seasons normalized
Example 3: Top 5 Most Similar Players to Kerry Wood, age 21, all factors equally weighted, including pWins, seasons not normalized
Example 4: Top 10 Most Similar Players to Jack Morris, full career, pitching, total eWins, total pWins, seasons normalized
Example 5: Top 5 Most Similar Players to Tim Raines, Sr., ages 21 - 30, all factors equally weighted, excluding pWins, seasons normalized
I hope you enjoy this new application of my Player won-lost records.
All articles are written so that they pull data directly from the most recent version of the Player won-lost database. Hence, any numbers cited within these articles should automatically incorporate the most recent update to Player won-lost records. In some cases, however, the accompanying text may have been written based on previous versions of Player won-lost records. I apologize if this results in nonsensical text in any cases.