Baseball Player Won-Loss Records
Home     List of Articles



Most Similar Players
Determining the Most Similar Players

I have recently added pages to identify the players most similar to a particular player. These pages can be accessed from player pages under the link labeled "Most Similar Players to" (e.g., Most Similar Players to Eddie Murray).

Comparison Options
The default for my similar-player pages is to show the 10 players with the most similar careers. I allow for several different options, however.

First, one can vary the number of players shown.

Second, one can compare Player won-lost records over a specific age range.

Third, one can choose to include pWins (which tie to team wins and hence incorporate context) or not. The default option is to include pWins (over positional average and replacement level) in the comparison (along with eWins).

Fourth, one can normalize season lengths (to 162 games) for all players and/or extrapolate missing player games before making comparisons. The default option is to not normalize season lengths or extrapolate missing games.

Finally, one can assign unique weights for the six factors that are used for comparison: Batting, Baserunning, Pitching, Fielding, eWins, and pWins. The default weights are one for each of the six factors. Note that if the "context" option is set to "n" (No), the weight on pWins will be zero, regardless of what is entered here.
All of these options can be selected on the "Most Similar Players" page by filling in the appropriate boxes and clicking the "Go" button.

For example, the five players most similar to Tim Raines from ages 21 - 27 (Raines's prime, 1981 - 1987), excluding context (i.e., only considering eWins), with seasons normalized to 162 games, are here.
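
The options above can be pictured as a small settings object. The following is only an illustrative sketch: the class name, field names, and defaults layout are hypothetical and are not taken from the actual site. It shows the one interaction described above, namely that turning context off forces the pWins weight to zero regardless of what was entered.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# The six weighted factors named in the article.
FACTORS = ("Batting", "Baserunning", "Pitching", "Fielding", "eWins", "pWins")


@dataclass
class SimilarityOptions:
    """Hypothetical container for the comparison options (names are assumptions)."""
    num_players: int = 10                # number of similar players to show
    min_age: Optional[int] = None        # None means no lower bound (full career)
    max_age: Optional[int] = None        # None means no upper bound (full career)
    include_context: bool = True         # the "context" option: include pWins?
    normalize_seasons: bool = False      # normalize seasons to 162 games?
    extrapolate_missing: bool = False    # extrapolate missing player games?
    weights: Dict[str, float] = field(
        default_factory=lambda: {f: 1.0 for f in FACTORS}
    )

    def effective_weights(self) -> Dict[str, float]:
        """Return the weights actually used, zeroing pWins when context is off."""
        w = dict(self.weights)
        if not self.include_context:
            w["pWins"] = 0.0
        return w


# The Tim Raines example above: five players, ages 21 - 27,
# excluding context, seasons normalized to 162 games.
raines_options = SimilarityOptions(
    num_players=5, min_age=21, max_age=27,
    include_context=False, normalize_seasons=True,
)
print(raines_options.effective_weights())
```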

Identifying Most Similar Players
Measures Being Compared
There are four basic factors for which players can accumulate Player wins and losses: Batting, Baserunning, Pitching, and Fielding. There are two dimensions across which player similarity can be measured: quality and quantity.

To identify a player's "most similars", I look at up to seven breakdowns of Player won-lost records: batting, baserunning, starting pitching, relief pitching, fielding, and overall player value both in and out of context. Across these breakdowns, I look at two kinds of measure: total wins and wins over some benchmark.

For batting and baserunning, the benchmark I look at is average non-pitcher winning percentage. It is necessary to exclude pitchers in order to accurately compare players in DH leagues with players in non-DH leagues.

For pitching, the benchmark I look at is wins over positional average. I calculate separate positional averages for starting vs. relief pitching. I therefore include two pitching measures (relative to benchmark): starting pitching wins over positional average and relief pitching wins over positional average. For total wins, however, I use a single measure of total pitching wins.

For fielding, the benchmark I use is replacement level. I chose this to allow for comparisons across fielding positions that control for the relative difficulty of different positions. That is, an average (0.500) fielding shortstop is not comparable to an average (0.500) fielding left fielder; but a below-average fielding shortstop may be more comparable to an above-average fielding third baseman, for example (see my article on Starlin Castro).

For each of these four separate factors, I use expected (context-neutral) wins.
In addition to the four factors discussed above, I also look at overall Player won-lost records. For the overall record, I do not look at total wins, but instead look at wins over both positional average and replacement level. All comparisons include expected (context-neutral) wins. If desired, the comparison can also include pWins over positional average (pWOPA) and replacement level (pWORL), which tie to team wins and reflect the context in which they were earned.

Identifying Most Similar Players
For each of the (up to thirteen) variables identified above, I begin by calculating the totals for the player of interest over the age range of interest (e.g., Starlin Castro at ages 20 - 22). I then normalize all of these figures by dividing by the standard deviation for these figures across all player-seasons for which I have calculated Player won-lost records. This puts everything on the same scale so that, for example, baserunning is weighted the same as batting, and wins, wins over average, and wins over replacement level are all given equal weight. These figures serve as the baseline numbers against which all other players are then compared.
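
The scaling step can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the function names and the two toy measures are made up, and the use of the population standard deviation (`pstdev`) rather than the sample standard deviation is an assumption, since the text does not say which is used.

```python
from statistics import pstdev


def scale_factors(player_seasons):
    """Standard deviation of each measure across all player-seasons.

    `player_seasons` is a list of dicts mapping measure name -> value.
    Population standard deviation is an assumption made for this sketch.
    """
    measures = list(player_seasons[0].keys())
    return {m: pstdev([row[m] for row in player_seasons]) for m in measures}


def normalize(totals, sd):
    """Divide each of a player's totals by that measure's standard deviation."""
    return {m: totals[m] / sd[m] for m in totals}


# Made-up player-seasons: batting varies ten times as much as baserunning.
seasons = [
    {"batting": 1.0, "baserunning": 0.1},
    {"batting": 3.0, "baserunning": 0.3},
]
sd = scale_factors(seasons)

# After scaling, a one-standard-deviation gap counts the same in either measure.
baseline = normalize({"batting": 2.0, "baserunning": 0.2}, sd)
print(baseline)
```

A real version would need to guard against a measure whose standard deviation is zero before dividing.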

To find the "most similar" players, I calculate the same figures for every other player for whom I have calculated Player won-lost records over the age range of interest. For each such player and each measure, I calculate the difference between that player's value and the baseline value calculated above and square that difference. Squaring the difference has two effects. First, it treats a value slightly higher than the baseline the same as a value slightly lower than the baseline. Second, it spreads out the scale, increasing the penalty for being very different. As a result, being a little bit different at everything will produce greater similarity than being identical at some things but very different at others.
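
A small worked example makes the effect of squaring concrete. The numbers below are invented for illustration and are already on the normalized scale; the helper name is hypothetical.

```python
def sum_of_squares(diffs):
    """Sum of squared differences from the baseline."""
    return sum(d * d for d in diffs)


# Differences from the baseline on four measures (made-up values).
a_little_off_everywhere = [1.0, 1.0, 1.0, 1.0]
identical_but_one = [0.0, 0.0, 0.0, 3.0]

# Squared differences: 4.0 versus 9.0, so the first player is more similar.
print(sum_of_squares(a_little_off_everywhere), sum_of_squares(identical_but_one))

# Plain absolute differences would have ranked them the other way: 4.0 versus 3.0.
print(sum(abs(d) for d in a_little_off_everywhere),
      sum(abs(d) for d in identical_but_one))

# The sign of a difference does not matter once it is squared.
print(sum_of_squares([-2.0]) == sum_of_squares([2.0]))
```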

For every player, I then calculate a weighted sum of these squared differences, using the selected weights described above. Players are sorted by this weighted sum of squared differences, and the n players with the smallest sums are presented as the "Most Similar" players, where n is chosen as described above.
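
The ranking step can be sketched as follows. This is an illustration under stated assumptions, not the site's code: the function names and player labels are hypothetical, the values are made up, and the sketch keys the weights by measure, whereas the text assigns weights to six factors without spelling out how they map onto the individual measures.

```python
import heapq


def weighted_distance(candidate, baseline, weights):
    """Weighted sum of squared differences between normalized measures."""
    return sum(
        weights[m] * (candidate[m] - baseline[m]) ** 2 for m in baseline
    )


def most_similar(baseline, candidates, weights, n=10):
    """Names of the n candidates with the smallest weighted distance."""
    scored = [
        (weighted_distance(values, baseline, weights), name)
        for name, values in candidates.items()
    ]
    return [name for _, name in heapq.nsmallest(n, scored)]


# Made-up, already-normalized values for a player of interest and three others.
baseline = {"batting": 2.0, "fielding": 1.0}
candidates = {
    "Player A": {"batting": 2.5, "fielding": 1.0},  # distance 0.25
    "Player B": {"batting": 2.0, "fielding": 3.0},  # distance 4.0
    "Player C": {"batting": 1.0, "fielding": 0.0},  # distance 2.0
}
equal_weights = {"batting": 1.0, "fielding": 1.0}
print(most_similar(baseline, candidates, equal_weights, n=2))
```

Setting a weight to zero removes that measure from the comparison entirely, which is how a "batting and baserunning only" comparison could be expressed in this sketch.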

Example Sets of Most Similar Players
I'll conclude this article with a few example sets of most similar players.

Example 1: Top 10 Most Similar Players to Jim Rice, full career, all factors equally weighted, including pWins, seasons not normalized

Batting, Baserunning, Pitching, and Fielding are wins over baseline.

Player                  Games  pWins  pLosses  pWOPA  pWORL  Batting  Baserunning  Pitching  Fielding
Jim Rice                 2089  281.5    248.6   11.3   33.7     15.8         -0.2       0.0       2.7
Vladimir Guerrero        2147  293.9    250.4   13.4   36.5     18.5         -0.7       0.0       2.8
Ellis Burks              2000  243.2    213.1   11.0   30.1     14.8         -0.6       0.0       1.9
Bernie Williams          2076  259.7    236.5   11.3   31.3     10.7         -0.0       0.0       3.3
Fred Lynn                1967  245.4    220.1   10.3   28.5     16.5         -0.1       0.0       4.1
Enos Slaughter           2058  255.8    218.2   11.1   31.3     11.6         -0.4       0.0       3.9
Matt Holliday            1772  238.5    206.2   10.3   26.6     15.0          0.2       0.0       2.2
George Foster            1975  265.3    222.4   13.0   32.0     13.7         -0.2       0.0       4.9
Moises Alou              1941  257.0    226.8    6.9   26.0     11.9         -0.4       0.0       2.7
Bobby Murcer             1908  245.3    215.9    7.8   26.9     11.1         -0.3       0.0       2.7
Jack Clark               1992  249.0    206.7   11.5   30.2     20.9         -0.6       0.0       2.3


Example 2: Top 10 Most Similar Players to Cal Ripken, full career, batting and baserunning only, seasons normalized

Batting, Baserunning, Pitching, and Fielding are wins over baseline.

Player                  Games  pWins  pLosses  pWOPA  pWORL  Batting  Baserunning  Pitching  Fielding
Cal Ripken               3001  380.3    350.1   27.1   55.2      7.9          0.2       0.0      15.6
Adrian Beltre            2720  338.6    299.2   17.7   41.9      9.7          0.4       0.0       9.6
Graig Nettles            2691  304.7    269.0   14.9   37.8      9.5         -0.2       0.0       6.5
Steve Garvey             2330  261.0    224.8    6.6   25.6      5.9          0.3       0.0       2.7
Al Oliver                2367  275.7    261.3   -1.7   20.5      7.5         -0.4       0.0       1.2
Mark Grace               2243  239.6    212.4    0.2   17.8     10.2         -0.0      -0.0       2.9
Harold Baines            2822  278.8    256.0    6.0   32.8     11.9         -0.4       0.0       2.1
Torii Hunter             2371  302.7    291.0    4.2   26.9      4.5          0.1       0.0       5.3
Luis 'Gonzo' Gonzalez    2589  317.0    294.3    1.7   25.6     11.7         -0.7       0.0       4.5
Rusty Staub              2944  325.6    309.2   -2.2   25.8     14.9         -0.3       0.0       2.8
Tony Perez               2773  305.3    253.6   14.9   38.1     15.5          0.3       0.0       2.8


Example 3: Top 5 Most Similar Players to Kerry Wood, age 21, all factors equally weighted, including pWins, seasons not normalized

Batting, Baserunning, Pitching, and Fielding are wins over baseline.

Player                  Games  pWins  pLosses  pWOPA  pWORL  Batting  Baserunning  Pitching  Fielding
Kerry Wood                 26   13.4     10.4    2.2    3.3     -0.3         -0.0       0.9       0.0
Dontrelle Willis           27   11.9      9.0    2.1    3.2     -0.2         -0.0       0.7       0.0
Felix Hernandez            30   12.6      9.4    1.9    3.1     -0.0         -0.0       0.8       0.0
Pete Donohue               27   13.2      9.0    2.4    3.3     -0.3          0.0       1.2       0.0
Madison Bumgarner          35   14.2     12.1    1.7    3.0     -0.3          0.0       1.2      -0.0
Curt Simmons               34   15.0     12.5    1.8    3.1     -0.4         -0.0       1.1       0.1


Example 4: Top 10 Most Similar Players to Jack Morris, full career, pitching, total eWins, total pWins, seasons normalized

Batting, Baserunning, Pitching, and Fielding are wins over baseline.

Player                  Games  pWins  pLosses  pWOPA  pWORL  Batting  Baserunning  Pitching  Fielding
Jack Morris               568  226.2    211.3   11.0   32.2     -0.0         -0.0       6.2       0.2
Jerry Reuss               630  225.5    219.1   12.2   31.6     -5.3         -0.2       6.4       0.6
Frank Tanana              639  245.2    237.2    7.8   30.8     -0.4         -0.0       6.6       0.8
Jamie Moyer               701  239.2    236.5    8.1   33.0     -2.9         -0.1       4.7       1.2
Kenny Rogers              763  196.4    182.8    9.5   30.4     -0.4          0.0       5.3       1.4
Catfish Hunter            515  212.9    196.6   14.6   32.9     -1.9          0.0       6.5       0.4
Mickey Lolich             592  227.3    227.6    7.7   28.2     -4.7         -0.1       4.7       0.3
Chuck Finley              524  195.0    180.7   10.4   30.2     -0.5         -0.0       7.9      -0.4
Paul Derringer            534  206.6    197.0   10.4   29.9     -6.9         -0.2       9.3       0.6
Dennis Martinez           694  231.7    217.6   13.1   34.8     -3.6         -0.1       7.7       1.5
Jerry Koosman             612  233.6    227.9   10.6   30.9     -6.5         -0.1      10.1       0.6


Example 5: Top 5 Most Similar Players to Tim Raines, Sr., ages 21 - 30, all factors equally weighted, excluding pWins, seasons normalized

Batting, Baserunning, Pitching, and Fielding are wins over baseline.

Player                  Games  eWins  eLosses  eWOPA  eWORL  Batting  Baserunning  Pitching  Fielding
Tim Raines Sr.           1384  201.5    166.7   10.8   24.7      8.7          5.1       0.0       2.3
Rickey Henderson         1383  211.2    164.2   19.7   34.1     13.4          5.6       0.0       5.9
Bert Campaneris          1224  161.3    153.6    9.2   22.4     -3.3          4.1      -0.0       5.3
Cesar Cedeno             1261  178.9    148.3   10.6   23.6      8.4          2.5       0.0       3.6
Bobby Bonds              1258  188.7    152.9   12.3   26.4     12.6          2.6       0.0       2.3
Willie Wilson            1255  163.7    152.9    3.1   15.2     -2.7          4.8       0.0       6.3


I hope you enjoy this new application of my Player won-lost records.

All articles are written so that they pull data directly from the most recent version of the Player won-lost database. Hence, any numbers cited within these articles should automatically incorporate the most recent update to Player won-lost records. In some cases, however, the accompanying text may have been written based on previous versions of Player won-lost records. I apologize if this results in nonsensical text in any cases.
