In his book The Politics of Glory (renamed Whatever Happened to the Hall of Fame for the paperback edition), Bill James introduced Similarity Scores as a way of identifying players who had similar careers. Baseball-Reference shows a player's 10 most similar players for his career and by age. In the case of Todd Walker, Baseball-Reference identifies Carlos Guillen as the player whose career statistics are most similar to Todd Walker's.
Similarity scores are a fun little tool for creating historical comps and maybe envisioning how active players' careers might unfold. There are two main drawbacks, however, to similarity scores as created by James and implemented by Baseball-Reference. First, they rely on raw statistics and are not adjusted for context. Hence, players in higher-offense environments will tend to get comped to better players from lower-offense environments, and vice versa. Second, similarity scores are based primarily on offensive statistics. There is a positional adjustment, but it simply seeks to compare players to other players at the same position, with no attempt to compare players' defensive abilities (the positional adjustment is also a linear scale, which doesn't always handle multi-position players well). The result is that two "similar" players may differ significantly in overall value if one is a very good defender and the other is a very poor one.
Todd Walker is an excellent example of both of these problems. According to Baseball-Reference.com, Todd Walker earned 8.3 wins above replacement level (WAR) in his career. His 10 most similar players, however, earned between 13.4 and 38.6 WAR, with 8 of the 10 earning more than 20 WAR. That's not really very similar.
For his career, Todd Walker batted .289 with 107 home runs and 66 stolen bases in 4,554 career at bats, playing mostly second base with some 3B and 1B mixed in. For his career, Carlos Guillen batted .285 with 124 home runs and 74 stolen bases in 4,673 career at bats, playing mostly shortstop with some 3B, 2B, and 1B thrown into the mix. On the surface, you can certainly see how Guillen ends up as Walker's most-similar player.
Their careers overlapped quite a bit, but even with that, the run-scoring environment was quite different for these two players. Walker's career pre-dates Guillen's by two years, 1996 and 1997, both of which saw very high run scoring. Guillen's career extended four years beyond Walker's (2008, 2009, 2010, and 2011), all of which were generally lower-scoring than the years immediately preceding them. In addition, Guillen spent his career in neutral or pitching parks in Seattle, while Walker played several years in very strong hitting parks, e.g., Colorado and Chicago.
Put that all together, and Baseball-Reference.com says that the average ballpark-adjusted batting line over Todd Walker's career was .276/.347/.443. For Guillen, on the other hand, the corresponding numbers were .267/.334/.424. The result is that I calculate Todd Walker's batting Player won-lost record at 90.0 - 91.7 (0.495), whereas I calculate Carlos Guillen's batting Player won-lost record at 98.8 - 93.3 (0.514).
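The figures in parentheses appear to be ordinary winning percentages, wins divided by wins plus losses. Here is a quick check of that arithmetic in Python; the function name `win_pct` is just an illustrative helper, and the assumption that the percentage is computed this way is mine rather than something stated above.

```python
def win_pct(wins: float, losses: float) -> float:
    """Winning percentage as wins / (wins + losses) (assumed formula)."""
    return wins / (wins + losses)


# Batting Player won-lost records quoted in the text.
walker = win_pct(90.0, 91.7)
guillen = win_pct(98.8, 93.3)

print(f"Walker:  {walker:.3f}")   # 0.495
print(f"Guillen: {guillen:.3f}")  # 0.514
```

Under that assumption, both quoted percentages are reproduced to three decimal places.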
In theory, one could use Player won-lost records to build a set of similarity scores that would do a better job of identifying players of comparable value. I decided to play around a bit with this to try to identify players who were truly similar to Todd Walker.
There are three basic "win" measures I calculate: wins, wins over positional average (WOPA), and wins over replacement level (WORL); two basic types of "wins" that I calculate: pWins, which tie to team wins, and eWins, which control for context; four general factors in which players accumulate Player won-lost records, including batting and fielding; and nine components of Player won-lost records.
Let's start simple: pWins, pWOPA, pWORL, eWins, eWOPA, eWORL. That's six numbers. For each of these variables, take the difference between Walker's career total and that of the player being compared, then square the difference. Squaring the difference does two things: (a) a number a little bit higher than Walker's gets treated the same as a number the same amount lower than Walker's, and (b) differences get magnified the further away they get. Sum up the six squared differences. The player with the smallest sum is the closest comp to Todd Walker.
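The calculation just described is a sum of squared differences across the six measures. Below is a minimal sketch of it in Python. The function names (`squared_distance`, `closest_comps`), the dictionary layout, and every number in the sample data are placeholders I made up for illustration; they are not actual Player won-lost records for Walker or anyone else.

```python
from typing import Dict, List, Tuple

# The six career totals used in the first-pass comparison.
MEASURES = ("pWins", "pWOPA", "pWORL", "eWins", "eWOPA", "eWORL")

Career = Dict[str, float]


def squared_distance(target: Career, other: Career) -> float:
    """Sum of squared differences across the six measures."""
    return sum((target[m] - other[m]) ** 2 for m in MEASURES)


def closest_comps(target: Career, pool: Dict[str, Career], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n players in the pool with the smallest distance to target."""
    scored = [(name, squared_distance(target, stats)) for name, stats in pool.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest sum = closest comp
    return scored[:n]


# Placeholder numbers only, NOT real career totals.
target = {"pWins": 100.0, "pWOPA": 0.0, "pWORL": 8.0,
          "eWins": 101.0, "eWOPA": 0.5, "eWORL": 8.5}
pool = {
    "Player A": {"pWins": 102.0, "pWOPA": 1.0, "pWORL": 9.0,
                 "eWins": 100.0, "eWOPA": 0.0, "eWORL": 8.0},
    "Player B": {"pWins": 150.0, "pWOPA": 10.0, "pWORL": 20.0,
                 "eWins": 148.0, "eWOPA": 9.0, "eWORL": 19.0},
    "Player C": {"pWins": 99.0, "pWOPA": -0.5, "pWORL": 7.5,
                 "eWins": 101.0, "eWOPA": 0.5, "eWORL": 8.5},
}

for name, dist in closest_comps(target, pool, n=2):
    print(name, dist)
```

One thing the sketch makes visible: because the six numbers are summed on their raw scales, total wins (which run around 100) can dominate WOPA and WORL (which run in single digits) unless the measures are rescaled first.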
Doing that, here are the 10 top comps to Todd Walker based on overall Player won-lost records.
Not bad for a first pass. Walker's top 10 comps are perhaps light on second basemen (Hairston is one of only two players who saw significant time there). On the other hand, it's pretty easy to see the similarities here.
Let's compare Walker and his closest comp, Brook Jacoby. Jacoby was almost exclusively a third baseman, while Walker was primarily a second baseman. Outside of that difference, however, they really are quite a good match, as the table below shows.
Note: Net offensive wins are relative to the average offensive performance of non-pitchers. Jacoby spent all but 15 games of his career in DH leagues, while Walker played 4+ seasons in the National League.
Even though this match was made based only on aggregate Player won-lost records, Walker and Jacoby turn out to be pretty good matches at the factor level as well: average hitters, below-average baserunners, below-average fielders. Not a perfect match by any means. And we could probably get a closer match to Walker by explicitly considering batting, baserunning, and fielding separately. But pretty good for a first pass.
But what about traditional statistics? How close a match are Walker and Jacoby in more traditional stats? In raw stats, they're not a terribly close match: Jacoby batted .270/.334/.405 with 120 HRs and 16 SB in 5,027 plate appearances for his career vs. Walker's .289/.348/.435 with 107 HRs and 66 SB in 5,055 PAs. But how much of that is context?
Baseball-Reference.com has a really cool feature on its player batting pages that shows "Neutralized Batting" lines: estimates of what a player's traditional statistics would have looked like in an average run-scoring environment. Baseball-Reference estimates that Todd Walker would have batted .277/.335/.417 with 101 HRs in 4,957 PAs in a neutral environment. They estimate that Brook Jacoby would have batted .275/.339/.411 with 121 HRs in 5,080 PAs