Monday, February 10, 2014

baseball-reference.com addresses Lefty/Righty splits confusion.

About a year ago I wasted a lot of time on this.

retrosheet.org ...

vs RHP       - Against Right-Handed Pitchers
vs LHP       - Against Left-Handed Pitchers
vs RHS*      - Against Right-Handed Starters
vs LHS*      - Against Left-Handed Starters ...

* - The RHS and LHS categories will be used when we don't have play-by-play data for a game and not all the pitchers on the opposing team in the game threw from the same side. In those cases, we use RHS if the starter was right-handed and LHS if the starter was left-handed.
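The footnote's rule can be sketched in a few lines of Python. This is only an illustration of the rule as quoted, not retrosheet's actual code: the function name, its parameters, and the "R"/"L" encoding are all hypothetical, and the branch for games where every opposing pitcher threw from the same side is my reading of what the footnote implies.

```python
def game_level_bucket(has_play_by_play, opposing_pitcher_hands, starter_hand):
    """Pick the split bucket for a whole game, or None when the bucket
    is decided one plate appearance at a time.

    Hands are "R" or "L". Every name here is hypothetical.
    """
    if has_play_by_play:
        # Each plate appearance can be matched to the actual pitcher.
        return None
    if len(set(opposing_pitcher_hands)) == 1:
        # Assumption: when every opposing pitcher threw from one side,
        # the game goes into vs RHP / vs LHP even without play-by-play.
        return "vs " + opposing_pitcher_hands[0] + "HP"
    # No play-by-play and mixed handedness: fall back on the starter.
    return "vs " + starter_hand + "HS"


print(game_level_bucket(False, ["R", "L"], "R"))
```

The last branch is the troublesome one: everything the batter did in that game lands in the starter's bucket, including at bats against relievers who threw from the other side.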

Using the retrosheet splits I derived Hornsby's batting average (BA) splits against all RHP/LHP as .360/.355.

Rogers Hornsby's splits at baseball-reference.com are given only vs. RHS and LHS: .362/.351.

That's a five-point difference on retrosheet and an eleven-point difference on baseball-reference.  Plus, the total at bats (AB) don't match: 8,172 and 8,115, respectively.
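For anyone checking the arithmetic, here is a small Python sketch. The helper name point_gap is made up for the illustration; the averages and at-bat totals are the ones quoted above.

```python
def point_gap(avg_a, avg_b):
    """Gap between two batting averages in points (thousandths)."""
    return round(abs(avg_a - avg_b) * 1000)


retrosheet_gap = point_gap(0.360, 0.355)  # vs RHP / vs LHP
bbref_gap = point_gap(0.362, 0.351)       # vs RHS / vs LHS
ab_gap = 8172 - 8115                      # difference in total at bats

print(retrosheet_gap, bbref_gap, ab_gap)  # prints: 5 11 57
```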

Neither seems satisfactory.

retrosheet has season data in the Year Split Page:

R vs. R      - Right-Handed Hitters Against Right-Handed Pitchers
R vs. L      - Right-Handed Hitters Against Left-Handed Pitchers
L vs. R      - Left-Handed Hitters Against Right-Handed Pitchers
L vs. L      - Left-Handed Hitters Against Left-Handed Pitchers

If we don't have play-by-play data for a game and not all the pitchers on the opposing team in the game threw from the same side, these plate appearances are not included in the "R vs. R", "R vs. L", "L vs. R" and "L vs. L" categories.

Message sent today to the leaders of retrosheet and baseball-reference

Also in retrosheet.  Here is 1920:

http://www.retrosheet.org/boxesetc/1920/YS_1920.htm  Why are batting and pitching righty/lefty splits different?

The most detail starts in 1950.  Slightly less in about five seasons preceding, then the descent into never-never land: data reported by starting pitcher's handedness.  I understand that play-by-play data trails off around this time but there should be some indication of how much, if any, of the righty/lefty splits are from play-by-play.

Suggestion: since CG percent may be around 50% for seasons before 1945, how about having separate data for the complete games, which would be accurate for those games and probably much more representative of a full season?  Oh, and graphs would be great.

Thanks for considering and keep up the great work.

I just noticed this today as the lead item on the baseball-reference.com home page under Sports Reference Blog:

the "vs LH/RH Starter" split adds up all stats accumulated in games where the opposing starter was of a certain handedness, INCLUDING STATS ACCUMULATED LATER IN THE GAME WHEN THE STARTER IS PULLED, REGARDLESS OF THE RELIEF PITCHER(S)' HANDEDNESS.
Despite bolding, italicizing, and going all-caps, I still don't think I emphasized that enough. I realize the description of the split seems like it's talking only about stats accumulated against the starters, but it's really just counting up all stats in games where the opposing starter threw a certain way -- a BIG difference. If you want to know about performance against just starters of a given handedness... well, that's a double split, so we can't answer that right now. But we do hope to add the capacity for double splits in the future.
That helps, but the caveat needs to be flagged everywhere the misleading data appears.
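To see how far the two ways of counting can drift apart, here is a toy Python sketch. The six at bats are invented for the illustration and are not Hornsby's or anyone else's, and the names (AT_BATS, totals_by) are hypothetical. It groups the same at bats once by the starter's hand, the way the "vs LH/RH Starter" split does, and once by the hand of the pitcher actually on the mound.

```python
from collections import defaultdict

# Invented at bats: (game, starter_hand, pitcher_hand, hit)
AT_BATS = [
    ("g1", "R", "R", 1),
    ("g1", "R", "R", 0),
    ("g1", "R", "L", 0),  # lefty reliever in a game started by a righty
    ("g2", "L", "L", 1),
    ("g2", "L", "R", 1),  # righty reliever in a game started by a lefty
    ("g2", "L", "R", 0),
]

STARTER, PITCHER = 1, 2  # column positions within each row


def totals_by(at_bats, field):
    """Return {hand: (hits, at_bats)} grouped on the chosen column."""
    totals = defaultdict(lambda: [0, 0])
    for row in at_bats:
        hand = row[field]
        totals[hand][0] += row[3]  # hits
        totals[hand][1] += 1       # at bats
    return {hand: tuple(t) for hand, t in totals.items()}


by_starter = totals_by(AT_BATS, STARTER)
by_pitcher = totals_by(AT_BATS, PITCHER)

for label, table in (("by starter", by_starter), ("by pitcher", by_pitcher)):
    for hand in ("R", "L"):
        hits, ab = table[hand]
        print(label, hand, f"{hits}/{ab} = {hits / ab:.3f}")
```

In this made-up sample the starter split reads .333 vs. righties and .667 vs. lefties, while the per-pitcher count comes out .500 against both. Same at bats, very different story.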
