Sunday, March 3, 2013

Lefty/Righty splits: historical data may be inadvertently skewed.

Message sent today to the leaders of retrosheet and baseball-reference:

http://www.baseball-reference.com/leagues/split.cgi?t=b&lg=MLB&year=1950


Also in retrosheet.  Here is 1920:

http://www.retrosheet.org/boxesetc/1920/YS_1920.htm  Why are batting and pitching righty/lefty splits different?

The most detail starts in 1950.  Slightly less in about five seasons preceding, then the descent into never-never land: data reported by starting pitcher's handedness.  I understand that play-by-play data trails off around this time but there should be some indication of how much, if any, of the righty/lefty splits are from play-by-play.

Suggestion: since CG percent may be around 50% for seasons before 1945, how about having separate data for the complete games, which would be accurate for those games and probably much more representative of a full season?  Oh, and graphs would be great.

Thanks for considering and keep up the great work.
____________________________________________

 Click link to view sample data copied from retrosheet.org.



This graph was derived from the retrosheet data for seasons 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010.  Sort of like the census.

Simply looking at batting average (BA) indicated that before 1950 righty batters had a lower BA against lefty pitchers (gold line) than they had against righty pitchers (blue line).  This is counterintuitive and probably incorrect.  But why?

- Curve balls weren't so good back then.
- Sliders were seldom, if ever, used.
- Righty batters could bunt more effectively for hits against lefty pitchers who tend to fall toward third base.
- blah, blah, blah.

Since the data before 1950 lacks play-by-play detail, the arbitrary decision was made to classify each plate appearance (PA) according to whether the starting pitcher was righty or lefty.  If a lefty starts and is relieved by righties, PA against those righties are counted as being against a lefty.
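Here is a minimal sketch of what that rule does to one game's numbers.  The records, the field order and the `tally` helper are all made up for illustration; this is not retrosheet's format or code.

```python
from collections import Counter

# Hypothetical plate appearances by one righty batter in a game started by a lefty.
# Each record: (starter_hand, actual_pitcher_hand, is_at_bat, is_hit)
plate_appearances = [
    ("L", "L", True, True),    # single off the lefty starter
    ("L", "L", True, False),   # out against the lefty starter
    ("L", "R", True, False),   # out against a righty reliever
    ("L", "R", True, False),   # out against a righty reliever
]

def tally(pas, hand_index):
    """Return {hand: (hits, at_bats)}, using the chosen column as the pitcher label."""
    at_bats, hits = Counter(), Counter()
    for pa in pas:
        hand = pa[hand_index]
        at_bats[hand] += pa[2]
        hits[hand] += pa[3]
    return {h: (hits[h], at_bats[h]) for h in at_bats}

by_starter = tally(plate_appearances, 0)  # labeled by starting pitcher's hand
by_actual = tally(plate_appearances, 1)   # labeled by the pitcher actually faced

print(by_starter)  # {'L': (1, 4)}
print(by_actual)   # {'L': (1, 2), 'R': (0, 2)}
```

Labeled by starter, all four at-bats land in the lefty column.  Labeled by the pitcher actually faced, only two of them do.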

See this post:

Thursday, February 14, 2013
Percent Righty: Batters & Pitchers 1903-2012


It shows that most PA and most innings are by righties.

If a lefty is relieved, he is likely to be relieved by a righty, against whom a righty batter may be less successful than he would be against a lefty pitcher.  Because those PA are still counted as being against a lefty, the data would be inadvertently skewed to suggest that righty batters were less successful against lefty pitchers than they actually were.  Remember, this is for seasons before about 1945.  Play-by-play data has been made available back through 1950.
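The effect can be checked with simple arithmetic.  This is only a sketch: the two batting averages and the share of at-bats that came against righty relievers are assumed values, not numbers taken from retrosheet, and `blended_average` is a made-up helper name.

```python
def blended_average(ba_vs_lefty, ba_vs_righty, share_vs_relievers):
    """BA recorded as 'vs. lefty' when a share of those at-bats
    actually came against righty relievers."""
    return (1 - share_vs_relievers) * ba_vs_lefty + share_vs_relievers * ba_vs_righty

# All three inputs are assumed values for illustration, not measured data.
true_vs_lefty = 0.290    # righty batter's real BA against lefty pitchers
true_vs_righty = 0.270   # righty batter's real BA against righty pitchers
reliever_share = 0.25    # fraction of "lefty start" at-bats faced against righty relievers

recorded = blended_average(true_vs_lefty, true_vs_righty, reliever_share)
print(f"true BA vs. lefties:     {true_vs_lefty:.3f}")
print(f"recorded 'vs. lefty' BA: {recorded:.3f}")
```

The bigger the reliever share, the further the recorded figure drifts from the true one toward the BA against righties.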

Since accurate data is generally not available before 1950, I must restrict my current research to seasons after 1949.  Too bad.  I was really hoping to go back to 1903.
