Wednesday, December 19, 2012

Retrosheet: keeper of the flame or flaming out?

This is my first post in a week, since the home run (HR) proficiency post on seasons with at least 35 HR.  I think that post is an important step towards understanding relative HR achievements since the advent of modern HR hitting in 1920.  This is especially true with everyone babbling about the Hall of Fame voting on Barry Bonds, the major concern being his late career HR production after he started using steroids.  Rather than sitting like a lump and mumbling about HR totals, both season and career, I put them into some context, primarily as a rate per at bat (AB) compared with the league average computed with the player's own stats removed.
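For anyone who wants the arithmetic spelled out, here is a minimal Python sketch of that kind of comparison.  It is an illustration and not necessarily the exact calculation from the earlier post.  It assumes the comparison is a simple ratio of the player's HR per AB to the HR per AB of the rest of the league.  The function name and the sample numbers are made up for the example.

```python
def hr_proficiency(player_hr, player_ab, league_hr, league_ab):
    """Ratio of a player's HR/AB to the HR/AB of the rest of the league.

    Assumption: "league average minus the player's stats" means the
    player's HR and AB are subtracted from the league totals before the
    league rate is computed. The ratio form is also an assumption.
    """
    if player_ab <= 0:
        raise ValueError("player must have at least one at bat")

    # Remove the player's own contribution from the league totals.
    rest_hr = league_hr - player_hr
    rest_ab = league_ab - player_ab
    if rest_hr <= 0 or rest_ab <= 0:
        raise ValueError("rest-of-league totals must be positive")

    player_rate = player_hr / player_ab
    rest_rate = rest_hr / rest_ab
    return player_rate / rest_rate


# Hypothetical numbers, not real season data:
# a player with 40 HR in 500 AB in a league with 1040 HR in 50500 AB.
print(hr_proficiency(40, 500, 1040, 50500))
```

A result above 1 means the player homered more often per AB than everyone else in the league combined.  Subtracting the player first keeps a big HR season from inflating the very average it is being measured against.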

Hank Aaron's apparently comparable performance to Barry Bonds at about the same age (Aaron was actually 160 days older) is compromised by Aaron's home park advantage, which was extreme during his late career boost playing in Atlanta.  I know that my HR proficiency method needs a park HR factor to make it better.  But how?

baseball-reference.com does not seem to have the data.  Yesterday some friends suggested that I try good old trusty, rusty retrosheet.org, the granddaddy of historical baseball.

This is where I should provide the requisite blah, blah, blah accolades for a noble job well done, etc.  But I'm so appalled at the irresponsible nature of retrosheet.org that I'll go right to my concerns.

DOS.  Disk Operating System.  Retrosheet tools for working with its raw data are ancient PC programs written in the pre-Windows era, likely before 1990.  This tracks with my general theory that baseball is on its deathbed, meaningful mostly to old people like me and those running Retrosheet, like founder David Smith, who supposedly keeps original documents in his basement and/or garage rather than in the National Archives, Library of Congress, Iron Mountain, whatever.

I try to avoid doing my own database work.  I drag what I need/want from baseball-reference.com whenever possible and avoid the lousy interface of retrosheet.org, which I do not think has been improved since day one.  However, retrosheet.org does have some stuff that an individual cannot find elsewhere, at least not without paying for some esoteric service to access the raw data.

I use the annual data from the Lahman database and massage it in Microsoft Access, a single-user, Windows-based database management system (DBMS), which works quite well for my purposes.  Access is the only Windows program that I still need.  I do everything else on the web with Google Drive and Google Docs.  All my spreadsheet work is done online.  I have not used Microsoft Office programs other than Access for several years.  Were it not for my occasional use of Access, I would have no need for Windows, and I may get a Chromebook when the new touch screen version becomes available soon.

DOS?  Retrosheet tools are DOS programs?  Who the heck under the age of 60 is going to use them?  Rather than making baseball data available to future generations, Retrosheet officials seem intent on taking it to their graves, burying it the way some ancient pharaoh entombed his treasures and supplicants in his pyramid.
