Wednesday, December 5, 2012

Steroids & Home Runs: comments on previous post.

I am neither a physicist nor a statistician.  My comments deal only with basic baseball data.

Professor Roger G. Tobin wrote the document mentioned in my previous post.  He seems like a really smart guy and I don't want him getting mad at me, but some of what he wrote does not make sense to me.  Here is a quote:

For the pre-1980 sluggers, home runs generally represented 5–10% of the balls put in play. Only Babe Ruth regularly surpassed 10% and even he never reached 15%. The best present-day sluggers are much more prolific. Mark McGwire’s record is particularly remarkable, with seven seasons at or above 15%. (McGwire has acknowledged using the legal anabolic steroid androstendione.)

This has the same problem as comparing any raw average across different eras.  I think the rate should be expressed as a percentage difference from the league average.  For instance, in 1930 Bill Terry led the National League (NL) in batting average (BA) with .401; NL BA: .303.  In 1968 Carl Yastrzemski led the American League (AL) in BA with .301; AL BA: .230.  It seems silly to simply compare .401 to .301.  We should compare the percent above the league BA.

Terry (401-303)/303 = 32%
Yastrzemski (301-230)/230 = 31%

The same thing should be done with whatever stat is used to measure home run (HR) hitting.
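
Here is a rough Python sketch of the league-relative comparison above. The function name pct_above_league is my own invention, not something from a stats library, and the inputs are just the Terry and Yastrzemski numbers already quoted.

```python
def pct_above_league(player_avg, league_avg):
    """Return how far a player's average sits above the league average, in percent."""
    return (player_avg - league_avg) / league_avg * 100


# Bill Terry, 1930 NL: .401 against a league BA of .303
terry_1930 = pct_above_league(0.401, 0.303)

# Carl Yastrzemski, 1968 AL: .301 against a league BA of .230
yaz_1968 = pct_above_league(0.301, 0.230)

print(f"Terry 1930:       {terry_1930:.0f}% above league")
print(f"Yastrzemski 1968: {yaz_1968:.0f}% above league")
```

The same function would work for a HR rate: pass the player's rate and the league's rate for that season.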

Professor Tobin uses the percentage of batted balls that are home runs.  There are some small problems:
1. Some HR are inside the park (IPHR).
2. Rules changed after 1931:
    - fair/foul had been judged on where the ball landed; changed to where it passed the foul pole;
    - balls that bounced into the stands were HR; changed to doubles.
    Supposedly Ruth never hit a bouncing HR, but he may have lost many HR because of the fair/foul rule.
3. Sacrifice Flies (SF) are batted balls but are not counted by Professor Tobin.  SF became an official stat in 1954.

Babe Ruth hit 10 IPHR, one in 1927: Friday, July 8, 1927 Navin Field Detroit, game two; number 27 of 60.
Roger Maris hit three IPHR, none in 1961 when he hit 61 HR.
Mark McGwire hit zero IPHR.
Sammy Sosa hit two IPHR, including number 63 of 64 in 2001.
Barry Bonds hit three IPHR, none in 2001 when he hit 73 HR.

SF for the 60 HR seasons other than Ruth:
Maris 1961 61 HR, 7 SF
McGwire 1998 70 HR, 4 SF
McGwire 1999 65 HR, 5 SF
Sosa 1998 66 HR, 5 SF
Sosa 1999 63 HR, 6 SF
Sosa 2001 64 HR (one IPHR), 12 SF (his second-highest total was 8, the previous season, when he hit 50 HR)
Bonds 2001 73 HR, 2 SF

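For Sosa's 2001 season, should we subtract that one IPHR and add the twelve SF?  Let's see if there's much difference.  Sosa had 577 at bats and 153 strikeouts, so batted balls = 577-153.
As Tobin counts it: 64/(577-153) = 15.1%
Minus the IPHR, plus the SF: (64-1)/(577-153+12) = 14.4%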
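
For anyone who wants to try this on other seasons, here is a minimal sketch of the two calculations. I am assuming 577 is Sosa's at bats and 153 his strikeouts, so that batted balls are at bats minus strikeouts; the function hr_rate and its parameter names are my own, not Tobin's.

```python
def hr_rate(hr, at_bats, strikeouts, iphr=0, sf=0):
    """Percent of batted balls that are over-the-fence home runs.

    Batted balls are taken as at bats minus strikeouts (an assumption),
    optionally plus sacrifice flies. Inside-the-park HR are removed
    from the numerator when iphr is given.
    """
    batted_balls = at_bats - strikeouts + sf
    return (hr - iphr) / batted_balls * 100


# Sosa 2001 with no adjustments
plain = hr_rate(64, 577, 153)

# Same season, subtracting one IPHR and adding twelve SF
adjusted = hr_rate(64, 577, 153, iphr=1, sf=12)

print(f"plain:    {plain:.1f}%")
print(f"adjusted: {adjusted:.1f}%")
print(f"change:   {plain - adjusted:.1f} percentage points")
```

Even for the season with the most SF in the group, the adjustment moves the rate by well under one percentage point.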

Another issue I have with Tobin is that he looks at the top five individual HR totals per season; it's unclear whether he means the top five for all of MLB or the top five per league.  Since there are now 87% more teams than before 1961 (30 versus 16), a fixed top five does not make sense.  It makes more sense to take a number of top totals equal to the number of teams: 16 before 1961, 18 in 1961, 20 in 1962, etc.

Tobin also looks at the number of players with at least 45 HR.  This has the problem just mentioned, plus the fact that starting in 1961 in the AL and 1962 in the NL the schedule grew by eight games, from 154 to 162: 5.2%.  5.2% of 45 HR = 2.3 HR, so for seasons before 1961/1962 he should be looking for 43 HR.  Also, Tobin is now looking at totals, not rates per plate appearance, at bat, batted ball, or whatever.
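
The schedule and expansion arithmetic can be checked with a few lines of Python. This is only a sketch of my own adjustment, not anything Tobin did; the variable names are mine, and 30 is the current number of MLB teams.

```python
# Schedule length before and after expansion (AL in 1961, NL in 1962)
OLD_GAMES = 154
NEW_GAMES = 162

# Number of MLB teams before 1961 and today
OLD_TEAMS = 16
CURRENT_TEAMS = 30

# How much longer the season became, in percent
extra_games_pct = (NEW_GAMES - OLD_GAMES) / OLD_GAMES * 100

# A 45 HR threshold in a 162-game season, scaled down to 154 games
threshold_154 = 45 * OLD_GAMES / NEW_GAMES

# How many more teams there are now, in percent
more_teams_pct = (CURRENT_TEAMS - OLD_TEAMS) / OLD_TEAMS * 100

print(f"extra games: {extra_games_pct:.1f}%")
print(f"45 HR in 162 games is about {threshold_154:.1f} HR in 154 games")
print(f"more teams: {more_teams_pct:.1f}%")
```

Scaling 45 down by 154/162 and knocking 5.2% off 45 both land at about 43 HR once rounded.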

Those are my concerns about Professor Tobin's methodology with baseball data.  As he delves into the physics, I back away.  Here is his basic conclusion:

Specifically, a 10% increase in muscle mass can increase the fraction of balls put in play that result in home runs by 50% or more. This increase is comparable to the differences in home run rate between the most productive sluggers of the “steroid era” and those of earlier generations. These results certainly do not prove that recent performances are tainted, but they suggest that some suspicion is reasonable.

As I wrote in my previous post:
Just because a batter has a huge increase does not mean it's because of steroids.  For instance, Carl Yastrzemski hit 44 HR in his triple crown season, 1967.  His HR totals in his six previous seasons: 11, 19, 14, 15, 20, 16.  Yaz improved by more than 100% over his previous best.
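
To put a number on the Yastrzemski jump, here is a small sketch using only the totals listed above. Measuring against his previous best and against his previous average are both my own choices of baseline.

```python
# Yastrzemski's HR totals before 1967, as listed above
previous_hr = [11, 19, 14, 15, 20, 16]
hr_1967 = 44

best_before = max(previous_hr)
mean_before = sum(previous_hr) / len(previous_hr)

# Percent improvement over each baseline
gain_over_best = (hr_1967 - best_before) / best_before * 100
gain_over_mean = (hr_1967 - mean_before) / mean_before * 100

print(f"over previous best ({best_before}): {gain_over_best:.0f}%")
print(f"over previous average ({mean_before:.1f}): {gain_over_mean:.0f}%")
```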
