Sunday, November 30, 2014

Are season sample sizes too small to be meaningful, especially for a unified theory?

A few years ago someone mentioned to me that a sample size less than 1,500 was not meaningful.  I've been thinking about that, especially for fielding stats, the third rail in a unified theory for one stat that allows comparison of all players, regardless of position and era.

War on WAR.  Friday, March 8, 2013

I’m a smart guy.  I understand baseball and really enjoy the numbers, even those on the uniforms.  I like playing with the numbers.  However, my math skills are largely confined to arithmetic.  I am not a statistician or operations research person.  Those are the really smart guys who have come up with the new stuff in recent years.  So declaring war on them or their concepts is useless for me.  I need to confine my concerns to common sense things.

There are two things about WAR for non-pitchers that should be examined:
1. WAR is a total, not an average
2. defensive WAR, more properly called fielding WAR, is suspect.  The more the really smart guys delve into fielding stats in general the more they see a need to develop more methods of measurement...

Then there’s Defensive Wins Above Replacement (dWAR).

Brian Kenny, on his MLB Network program Clubhouse Confidential, once described multiple defensive metrics that come to opposing conclusions about Yankee center fielder (CF) Curtis Granderson.

Here’s an historical puzzle for me – Willie Mays dWAR, Plate Appearances (PA) and home park:
Year  dWAR   PA  Home park
1954   2.1  641  NY Polo Grounds
1955   0.7  670  NY Polo Grounds
1956   1.3  651  NY Polo Grounds
1957   0.6  669  NY Polo Grounds
1958   1.7  685  SF Seals Stadium
1959   0.4  649  SF Seals Stadium

During his physical prime the pattern for Mays is up, down, up, down, up, down.  Why?  PA suggest that he was not injured in these seasons.

Starting in 1960 Candlestick Park was Willie’s home park.

1960-1966 (age 35) dWAR for Willie Mays is between 1.3 (1963) and 2.0 (1962).

Did the fielding of Willie Mays improve with age?  How likely is that?  Stuff like this makes me suspicious of fielding stats.

Now I'm wondering if the Willie Mays puzzle isn't just small sample size.  Here are his chances (put outs (PO) + assists (A) + errors (E)) for center field:

1954 465
1955 428
1956 434
1957 438
1958 453
1959 362

For his career Mays was never in double digits for errors in a season.  His assists 1954-1959: 13, 22, 14, 14, 15, 6.  In 1959 Mays was down in games and innings.  PO were well over 400 each of those years except 1959: 351.

Fast forward to 2014, batting data top ten sorted by plate appearances (PA).

[Image: Josh Donaldson, June 25, 2013, by NewJack984, via Wikimedia Commons]

American Conference MVP Mike Trout was number 8 with 705 PA.  Let's see how many chances Trout had in center field: 390.  So Trout had 81% more PA than fielding chances.  In center Trout has some time to adjust to the mind-numbing boredom of waiting for something to happen.  That is completely different from facing a pitcher who could snuff out his life with any pitch.  At the plate Trout is ready.
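
The 81% figure is simple arithmetic on the two counts quoted above.  A minimal sketch in Python; the function name is made up for illustration:

```python
def percent_more(a: float, b: float) -> float:
    """Return how much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0

trout_pa = 705       # 2014 plate appearances, from the text
trout_chances = 390  # 2014 center field chances, from the text

print(round(percent_more(trout_pa, trout_chances)))  # 81
```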

At the opposite end of the readiness spectrum would be the third baseman, who is closer to the batter than any fielder in fair territory other than the pitcher, who is not expected to actually handle many, if any, chances.  Plus, unlike the first baseman, the player at third must also make a long, accurate throw.  Let's look at Josh Donaldson (number 10 in PA), late of the Oakland A's after being traded to Toronto for some unimaginable reason.  Donaldson must ward off the boredom of these tedious games and spring into action more quickly than any of his teammates.

Josh Donaldson had 482 chances at third base in 2014 for which he was awarded 2.7 dWAR.

Another third baseman: Evan Longoria (number 9 in PA): 396 chances, dWAR -0.1.

Somewhere someone thinks that makes sense.  If a batter had 400 PA how seriously would we view his batting stats?
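
One rough way to put numbers on that question is the standard error of a rate measured over n independent trials, sqrt(p(1-p)/n).  The sketch below assumes every plate appearance or fielding chance is an independent success-or-failure trial with the same probability.  That is a simplification, and it is not how dWAR is actually computed.  The .300 rate is only an illustrative value, and the function name is mine.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of an observed rate p over n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)

# Illustrative rate only; the sample sizes echo numbers mentioned in this post.
p = 0.300
for n in (400, 700, 1500):
    print(f"n={n:5d}  standard error = {standard_error(p, n):.3f}")
```

Under those assumptions, a .300 rate over 400 trials carries a standard error of about .023, against about .012 over 1,500 trials.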

Catcher Russell Martin recently got a fat five-year contract partly because he is good at deceiving umpires: pitch framing.  Just how many pitches did Martin frame successfully and how would we actually know?

Russell Martin ($82 million) and other catchers who don't play much.  Wednesday, November 19, 2014

Unfortunately for Toronto, they were so busy looking at Martin's ability to deceive umpires into thinking balls are strikes (pitch framing) that they did not notice that Martin had only 460 plate appearances (PA), embarrassingly short of the meager 502 PA needed to qualify for leading in an average stat like batting average.

A plate appearance is an intense event, usually involving multiple pitches, and allowing for much preparation and plenty of time to ponder between pitches.  Fielding, especially at third base, is a random occurrence that happens suddenly, usually with long gaps between events.  It may require the player to come alive and perform with split-second timing.  There's no real prep time other than getting set for each of about 150 pitches, most of which will not result in the ball being hit anywhere near the player at third.

So, what the heck?  We're pretending to measure fielding down to the nth degree and then massaging it with some alchemy so that its sparse numbers can sprout into something that is comparable to what results from a plate appearance or, even more bizarre, from a batter faced by a pitcher.

Unified theory, indeed.

Player acquisition rules: convoluted and confusing. And maybe counterproductive.

Two recent articles on

The Yankees Found Another Way To Outspend Every Other Team
by Kiley McDaniel - November 4, 2014

Yoan Moncada Is Affecting All of International Baseball
by Kiley McDaniel - November 19, 2014

Both reek of confusion and convolution.  Who the heck understands the player acquisition rules?  It's a byproduct of an institution that is too old and too stuck in its ways.

I'd rather that the Major Baseball League (MBL) dump all these rules and let teams sign players as they will.

Saturday, November 29, 2014

90% of fielders can make 90% of plays at any position because most plays are routine.

If I restricted that to fielders at specific positions, most people would probably agree.  90% of shortstops would make 90% of plays at shortstop.  But what I'm saying is that, with two caveats, 90% of fielders can make 90% of plays even if their position is assigned randomly.  The point is that most plays are routine.

The two caveats:
- exclude the positions of pitcher and catcher
- allow for handedness, i.e., do not put lefties at second, third or short.

Then pull names out of a hat and send them out there to play.  I have no data to support this.  It is basically a philosophical assertion.  It is intended to emphasise the idea that fielding is overvalued.  Many, maybe most, plays in a Major Baseball League (MBL) game can be made by competent amateurs.  For MBL players, that should increase to a very high percentage, maybe as high as 90%.

OK, now stop being anal and quibbling about the 90% thing. Consider the Willie Mays factor.  I choose Mays because he is the best example I have for a player being both a really great hitter and a really great fielder.  Substitute your own but follow the point.

In a nine-inning game Willie Mays will probably bat four times for the Giants.  He will play center field, which will make it difficult for opposing batters to avoid hitting a ball to him, although that is possible.  In all four of his plate appearances (PA) Mays needs all his skill.  But for how many of his fielding chances does Mays need all his skill?  In many, if not most games, the fielding skill of Willie Mays is not needed.  Any competent MBL center fielder can make the plays.  Probably any competent MBL corner outfielder can make the plays.  In at least some games, any competent MBL player of any position can make the plays.  And in a few games, any competent amateur can make the plays.

But at the plate, the Giants need Willie Mays in every PA of every game.  Every Willie Mays PA is a potential home run.  But only some of his fielding chances can really save a run and maybe none in a given game.

That's why I think fielding is overvalued.

Fielding Independent Pitching (FIP) is silly.

Late to the party?  Maybe.  I was minding my own business, reading The New York Times on my tablet, when I spotted this:

How Baseball Statistics Can Help Explain the Economy
NOV. 25, 2014 by Neil Irwin  The New York Times

Peripheral statistics are the more obscure indicators that correct for these kinds of quirks and aim to give you a richer and more truthful evaluation of a pitcher, taking into account factors like ballpark dimensions. (Baseball Prospectus combines some of those measures into an index it calls Peripheral Earned Run Average, or PERA.)

I foolishly clicked the link for Baseball Prospectus and yet another link to Glossary: Baseball Prospectus Exclusive:


Fielding Independent Pitching converts a pitcher's three true outcomes into an earned run average-like number. The formula is (13*HR+3*BB-2*K)/IP, plus a constant (usually around 3.2) to put it on the same scale as earned run average.
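
As a sketch, the quoted formula translates directly into code.  The function and parameter names are my own, the default constant of 3.2 is just the "usually around" value from the glossary text, and the season line is made up for illustration.  Innings are assumed to be true decimals (200 and one third innings is 200.333, not the box score's 200.1).

```python
def fip(hr: int, bb: int, k: int, ip: float, constant: float = 3.2) -> float:
    """FIP per the quoted glossary formula: (13*HR + 3*BB - 2*K)/IP + constant."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * bb - 2 * k) / ip + constant

# Hypothetical season line, made up for illustration.
print(round(fip(hr=20, bb=50, k=200, ip=200.0), 2))  # 3.25
```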

First the small stuff:

- Replace Innings Pitched (IP) with batters retired by the pitcher, i.e., at bats (AB) minus hits.  I once again call for the same stats to be used for both batters and pitchers.  For instance, batting average, not hits per nine innings.  Earned Run Average (ERA) is obviously a stat that is specific to pitchers but much and maybe most of the other stuff correlates directly to batter stats.  Use them for pitchers, too.

- Some home runs are inside the park and could be played by fielders.  Since the number is small in recent years it probably doesn't make much difference, like including hit by pitch (HBP) along with BB as some versions of FIP do.  In 1909 Ty Cobb had the triple crown, leading in home runs with 9, all inside the park.

Measuring fielding, especially on the team level, is a good thing.  Applying it to individual pitchers in this way seems silly.  More home runs allowed seems to improve a pitcher's FIP measurement.  Good, you say.  That's the point: home runs are out of the pitcher's control.  We're measuring the pitcher's fielding support.  Say what?  How about don't feed the batter a fat one down the middle?  But that also applies to rocket shots that do not go out of the park and have about as much chance of being caught as most home runs.  Example: 200 foot line drive landing on the foul line.  It's possible that a fielder could be placed there and catch the ball.  It's also possible that a fielder placed at the outfield wall would catch a potential home run.

More to the point are fundamental attributes of home runs:

From the original document: Radical Baseball June 9, 2006 (posted February 20, 2008)

2. The Real scandal of the last 16 years: propagation of non-uniform playing areas.

[Image: Fenway Park satellite view, March 9, 2007, by Betp (public domain), from Wikimedia Commons]

It’s not steroids. It’s the fences. Baseball is the only American team sport in which the playing area is not uniform. Imagine a National Basketball Association (NBA) game played at Madison Square Garden. The three-point line is drawn irregularly. A player can get three points by sinking a basket from behind that line but in some places the line is 25 feet from the basket and in some places it is 15 feet away. How about a National Football League (NFL) game played on a field where the sideline is wider in some places than in others? Or the end zone is shaped oddly? Silly, right? So how come baseball gets away with it? Baseball does not merely get away with it. It’s considered cute, charming, traditional, blah, blah, blah. Here’s the real travesty: the non-uniform playing area perverts baseball’s most cherished event: the home run. It undermines the very integrity of the game that is supposedly threatened by steroid use.

Some thinking fans categorize baseball events into random and non-random. To them a home run (one hit over a barrier on a fly, not an inside the park home run) is clearly a non-random event because a fielder has no chance to catch it. I say a home run is a random event. Here is why. Is a 180 foot fly ball a random event? Clearly, it is random: it may be caught or it may not. But how about a fly ball hit 380 feet? The non-random advocates would be forced to ask in what direction and in what park the fly ball was hit. In other words they can only certify its randomness by waiting until it lands. The same could be done for the 180 foot fly. Like the three-point shot in basketball (OK, the line is closer at the sideline to fit in bounds but that’s basketball’s problem) the only thing that should matter is how far the fly ball went. With uniform playing areas that alone would tell us if the fly ball is a home run or not.

However, in some cases a fly ball can travel 50% farther than a home run and be an out. The distances to the barriers are not just different from park to park but they are different in some parts of the outfield in the same park. A home run should reward the batter for hitting a fly ball over a barrier and for that to be fair and meaningful the barrier should be the same distance and the same height in every direction in every park. That’s pretty basic stuff. How about 375 feet to a ten-foot-high barrier? If you were starting baseball today and making the rules, that’s clearly how you would do it. But baseball evolved and that’s how it has always been. So? About 13 new parks have been built in the last 16 years (with two more coming in New York) and baseball had a rare opportunity to correct this historic inequity. Instead it allowed and even encouraged teams to replace parks that were in many cases at least symmetrical with parks that were irregular in the shape of the playing areas. Irregularities were often unavoidable in old parks because of streets and other things that required some imagination in building a park. In recent years there were no such impositions, just a warped intent to make new parks that looked old fashioned. See the Rangers park in Texas, built in an open space.

Yes, this should also apply to foul territory. Here’s something no one has considered: Fenway Park helps strikeouts. Because the area in foul territory is so small it is very difficult to foul out. The close outfield fences help, too. Let’s say Roger Clemens is going for the single game strikeout record and he’s pitching in Fenway Park, a hitter’s park. Every out that is not a strikeout hurts this effort, which means every batted ball that results in an out hurts. A foul pop up that drifts into the stands helps. A ball hit off the wall in left also helps.

The single season and lifetime home run records are the most important sports records in America. Yet, they are subject to the greatest randomness of any records in team sports. Forget the steroids. Fix the fences.

Thursday, November 27, 2014

Hanley Ramirez v. Pablo Sandoval.

Hanley Ramirez and Pablo Sandoval just signed to play for the Boston Red Sox.

Hanley Ramirez
Positions: Shortstop and Third Baseman
Bats: Right, Throws: Right
Height: 6' 2", Weight: 225 lb.
Born: December 23, 1983 in Samana, Samana, Dominican Republic
games: SS 1,077, 3B 98, DH 14

Pablo Sandoval
Positions: Third Baseman and First Baseman
Bats: Both, Throws: Right
Height: 5' 11", Weight: 245 lb.
Born: August 11, 1986 in Puerto Cabello, Carabobo, Venezuela
games: 3B 771, 1B 63, C 14, DH 9

Sandoval is three inches shorter but twenty pounds heavier.  Ramirez is almost three years older.

Ramirez is a shortstop who has played third base and will play left field for Boston.  Sandoval is a third baseman, with some games at first and even catching, who will play third base for Boston.  Sandoval seems ill suited to play any other fielding position in the future.

Ramirez bats right handed.  Sandoval is a switch hitter who performs better batting lefty.  Boston needs lefty batters or at least batters who hit righty pitchers.

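Hanley Ramirez
Split      G    PA    AB    R     H   2B  3B   HR  RBI   SB  CS   BB   SO    BA   OBP   SLG   OPS    TB  GDP  HBP  SH  SF  IBB  ROE  BAbip  tOPS+
vs RHP  1160  3949  3519  626  1048  210  18  146  494  211  60  360  662  .298  .368  .492  .860  1732   64   40   9  21   30   37   .330     97
vs LHP   532  1323  1158  196   355   93  12   45  160   50  23  146  213  .307  .388  .524  .913   607   19   12   2   5   20   13   .343    109

Pablo Sandoval
Split      G    PA    AB    R     H   2B  3B   HR  RBI   SB  CS   BB   SO    BA   OBP   SLG   OPS    TB  GDP  HBP  SH  SF  IBB  ROE  BAbip  tOPS+
vs RHP   805  2570  2329  303   707  147  15   88  364   11  10  197  328  .304  .357  .493  .850  1148   64   13   0  31   42   26   .318    109
vs LHP   430   963   886   95   239   45   4   18   98    0   2   62  136  .270  .317  .391  .708   346   38    4   1  10    8   14   .298     76
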
Boston needs batters who can hit right handed pitchers but Ramirez hits righties better than Sandoval: OPS .860 to .850.  And Ramirez is much better against lefty pitchers: .913 to .708.  Ramirez has much more power.
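
The OPS numbers being compared follow from the counting stats by the standard definitions of BA, OBP and SLG.  As a sketch, the snippet below recomputes them from the counts as I read them in the split lines above.  The function and variable names are my own.

```python
def rate_stats(ab, h, bb, hbp, sf, tb):
    """Return (BA, OBP, SLG, OPS) from counting stats, using the standard definitions."""
    ba = h / ab
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = tb / ab
    return ba, obp, slg, obp + slg

# Counting stats as read from the split lines above.
splits = {
    "Ramirez vs RHP": dict(ab=3519, h=1048, bb=360, hbp=40, sf=21, tb=1732),
    "Ramirez vs LHP": dict(ab=1158, h=355, bb=146, hbp=12, sf=5, tb=607),
    "Sandoval vs RHP": dict(ab=2329, h=707, bb=197, hbp=13, sf=31, tb=1148),
    "Sandoval vs LHP": dict(ab=886, h=239, bb=62, hbp=4, sf=10, tb=346),
}

for label, counts in splits.items():
    ba, obp, slg, ops = rate_stats(**counts)
    print(f"{label:16s} BA {ba:.3f}  OBP {obp:.3f}  SLG {slg:.3f}  OPS {ops:.3f}")
```

The recomputed OPS values round to .860 and .913 for Ramirez and .850 and .708 for Sandoval, matching the figures quoted.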

Their new contracts with Boston:

Hanley Ramirez
Year  Age  Team             Salary       Service time  Notes
2015  31   Boston Red Sox   $19,750,000  9.014
2016  32   Boston Red Sox   $22,750,000
2017  33   Boston Red Sox   $22,750,000
2018  34   Boston Red Sox   $22,750,000
2019  35   Boston Red Sox*  $22,000,000                $22M Vesting Option

Pablo Sandoval
Year  Age  Team             Salary       Service time  Notes
2015  28   Boston Red Sox   $17,600,000  6.047
2016  29   Boston Red Sox   $17,600,000
2017  30   Boston Red Sox   $17,600,000
2018  31   Boston Red Sox   $18,600,000
2019  32   Boston Red Sox   $18,600,000
2020  33   Boston Red Sox*  $17,000,000                $17M Team Option, $5M Buyout

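
Adding up the yearly figures above gives each contract's total.  A minimal sketch, leaving out the option years and treating Sandoval's $5M buyout as guaranteed money.  That treatment is my assumption, as are the names used here.

```python
def guaranteed_total(salaries, buyout=0):
    """Sum the listed yearly salaries plus any option buyout."""
    return sum(salaries) + buyout

# Yearly salaries from the contract lines above; option years are left out.
ramirez = [19_750_000, 22_750_000, 22_750_000, 22_750_000]               # 2015-2018
sandoval = [17_600_000, 17_600_000, 17_600_000, 18_600_000, 18_600_000]  # 2015-2019
sandoval_buyout = 5_000_000  # assumption: the 2020 buyout counts as guaranteed

print(f"Ramirez:  ${guaranteed_total(ramirez):,}")                    # $88,000,000
print(f"Sandoval: ${guaranteed_total(sandoval, sandoval_buyout):,}")  # $95,000,000
```
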
Sandoval is a fatso who may not age well.  Ramirez is more versatile and a much better hitter.  So why is everybody OK with Sandoval but skeptical about Ramirez?  As a Yankee fan, I would have liked the Yankees to have signed Ramirez but I had no interest in Sandoval.  Yeah, I know, Sandoval is a good fielder and Ramirez is not but Ramirez is a good base runner and can you imagine Sandoval playing the outfield?