Tuesday, December 25, 2012

Park Factor: how reliable?

Sunday, May 29, 2011
Mays v. Aaron OPS on road against the other teams 1954-1968 v. OPS+

The 15 seasons 1954-1968 cover both players' peaks but end before 1969, when divisions were created and unbalanced schedules were introduced ...

OPS+ home and road against all teams 1954-1968: Mays 3.77% higher

OPS on road against the other teams: Mays 0.88% higher

If my method is accurate, Mays and Aaron may have been closer than OPS+ would suggest.
_____________________________

I was reconsidering my method for comparing two batters in the same league.  I compare their stats on the road, leaving out each other's home parks.  That way both sets of stats come from the same parks and against the same pitching staffs.  Essentially, this eliminates the need for a park factor, which is otherwise needed because, unlike football and basketball, baseball embraces non-uniform playing areas to the cheers of all but me.
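A minimal Python sketch of that filtering, for anyone who wants to try it. The record layout, the team labels, and the function names are my own assumptions for illustration; they are not taken from any site's export format, and the OBP and SLG formulas are the standard ones.

```python
from dataclasses import dataclass

@dataclass
class SplitLine:
    """One batter's totals in one opponent's park (hypothetical layout)."""
    park_team: str   # team whose home park the games were played in
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb: int
    hbp: int
    sf: int

def ops(lines):
    """Pool the lines, then compute OBP + SLG from the pooled totals."""
    ab = sum(l.ab for l in lines)
    h = sum(l.h for l in lines)
    bb = sum(l.bb for l in lines)
    hbp = sum(l.hbp for l in lines)
    sf = sum(l.sf for l in lines)
    # Total bases: every hit counts one base, plus the extra bases.
    tb = h + sum(l.doubles + 2 * l.triples + 3 * l.hr for l in lines)
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = tb / ab
    return obp + slg

def common_road_ops(lines, own_team, rival_team):
    """OPS in the parks both batters visited: not his own, not the rival's."""
    kept = [l for l in lines if l.park_team not in (own_team, rival_team)]
    return ops(kept)
```

Calling common_road_ops once per batter, with own_team and rival_team swapped, gives the two numbers being compared. A franchise that moved during the period would need all of its home parks treated as that team's park.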

Hank Aaron and Willie Mays are remarkably close for the years analysed.  My method makes a lot of sense to me.  OPS+ (on-base percentage plus slugging average, adjusted for league and park differences) makes sense to me generally, and I use it a lot even though I really don't understand how the park factor is applied.

So why are the percentage differences for Mays and Aaron so different: 3.77% v. 0.88%?
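To make the arithmetic behind the two percentages concrete, here is a small Python sketch. It uses the commonly published form of OPS+, 100 x (OBP/lgOBP + SLG/lgSLG - 1), and assumes the league figures passed in are already park-adjusted; how that adjustment is made is not shown. Every number below is invented for illustration and none of it is Mays's or Aaron's.

```python
def ops_plus(obp, slg, lg_obp, lg_slg):
    """OPS+ in its commonly published form.

    lg_obp and lg_slg are assumed to be park-adjusted league averages.
    """
    return 100 * (obp / lg_obp + slg / lg_slg - 1)

def pct_higher(a, b):
    """How much higher a is than b, in percent."""
    return (a / b - 1) * 100

# Hypothetical league baseline and two hypothetical batters (OBP, SLG).
LG_OBP, LG_SLG = 0.320, 0.400
batter_a = (0.400, 0.600)
batter_b = (0.390, 0.590)

# Gap measured on raw OPS.
raw_gap = pct_higher(sum(batter_a), sum(batter_b))

# Gap measured on OPS+, using the same baseline for both batters.
plus_gap = pct_higher(
    ops_plus(*batter_a, LG_OBP, LG_SLG),
    ops_plus(*batter_b, LG_OBP, LG_SLG),
)

print(f"raw OPS gap: {raw_gap:.2f}%")
print(f"OPS+ gap:    {plus_gap:.2f}%")
```

With these invented inputs the OPS+ gap comes out larger than the raw OPS gap even though both batters share one league baseline, so some of the distance between 3.77% and 0.88% may be a matter of scale rather than park factor. That is only a possibility; the sketch does not settle it.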

In this case my method has substantial sample sizes of 4,400 plate appearances for each of the two batters.

baseball-reference.com has such a difficult time explaining park factor that it resorts to quoting the entire section from an old Total Baseball document, prefaced with this:

THIS DOES NOT BELONG TO ME AND MAY BE REMOVED IF I AM ASKED TO DO SO BY A REPRESENTATIVE OF TOTALBASEBALL.COM.

baseball-reference.com then states:

lg_OPS values are for a league average player in that ballpark for single season data, and for a league average player with the same career path as the given player. This means that two players from the same league will have different values here if they played in different parks.

Historically, B-R has been using single-year park factors for recent years and 3-year park factors historically. I have changed that to now use 3-year factors by default for all years. Of course, the current season is only really a 2-year factor. The current year and last year. This can lead to some big changes in the numbers, from what had been on the site.
___________________________________

Then the fun begins with the explanation borrowed from Total Baseball.

It would seem that a park factor would remain constant if all the parks remained the same.  Unfortunately, even a casual glance at the parks suggests that, as a group, they are in flux, especially under the three-year method.
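A rough Python sketch of why a three-year factor moves around. This is not the Total Baseball procedure, which is considerably more involved. It assumes a simple single-year factor (runs per game at home divided by runs per game on the road, times 100) and assumes the three-year window is the previous, current, and following seasons. The yearly figures are made up.

```python
def single_year_factor(home_runs_per_game, road_runs_per_game):
    """Simplified park factor: 100 means the park played neutral that year."""
    return 100 * home_runs_per_game / road_runs_per_game

def three_year_factor(factors_by_year, year):
    """Average the single-year factors for year-1, year, and year+1.

    Years missing from the data are skipped, so the latest season
    ends up averaging only two years.
    """
    window = [
        factors_by_year[y]
        for y in (year - 1, year, year + 1)
        if y in factors_by_year
    ]
    return sum(window) / len(window)

# Hypothetical single-year factors for one unchanged park.
factors = {1960: 100.0, 1961: 106.0, 1962: 97.0}

print(three_year_factor(factors, 1961))  # uses all three seasons
print(three_year_factor(factors, 1962))  # no 1963 data, so two seasons
```

Because the road half of the ratio depends on every other park in the league, the single-year figure can shift even when the home park itself is untouched, and the averaged figure for a season changes again once the following season's data arrives.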

It's disconcerting that OPS+ may be less reliable than some of us had thought.  Argh!

Also of interest:

Tuesday, November 13, 2012
Mays over Aaron.

2 comments:

Cliff Blau said...

This statement (from BB-Ref.com) makes no sense: "lg_OPS values are for a league average player in that ballpark for single season data, and for a league average player with the same career path as the given player."

Career path? Players in different parks have different lg_OPS because of their different park factors, not because of career paths, whatever those are.

Kenneth Matinale said...

Yeah, that puzzled me too. I was waiting to read the Total Baseball stuff before asking Baseball Reference about its statement but I'll send an inquiry now.