Wednesday, November 25, 2009

Run Estimators: Avg, OBP, SLG, OPS vs. OOPS

The fits below were obtained by fitting the total runs scored by each of the 14 AL teams in the 2009 season.
2009 league averages (per team):
      AB   Runs   BA    OBP   SLG   OPS  OOPS  R1_est  OOPS2  R2_est
AL   5569   781  .266  .335  .428  .763  1.098   776   1.206    779
NL   5493   718  .259  .330  .409  .739  1.069   734   1.143    738

OOPS == 2*OBP + SLG;   OOPS2 == OOPS^2
R1_est = -910 + 1350 * (2.45 * OBP + SLG)
R2_est = 646 * OOPS^2
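OOPS2 has the advantage of zero offset (Runs = 0 at OOPS2 = 0) with only one fitted parameter. Because R2_est is directly proportional to OOPS2, a given fractional change in OOPS2 equals the fractional change in predicted runs. That is not true for the common linear run estimators, whose negative intercepts predict negative runs at very low values of the independent variables.
To put OOPS2 on a more familiar scale for the AL, one could define OOPS2_BA = 0.266 * OOPS^2/1.206 (so the league average equals the league batting average) or OOPS2+ = 100 * OOPS^2/1.206 (so the league average equals 100).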
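As a quick check of the definitions above, here is a minimal Python sketch that recomputes the league estimates and the rescaled stats from OBP and SLG. The function names are my own for illustration; the constants (646, 1.206, 0.266 and the R1_est coefficients) are the rounded 2009 AL values quoted above.

```python
# Sketch: recompute OOPS, OOPS2 and the two run estimates from OBP and SLG.
# Function names are illustrative; constants are the rounded AL values in the text.

AL_OOPS2 = 1.206   # league-average OOPS^2 for the 2009 AL
AL_BA = 0.266      # league batting average for the 2009 AL


def oops(obp, slg):
    return 2.0 * obp + slg


def oops2(obp, slg):
    return oops(obp, slg) ** 2


def r1_est(obp, slg):
    """Linear fit in OBP and SLG (has a negative intercept)."""
    return -910.0 + 1350.0 * (2.45 * obp + slg)


def r2_est(obp, slg):
    """One-parameter fit through the origin."""
    return 646.0 * oops2(obp, slg)


def oops2_plus(obp, slg):
    """OOPS2 rescaled so the AL average is 100."""
    return 100.0 * oops2(obp, slg) / AL_OOPS2


def oops2_ba(obp, slg):
    """OOPS2 rescaled so the AL average equals the AL batting average."""
    return AL_BA * oops2(obp, slg) / AL_OOPS2


if __name__ == "__main__":
    for league, obp, slg in [("AL", 0.335, 0.428), ("NL", 0.330, 0.409)]:
        print(league, round(oops(obp, slg), 3),
              round(r1_est(obp, slg)), round(r2_est(obp, slg)))
    # A player line can be rescaled the same way, e.g. OBP .406, SLG .465
    print(round(oops2_plus(0.406, 0.465)), round(oops2_ba(0.406, 0.465), 3))
```

Doubling OOPS2 doubles r2_est exactly, while r1_est goes negative once 2.45*OBP + SLG drops below about 0.67.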
2009 stats:
              AVG    OBP    SLG    OPS    OOPS   OOPS2    OOPS2+     OOPS2_BA
Guerrero     .295   .334   .460   .794   1.128   1.272    106        .281
Dunn         .267   .398   .529   .928   1.325   1.756    146(AL)    .387
Pujols       .327   .443   .658  1.101   1.544   2.384    198        .526
Mauer        .365   .444   .587  1.031   1.475   2.176    180        .480
Youkilis     .306   .413   .548   .961   1.374   1.888    157        .416
Teixeira     .292   .383   .565   .948   1.331   1.772    147        .391
A-Rod        .286   .402   .532   .933   1.336   1.785    148        .394
Swisher      .249   .371   .498   .869   1.240   1.538    127        .339
Abreu        .293   .390   .435   .825   1.215   1.476    122        .326
Damon        .282   .365   .489   .854   1.219   1.486    123        .328
Matsui       .274   .367   .509   .876   1.243   1.545    128        .341
Jeter        .334   .406   .465   .871   1.277   1.631    135        .360
V_Mart       .303   .381   .480   .861   1.242   1.543    128        .340
Posada       .285   .363   .522   .885   1.248   1.558    129        .344
J Molina     .217   .292   .268   .560   0.852   0.728     60        .160
Cervelli     .298   .309   .372   .682   0.990   0.980     81        .216
Varitek      .209   .313   .390   .703   1.016   1.032     86        .228
Melky        .274   .336   .416   .752   1.088   1.184     98        .261
Gardner      .270   .345   .379   .724   1.069   1.143     95        .252
Granderson   .249   .327   .453   .780   1.107   1.225    102        .270
Cameron(NL)  .250   .342   .452   .795   1.136   1.290    107(AL)    .285


==================================================
KaleidaGraph Results:

KG:  RUNS_OPS.qpc:

y = a + b * x  
       Value   Error
a      -811.8  160.7
b      2087.2  210.33
Chisq  8089.1  NA
R      0.9441  NA

RUNS = -810 + 2090 * OPS
CHI2 = 8090  ==>  sig = 25 runs;  R = 0.944

Note that the CHI2 reported by both KG and ProFit effectively assumes sig = 1 for every point, so it is just the sum of squared residuals:
"Chi2" = SUM [ (y-y_fit)^2 ]
VAR = "Chi2"  /  (N-1)
sig = sqrt(VAR) = sqrt(  "Chi2" / (N-1) ) = sqrt( "Chi2"/13 )   for N = 14 teams

SS_err = "Chi2"
SS_tot = SUM [ (y-y_avg)^2 ]
SS_reg = SUM [ (y_fit-y_avg)^2 ]
R^2 = 1 - SS_err/SS_tot  = 1 - "Chi2"/SS_tot
The standard error in R is sqrt[ (1-R^2) / (N-2) ]

For RUNS_OPS.qpc:    R^2 = 0.8914 = 1 - 8089/SS_tot, so SS_tot = 74,460
(check:  KG gives Variance=5728;  5728 x 13 = 74,464)
Since every fit uses the same 14 team run totals, for this data set R = SQRT( 1 - "Chi2"/74460 )
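The bookkeeping above is easy to script. This is a small Python sketch of the same formulas, using the N-1 convention from these notes; the function names are mine, and SS_TOT is the rounded value derived above.

```python
import math

N_TEAMS = 14        # AL teams in 2009
SS_TOT = 74460.0    # SUM[(y - y_avg)^2] for AL team runs, as derived above


def sigma_from_chi2(chi2, n=N_TEAMS):
    """RMS scatter of the fit, using the N-1 convention from the text."""
    return math.sqrt(chi2 / (n - 1))


def r_from_chi2(chi2, ss_tot=SS_TOT):
    """Correlation coefficient from R^2 = 1 - SS_err/SS_tot."""
    return math.sqrt(1.0 - chi2 / ss_tot)


def std_error_of_r(r, n=N_TEAMS):
    """Standard error in R: sqrt[(1 - R^2)/(N - 2)]."""
    return math.sqrt((1.0 - r * r) / (n - 2))


if __name__ == "__main__":
    chi2 = 8089.1   # KaleidaGraph Chisq for the RUNS vs OPS fit
    r = r_from_chi2(chi2)
    print(round(sigma_from_chi2(chi2), 1), round(r, 3), round(std_error_of_r(r), 3))
```

With only 14 teams the standard error in R comes out near 0.1, so small differences in R between estimators should be read with caution.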

From KaleidaGraph fits for AL 2009:
Stat       R     Chisq   sig
AVG      0.840   21914   41.1
SLG      0.867   18497   37.7
OBP      0.915   12078   30.5
OPS      0.944    8089   24.9
OOPS     0.953    6836   22.9
OOPS2    0.948    7486   24.0
wOBA     0.953    6834   22.9
ProFit:
eq(1)    0.954    6740   22.8
eq(2)    0.959    5930   21.4

===================================================
ProFit Results:

function Fred(OBP, SLG,  b, c,d : real);
begin;
y := b*(c*OBP + SLG) + d;
end;

iterations: 19
------------------------
Chi squared        = 6743.2299

Parameters:        Standard deviations:
b   = 1353.3799    ∆b = 458.7910
c   =    2.4512    ∆c =   1.3064
d   = -910.4042    ∆d = 166.6370

RUNS = -910 + 1350 * (2.45 * OBP + SLG)        eq(1)
CHI2 = 6740  ==>  sig = 23 runs;  R = 0.954
===================================================
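The eq(1) model is linear in disguise: b*(c*OBP + SLG) + d equals (b*c)*OBP + b*SLG + d, so ordinary linear least squares reaches the same minimum as an iterative fit whenever b is nonzero. Below is a pure-Python sketch of that route. It is not how ProFit does it, and the five (OBP, SLG) points are made-up, noise-free values generated from the rounded eq(1) parameters, not the 2009 team data.

```python
def solve_linear(a, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_eq1(obp, slg, runs):
    """Least-squares fit of runs = b*(c*OBP + SLG) + d; returns (b, c, d)."""
    rows = [[o, s, 1.0] for o, s in zip(obp, slg)]
    # Normal equations (X^T X) beta = X^T y, with beta = [b*c, b, d]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, runs)) for i in range(3)]
    bc, b, d = solve_linear(xtx, xty)
    return b, bc / b, d


# Hypothetical, noise-free test points (NOT real team data)
obp_demo = [0.320, 0.335, 0.350, 0.328, 0.341]
slg_demo = [0.400, 0.445, 0.420, 0.460, 0.410]
runs_demo = [1350.0 * (2.45 * o + s) - 910.0 for o, s in zip(obp_demo, slg_demo)]

b_fit, c_fit, d_fit = fit_eq1(obp_demo, slg_demo, runs_demo)
print(round(b_fit, 3), round(c_fit, 3), round(d_fit, 3))
```

Because the demo points have no noise, the fit simply recovers the parameters used to generate them. On real team totals the recovered c is the ratio of two fitted coefficients, which helps explain why its uncertainty (2.45 +/- 1.31 above) is so large.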
=================================================
function Fred(BB_PA, S_PA, D_PA, T_PA, HR_PA,  b,c,d,e,f : real);
begin;
y := b + c * (d*BB_PA + S_PA + e*D_PA + 1.6*T_PA + f*HR_PA)
end;

Iterations: 14
-------------------------------------------
Chi squared        = 5932.8988

Parameters:           Standard deviations:
b     = -1024.8696    ∆b =  241.3729
c     =  5185.8310    ∆c = 1007.9237
d     =     0.7774    ∆d =    0.2233
e     =     1.4131    ∆e =    0.3643
f     =     1.7033    ∆f =    0.3541

RUNS = -1020 + 5200 * (0.78*BB_PA + 1B_PA + 1.4*2B_PA + 1.6*3B_PA + 1.7*HR_PA) eq(2)
CHI2 = 5930  ==>  sig = 21.4 runs;  R = 0.959
=================================================
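For anyone who wants to try eq(2) on their own numbers, here is a minimal Python sketch of the fitted function with the full-precision parameters from the ProFit output. The function name and the per-PA rates in the example are hypothetical round numbers chosen only to show the call; they are not 2009 data.

```python
def runs_eq2(bb_pa, s_pa, d_pa, t_pa, hr_pa,
             b=-1024.8696, c=5185.8310, d=0.7774, e=1.4131, f=1.7033):
    """Team runs from per-PA event rates, using the eq(2) fit parameters.

    The triple weight (1.6) was held fixed in the fit, so it is hard-coded here.
    """
    weighted = d * bb_pa + s_pa + e * d_pa + 1.6 * t_pa + f * hr_pa
    return b + c * weighted


if __name__ == "__main__":
    # Hypothetical rates per plate appearance (walks, singles, doubles, triples, HR)
    print(round(runs_eq2(0.09, 0.155, 0.047, 0.005, 0.029)))
```

As with the other linear fits, the negative offset means the estimate only makes sense near the range of real team rates; with all rates at zero it returns b, about -1025 runs.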

For comparison, http://www.insidethebook.com/woba.shtml lists the wOBA weights
HR 1.70, 3B 1.37, 2B 1.08, 1B 0.77, NIBB 0.62, which, normalized to 1B = 1.00, are equivalent to:
BB 0.81,  1B 1.00,  2B 1.40,  3B 1.78, HR 2.21
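The conversion to "single = 1.00" weights is just a division by the single's weight, which makes different weight sets directly comparable with the eq(2) coefficients. A short Python sketch (the function name is illustrative):

```python
def normalize_to_single(weights, single_key="1B"):
    """Rescale event weights so that a single is worth exactly 1.00."""
    base = weights[single_key]
    return {event: round(w / base, 2) for event, w in weights.items()}


# wOBA weights as quoted above (NIBB = non-intentional walk)
woba = {"NIBB": 0.62, "1B": 0.77, "2B": 1.08, "3B": 1.37, "HR": 1.70}

if __name__ == "__main__":
    print(normalize_to_single(woba))
```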
---------------------------------------------------
from http://www.hardballtimes.com/main/statpages/glossary/
GPA= Gross Production Average, a variation of OPS, but more accurate and easier to interpret. The exact formula is (1.8*OBP + SLG)/4, adjusted for ballpark factor. The scale of GPA is similar to BA: .200 is lousy, .265 is around average and .300 is a star. A simple formula for converting GPA to runs is PA*1.356*(GPA^1.77).
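The GPA definition and its runs conversion quoted above can be written out as a small Python sketch. It ignores the ballpark adjustment mentioned in the glossary, and the plate-appearance total in the example is a hypothetical round number, not a 2009 figure.

```python
def gpa(obp, slg):
    """Gross Production Average, without any ballpark adjustment."""
    return (1.8 * obp + slg) / 4.0


def runs_from_gpa(pa, gpa_value):
    """Glossary conversion from GPA to runs: PA * 1.356 * GPA^1.77."""
    return pa * 1.356 * gpa_value ** 1.77


if __name__ == "__main__":
    al_gpa = gpa(0.335, 0.428)          # 2009 AL average OBP and SLG
    print(round(al_gpa, 3))
    print(round(runs_from_gpa(6200, al_gpa)))   # 6200 PA is an assumed team total
```

Note that this conversion, like OOPS2, is a power law with no offset, so it gives zero runs at GPA = 0.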
---------------------------------------------------
from  http://www.baseball-fever.com/showthread.php?t=66363
"the best correlation with runs comes from (1.8*OBA + SLG), or something in that range"
---------------------------------------------------
from  http://www.tangotiger.net/wiki/index.php?title=Linear_Weights
R = .49S + .61D + 1.14T + 1.50HR + .33W + .14SB + .73SF, roughly:
BB 0.67,  1B 1.00,  2B 1.24,  3B 2.33,  HR 3.06
---------------------------------------------------
from  http://www.tangotiger.net/wiki/index.php?title=Batting_Runs
BR = .47S + .85D + 1.02T + 1.40HR + .33(W + HB),  roughly:
BB 0.70,  1B 1.00,  2B 1.81,  3B 2.17,  HR 2.98
---------------------------------------------------
from  http://www.baseballmusings.com/archives/005962.php
RG = -5.84 + 22.92 (OBP) + 7.21 (SLG) + e  (runs per game); scaled to a 162-game season, roughly:
RUNS = -950 + 1170 * (3.18 * OBP + SLG)    R^2 = 0.92
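The step from the per-game regression to the season form is a multiplication by the number of games plus factoring out the SLG coefficient. A minimal Python sketch, assuming a 162-game season (the function name is mine):

```python
GAMES = 162  # assumed regular-season length


def season_form(intercept, obp_coef, slg_coef, games=GAMES):
    """Rewrite a + p*OBP + q*SLG (per game) as A + B*(k*OBP + SLG) per season.

    Returns (A, B, k) with A = a*games, B = q*games, k = p/q.
    """
    return intercept * games, slg_coef * games, obp_coef / slg_coef


if __name__ == "__main__":
    a_season, b_season, k = season_form(-5.84, 22.92, 7.21)
    print(round(a_season), round(b_season), round(k, 2))
```

The relative OBP weight k = 3.18 is well above the 2.45 of eq(1), though it is within the large uncertainty on c quoted there.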
---------------------------------------------------
from  http://cyrilmorong.com/Havoc.htm
The table below summarizes the correlation and r-squared that various stats had
with team runs from 2001-03:
  Stat  Correlation  R-squared
  AVG      0.858       0.736
  SLG      0.917       0.842
  OBP      0.891       0.794
  OPS      0.950       0.903
  SB/G    -0.032       0.001
  Net SB/G 0.136       0.018
  SB%      0.303       0.092
