Statistical Question


Hi All,

Does anyone know of a statistic that can assess the goodness of out-of-sample predictions? For example, I have N data points which I use to fit a model with p parameters. I then use that model to predict the values of M other data points and compare these predictions with the actual values for those M points. Aside from chi-square, do you know of any other statistic that can be used to measure the goodness of out-of-sample predictions?




IIRC, the model usually predicts certain values and the data points scatter around these predicted ('expected') values. If the scatter follows a normal distribution (or something that can be transformed into one), then confidence intervals can be calculated around the model curve. Typically these intervals do not form a rectangular band, even in a coordinate system where the model corresponds to a straight line. The data points should fall within these intervals with the stated confidence. If that's what you're looking for, I'll have to dig through my old files to find exactly how it's done; it was definitely not chi-square. I don't remember anything else like it.
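
A minimal sketch of the interval idea for the simplest case, assuming a straight-line least-squares fit with normally distributed scatter. The function names and the `t_crit` argument (the Student t critical value for n - 2 degrees of freedom, which you look up yourself) are my own choices for illustration, not the method from the old files mentioned above. It gives the interval for a single new observation at x0, and it shows why the band is not rectangular: the half-width grows as x0 moves away from the mean of the x values.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def prediction_interval(xs, ys, x0, t_crit):
    """Interval for one new observation at x0.

    t_crit is an assumption of this sketch: the caller supplies the
    Student t critical value for n - 2 degrees of freedom.
    """
    n = len(xs)
    intercept, slope = fit_line(xs, ys)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    # Residual sum of squares and residual standard deviation.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    # The (x0 - x_bar)^2 term is what widens the band away from the mean.
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    y0 = intercept + slope * x0
    return y0 - half_width, y0 + half_width


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
    # 2.776 is the two-sided 95% t value for 4 degrees of freedom.
    print(prediction_interval(xs, ys, 3.5, 2.776))
    print(prediction_interval(xs, ys, 10.0, 2.776))
```

An out-of-sample point that lands outside its interval is then a point the model predicted poorly at the chosen confidence level.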


Edited: 18 Dec 2012, 3:35 p.m.


If you are looking for an overall summary of how well your model fits the observed data, then an alternative is to work directly with the likelihood ratio statistic, sometimes expressed on an additive scale as -2 ln likelihood. For categorical data this can be calculated simply as the G^2 statistic.
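
For what it's worth, here is a small sketch of the G^2 calculation for categorical counts, using the usual form G^2 = 2 * sum(O * ln(O / E)), where O are the observed counts and E the counts expected under the model. The function name `g_squared` is just a label I picked, and the sketch assumes every expected count is positive.

```python
import math


def g_squared(observed, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E)).

    Cells with an observed count of zero contribute nothing,
    since O * ln(O / E) tends to 0 as O tends to 0.
    Assumes every expected count is positive.
    """
    total = 0.0
    for o, e in zip(observed, expected):
        if o > 0:
            total += o * math.log(o / e)
    return 2.0 * total


if __name__ == "__main__":
    observed = [30, 70]   # made-up counts, for illustration only
    expected = [50, 50]   # counts predicted by the model
    print(g_squared(observed, expected))
```

Like the Pearson X^2, a value of zero means the observed and expected counts agree exactly, and larger values mean a worse fit.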

However, if your interest is more in identifying outlying individual observations, then calculating a residual for each observation can be useful (e.g., Pearson residuals or deviance residuals), particularly if they are studentized. The squared Pearson residuals sum to the Pearson X^2 statistic, while the squared deviance residuals sum to -2 ln likelihood, known as the deviance.
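
A rough sketch of both kinds of residual for count data. The deviance residual below assumes a Poisson model for the counts (that choice is mine, the formula differs for other models), the function names are made up for the example, and no studentizing is done.

```python
import math


def pearson_residuals(observed, expected):
    """Pearson residual per cell: (O - E) / sqrt(E)."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]


def deviance_residuals(observed, expected):
    """Poisson deviance residual per cell (assumes a Poisson model).

    d = sign(O - E) * sqrt(2 * (O * ln(O / E) - (O - E)))
    """
    residuals = []
    for o, e in zip(observed, expected):
        log_term = o * math.log(o / e) if o > 0 else 0.0
        # Guard against tiny negative values caused by rounding.
        d_squared = max(0.0, 2.0 * (log_term - (o - e)))
        sign = 1.0 if o >= e else -1.0
        residuals.append(sign * math.sqrt(d_squared))
    return residuals


if __name__ == "__main__":
    observed = [30, 70]   # made-up counts, for illustration only
    expected = [50, 50]
    pr = pearson_residuals(observed, expected)
    dr = deviance_residuals(observed, expected)
    print(pr, sum(r * r for r in pr))   # squares add up to Pearson X^2
    print(dr, sum(r * r for r in dr))   # squares add up to the deviance
```

Cells whose residual is large in absolute value are the ones contributing most to the overall lack of fit.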

Finally, one can reduce over-fitting of a model by using a training sample of observations to estimate the model and then a separate testing set to evaluate the fit of the model (which seems to be along the lines of what you have described). The Prediction Error Sum of Squares is a summary measure of the fit of a regression model to a set of observations that were not themselves used in estimating the model. It is the sum of squares of the prediction residuals for those observations.
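
A minimal sketch of that train/test calculation, assuming a straight-line least-squares model purely for illustration (any fitted model would do). The function names are hypothetical. Note that the statistic usually called PRESS is often computed leave-one-out on a single data set; the version here follows the separate testing set described above.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def prediction_error_ss(train_x, train_y, test_x, test_y):
    """Sum of squared prediction residuals on the held-out points.

    The model is estimated from the N training points only and is
    then used to predict the M test points.
    """
    intercept, slope = fit_line(train_x, train_y)
    return sum((y - (intercept + slope * x)) ** 2
               for x, y in zip(test_x, test_y))


def prediction_rmse(train_x, train_y, test_x, test_y):
    """Root mean squared prediction error, in the units of y."""
    ss = prediction_error_ss(train_x, train_y, test_x, test_y)
    return math.sqrt(ss / len(test_x))


if __name__ == "__main__":
    train_x, train_y = [0, 1, 2, 3], [0, 2, 4, 6]   # N = 4, made-up data
    test_x, test_y = [4, 5], [9, 9]                 # M = 2, made-up data
    print(prediction_error_ss(train_x, train_y, test_x, test_y))
    print(prediction_rmse(train_x, train_y, test_x, test_y))
```

Dividing by M and taking the square root, as in `prediction_rmse`, makes the number comparable across test sets of different sizes.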


Edited: 23 Dec 2012, 6:42 a.m.


I'm curious, Namir: do you want to predict values outside of your sample data set, as opposed to inside it?
