More on approximating the cumulative normal distribution



#2

Greetings all,

In my quest for a slick, compact, and as precise as possible way to estimate the upper-tail cumulative normal distribution (which is simply the complementary error function with linear scalings of the x and y axes), I have become preoccupied with the routine erfcc() in Section 6.2 of Numerical Recipes:

#include <math.h>

float erfcc(float x)
// Returns the complementary error function erfc(x) with fractional
// error everywhere less than 1.2e-7.
{
    float t, z, ans;
    z = fabs(x);
    t = 1.0/(1.0+0.5*z);
    ans = t*exp(-z*z-1.26551223+t*(1.00002368+t*(0.37409196+t*(0.09678418+
          t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+
          t*(-0.82215223+t*0.17087277)))))))));
    return x >= 0.0 ? ans : 2.0-ans;
}

The authors say little about the routine's origins apart from noting that it is "based on Chebyshev fitting to an inspired guess as to the functional form." Beyond that, I can't seem to find it in any of the references cited in the chapter that I have ready access to--it isn't in Hastings, or Abramowitz and Stegun, or Acton, for example--at least not that I can find.

I am intrigued, since the relative error is not bad given that the constants in the polynomial are only single precision. The mapping of the domain onto the bounded interval (0,1] is, I suspect, particularly helpful in attaining the goodness of fit that there is.

I can't help but wonder whether one could do a little better by redoing the approximation with double-precision constants (lopping off a few digits when implementing the result on a 10- or 12-digit calculator) or by extending the polynomial by a term or two.

Does anyone know the origin of this approximation? Is it an NR original whose derivation is not given in the book, or is it from some other source that the NR team hasn't explicitly referenced? I would like to work with it more, and just trying to do a blind fit in curve-fitting software is getting me nowhere fast--I can't even replicate the original fit, never mind extend it.

Grateful for wisdom in my quest for this grail. Is there a numerical analysis Professor Langdon in the group?


Les


#3

Hi, Les:

Les posted:

"I can't help but wonder that one could do a little better by redoing the approximation with double precision constants (lopping off a few digits when implementing
the result in a 10 or 12 digit calc) ..."

    That wouldn't help: the constants are given to 9 places in the formula and the alleged maximum error is 1.2E-7, so double-precision constants would do nothing to significantly improve the error.

"... or extending the polynomial by a term or two."

    That would definitely help, in which case the error might diminish to a point (say, 1.2E-11) where 12-digit constants would indeed be needed to realize it in full.

    If I were you, I'd do the following:

    • Using some adequate mathematical package or built-in function (INTEG if nothing else), I would find, to 12 correct decimal places, the values to fit polynomially, which would be the values of the function proper after 'taking out' the domain mapping and the
      "inspired" functional form.

      That would leave you with a table of values (or an adequate user-defined function, if you can do it all in your particular HP model) which would correspond to the polynomial part in the original formula, i.e., the P(x) = -1.26551223+t*(1.00002368+t*(... part.

    • Then, I would simply do a true minimax polynomial fit (not an approximate one), of whatever desired degree, to 12 decimal places. I would start with a minimax fit of the very same degree (9th) as the present P(x), to see whether the minimax fit does indeed replicate its coefficients, which would be proof enough of being on the right track. If it does *not* replicate the coefficients, I would test the new polynomial to find its maximum error and compare it against the alleged 1.2E-7 error of the original one; it should be equal or less. In any case, I would perform an automatic fit, to see what the maximum degree is that can be profitably used, i.e., so that you're fitting the actual data and not rounding errors.

      This would also provide a table of the minimax error for degrees 9, 10, 11, 12 .... Your present polynomial is of degree 9. I think that degree 10 would perhaps achieve an error like 1E-9 and degree 11 would achieve 1E-11, rounding errors notwithstanding, but that's only an educated guess.

    Try it and tell us how it went.

Best regards from V.

#4

Thanks!

Another forum contributor directed me to an older version of SPSS TableCurve2D, which is impressively diverse and fast given that it is over a decade old. Regrettably, user-defined functions are restricted to a maximum of 10 parameters, and the NR approximation that intrigues me is up to that already.

I should share a little more about what I like about the NR routine and why I want to improve on it if I can. For esoteric reasons, I am preoccupied with accuracy and, hopefully, precision in the upper tail of the distribution, where the function values get quite tiny. The reported 1.2e-7 relative error does indeed seem to hold when I plot the error curve in Maple, and the fact that it is a relative error, not an absolute one, really does assist the accuracy at high values of the argument.

For example, on my HP48G, my implementation of the NR exponentiated polynomial approximation gives erfcc(33.82) = 3.01532436445E-499, whereas 0 1 33.82 SQRT * UTPN 2 * gives 3.01532450586E-499. This actually impresses and intrigues me greatly. The Hastings approximations quoted in Abramowitz and Stegun boast respectable absolute errors, but in the upper tail this is meaningless since tiny minus tiny equals tiny in absolute terms, but it can be huge in proportional terms when you calculate tiny divided by tiny to get something not tiny at all.

For example, 26.2.17 in Abramowitz and Stegun computes the cumulative normal distribution with an absolute max error of 7.5E-8. Using the complement of this to estimate the upper tail probability on my HP48G, I get roughly 2.12E-482 for input of 47, compared with roughly 1.78E-482 for 0 1 47 UTPN. (This latter value matches up with the results in Maple and Mathematica as well as the relevant continued fraction expansion out to a few terms, so I do trust it.) Right order of magnitude, and tiny absolute error, but the relative error is an abysmal 1.9E-1. Abramowitz and Stegun 26.2.17 does even more sadly in that upper tail, from what I understand from Rodger Rosenbaum who has done the number crunching.

Thanks for your feedback. I am learning that I need to learn a bit more about minimax curve fitting and Chebyshev fits using transformed inputs. I am no computer scientist or mathematician, so this can take a while. At least I will have fun in the process.

Cheers,

Les


#5

Quote:
0 1 33.82 SQRT * UTPN 2 *

This should read 0 1 33.82 2 SQRT * UTPN 2 * .

erfc is derived from the standard (i.e., mean=0, variance=1) upper-tail normal probability by scaling the input by a factor of sqrt(2) then doubling the result. Of course, if you are interested in these functions and are following this thread you already know that, so forgive my pedantry :)

Sorry if the typo created any confusion.

Les

#6

Quote:
Abramowitz and Stegun 26.2.17 does even more sadly in that upper tail

Another egregious typo. I meant 26.2.19.

Les

#7

Les,

re: "For esoteric reasons, I am preoccupied with accuracy and, hopefully, precision, in the upper tail of the distribution, where the function values get quite tiny."

Can you tell us the esoteric reasons? If this is for mathematical curiosity, I can understand it.

If it has anything to do with actual probabilities, then I don't think you need to worry about it. Anything with a probability like 1E-482 (which I recall from some previous part of this discussion) has not, and will not, happen during the lifetime of the universe!

Am I missing something here?


#8

You are missing nothing. "Esoteric reasons" = "mathematical curiosity" and nothing more. Completely impractical intellectual dalliance with no envisioned practical implications whatsoever. :)

#9

Yes, but it could happen tomorrow if its probability is not zero.

