[WP34S] Curious Bug in Inverse Normal Function



#44

The XROM-based function is perceptibly slower, which is a fair trade given the huge gains in flash memory made by moving this and other functions out of the native code.

However, in testing numerous arguments, I stumbled across one that creates a hang, for no good reason--namely, exactly p=0.0025, as well as its complement p=0.9975.

Running the function on either of these probabilities makes the calc hang, requiring a reset. However, adding even the smallest perturbation--say 1e-16--allows the function to converge without difficulty.

Weird. Any idea why these particular values prove to be so disagreeable? Also, for what it's worth, I have written my own inverse normal routine that is at least as fast if not faster than the present built-in function, and which handles this quirky probability without problem.

I wonder if the inverse normal code got broken or at least bruised in its move to XROM?

Les


#45

No idea what is wrong here but I can definitely reproduce it. Will look into it later.

I'd be interested in your inverse normal implementation if it can handle the full range of values and gives a sufficient number of digits of accurate result.

e.g. what does it do given an argument of 1e-350 ? .999999999999 ? 0.5 ?


- Pauli


#46

Quote:
No idea what is wrong here but I can definitely reproduce it. Will look into it later.

I'd be interested in your inverse normal implementation if it can handle the full range of values and gives a sufficient number of digits of accurate result.

e.g. what does it do given an argument of 1e-350 ? .999999999999 ? 0.5 ?


No and yes. I haven't included a test that reflects the sign of the result for arguments above 0.5, but that is no problem for my purposes. It handles the first and third examples you give very well. The second I would reflect to 1e-12 and switch the sign.

It is basically a stripped down version of your own routine--I get the initial estimate using the formulae Dieter provided, and simply do three successive Halley refinements against the calc's Phi(x) whether I need them or not. No convergence tests. As you know, two Halley refinements give at least 18 digits (in theory), and three give at least 54 in an environment of adequate intermediate precision. More than enough for the calc's SP and DP. For SP it is overkill, but it is still fast enough for me. I suspect I lose a digit here and there due to rounding, as I can't take advantage of the full 39 internal digits, but it really was a fruitful programming exercise for me.

Here is my listing--forgive me for not commenting on it due to being rushed:

0001 ****LBL'INN'
0002 LocR 005
0003 # [pi]
0004 STO+ X
0005 [sqrt]
0006 STO .00
0007 R[v]
0008 # 022
0009 # 100
0010 /
0011 x[<->] Y
0012 STO .02
0013 x>? Y
0014 GTO 001
0015 LN
0016 STO+ X
0017 +/-
0018 STO .03
0019 DEC X
0020 [sqrt]
0021 RCL[times] .00
0022 RCL[times] .02
0023 LN
0024 STO+ X
0025 +/-
0026 [sqrt]
0027 # 005
0028 RCL[times] .03
0029 1/x
0030 +
0031 GTO 002
0032 **LBL 001
0033 # 002
0034 1/x
0035 RCL- Y
0036 RCL[times] .00
0037 FILL
0038 STO[times] X
0039 # 006
0040 /
0041 INC X
0042 [times]
0043 **LBL 002
0044 +/-
0045 STO .04
0046 # 003
0047 STO .01
0048 R[v]
0049 **LBL 003
0050 [PHI](x)
0051 RCL- .02
0052 RCL[times] .00
0053 RCL .04
0054 RCL[times] .04
0055 # 002
0056 /
0057 e[^x]
0058 [times]
0059 FILL
0060 RCL[times] .04
0061 # 002
0062 /
0063 INC X
0064 /
0065 STO- .04
0066 RCL .04
0067 DSZ .01
0068 GTO 003
0069 RTN
0070 END

My sense is that you would do well in your XROM version just to do three Halley refinements and not bother with tests of convergence. It may seem overkill, but I bet it will speed things up.
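The fixed-count Halley scheme can be sketched in a few lines of Python (math.erfc stands in for the calc's Phi; double precision only, so this shows the scheme rather than 34-digit results):

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT_2PI = math.sqrt(2.0 * math.pi)

def halley_quantile(p, x, steps=3):
    # Fixed number of Halley refinements for the upper-tail quantile:
    # with t = (Q(x) - p) / pdf(x), the Halley step is dx = t / (1 - x*t/2).
    # No convergence test -- just run the loop the requested number of times.
    for _ in range(steps):
        q = 0.5 * math.erfc(x / SQRT2)            # upper-tail CDF Q(x)
        pdf = math.exp(-x * x / 2.0) / SQRT_2PI   # standard normal PDF
        t = (q - p) / pdf
        x += t / (1.0 - x * t / 2.0)
    return x
```

Starting from a rough guess near 2.8 for p = 0.0025, three steps land on the double-precision quantile.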


Les


#47

Ah, the Normal quantile again. :-) I would like to add some remarks that might help a bit here.

The applied algorithm itself is quite stable - as long as there is sufficient precision both in the CDF and, even more important, while evaluating the term

    CDF(x) - p
t = ----------
      PDF(x)

This expression is used both in the Halley method as well as in another approach I tend to prefer. The more x approaches the true quantile, i.e. the better CDF(x) matches p, the more important this point gets, as the number of significant digits in the difference decreases. Another problem is the fact that, at least for x < 1, there are multiple n-digit values for x that will return the same n-digit CDF, so the numerator may become zero even if x is not yet exact.

My current solution (used in a 41 and 35s) works as follows:

  1. Provide a better initial estimate with an error on the order of 0,001.
  2. Improve this estimate with one single correction step, which leads to a result with about 11 valid digits for p > 1E-100, i.e. for the whole working range of the classic 10-digit machines.
  3. One more iteration should return more than 20 valid digits - I will yet have to try this in an environment with sufficient precision. Maybe a Mathematica user can do this.
  4. Calculators with a wider working range (down to 1E-500) require one more iteration step - or a slight adjustment of the initial estimate.
And here are the details. As always, p < 0,5 and x > 0,

i.e. p = 0,1  =>  x = +1,28155...
Initial estimate, using a very simple rational approximation:
 
r = sqrt(-2 * ln(p))
x = r - (0,394r + 2,366) / ((0,0728r + 1,108)r + 1)
 
if x < 0,003 then x = 0 ' improves numeric stability close to 0
 
Refinement:
 
t = (CDF(x) - p) / PDF(x) ' CDF here is to the right hand integral !
x = x + t + x/2 * t^2 + (2*x^2+1)/6 * t^3
The refinement is based on the method suggested in Abramowitz & Stegun, p. 954. I think it works very well. ;-)
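Put together in Python (double precision only; math.erfc stands in for the right-hand-integral CDF, the constants are the ones given above):

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT_2PI = math.sqrt(2.0 * math.pi)

def normal_quantile(p, steps=2):
    # Upper-tail convention: p < 0.5, returns x > 0 with Q(x) = p.
    # Initial estimate: the rational approximation above, error order 0.001.
    r = math.sqrt(-2.0 * math.log(p))
    x = r - (0.394 * r + 2.366) / ((0.0728 * r + 1.108) * r + 1.0)
    if x < 0.003:
        x = 0.0                      # improves numeric stability close to 0
    # Refinement per Abramowitz & Stegun, p. 954:
    for _ in range(steps):
        q = 0.5 * math.erfc(x / SQRT2)              # right-hand integral
        pdf = math.exp(-x * x / 2.0) / SQRT_2PI
        t = (q - p) / pdf
        x = x + t + x / 2.0 * t * t + (2.0 * x * x + 1.0) / 6.0 * t ** 3
    return x
```

Two refinement steps already exhaust double precision for ordinary arguments.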

I originally used this method for the 41C with its 10-digit precision and a range down to 1E-99. With at least 12 digits working accuracy, one single iteration step should provide 11 valid digits +/- 1 ULP, i.e. it should be fine for a machine code routine that carries 13 internal digits before it returns an exact 10-digit result to the user. During my tests in Excel (15-digit binary arithmetic) I did not notice any error beyond +0 and -4 units in the 12th significant digit.

For calculators with a working range down to 1E-500 a different rational approximation for the initial estimate may be used, or the latter gets a slight adjustment:

  if x > 21,3  then  x = x - exp(x/21,3 - 7,7)
Alternatively, one or two more iterations are fine, too - and very fast as well since the PDF is evaluated very quickly far out in the distribution tails.

The desired 16-digit accuracy for the 34s should be no problem with just two iterations, or three in double precision mode. That's why I agree that a fixed number of iterations may avoid the problem of oscillating approximations while the desired accuracy is still achieved. On the other hand I would recommend a different way of calculating the quotient t. For instance, the ratio CDF(x)/PDF(x) usually shows up during the calculation of the CDF (by series or continued fraction), so this might be used for an alternative way to calculate t.

Dieter

#48

Quote:
I suspect I lose a digit here and there due to rounding, as I can't take advantage of the full 39 internal digits, but it really was a fruitful programming exercise for me.

This is fundamentally the problem. Moving these functions to xrom means I've lost five digits of working precision, their accuracy must suffer because of this. It clearly suffered more than I'd thought :-(


- Pauli

#49

I know the problem. The iterative process to refine the result is oscillating between two values:

-2.807033768343804117221810394712332
-2.807033768343804117221810394712293

The old inverse normal distribution code carried extra digits and this wouldn't have been an issue.

Now to figure out a fix.


- Pauli


#50

Quote:
I know the problem. The iterative process to refine the result is oscillating between two values:

-2.807033768343804117221810394712332
-2.807033768343804117221810394712293


The last five digits of the true result, properly rounded, would be 12269.

Your fixed version goes with the former of the above two choices, 12332.

My code, as listed above, gets a little closer when run in DP, with the last five being 12308.

More than close enough for real life :)

Les


#51

A probable solution (instead of just relaxing the convergence criteria) could be to check for a cycle (keep two or three guesses and check for duplicates) and if one is found, disturb the value by one or a few ULPs, and try again. Do any iteration only a finite number of times.


#52

I'm thinking using the same solver as the other distributions might work better here. It handles these cycling cases (switching to bisection instead) but is going to be a bit slower.

- Pauli

#53

Les, I think it's simply impossible to get a result with n exact digits as long as the working precision here is just the same n digits. ;-) Try it with a simple 10-digit machine - if you get 9 exact digits +/- 1 ULP you've achieved the best that can be done. That's why there are guard digits. ;-)

So if the working precision is limited to 34 digits, a result with 30-32 valid digits is fine. And that's what we get here - simply round the result to 32 digits and we're done:

x = -2.8070 33768 34380 41172 21810 39471 23
Dieter
#54

Just for the record: I did a quick-and-dirty implementation of the algorithm I proposed earlier in this thread, and I tried the given example p = 0,0025. Et voila - after just two iterations in double precision mode...

Quote:
The last five digits of the true result, properly rounded, would be 12269.

...this is exactly what I got:
  x = 2,8070 33768 34380 41172 21810 39471 2269
This does not change in further iterations. And yes, I know, this example does not prove anything. #-)

Dieter


#55

Pauli will be listening, I'm sure!

We should get the normal distribution right because it's a cornerstone function. If we do get the same level of accuracy out of the other distributions, fine, if not, just force a rounding to 30 digits (in double precision mode only, of course!).

Dieter, your knowledge about the numerical algorithms for the probability functions is outstanding! Where does it come from? Are you a mathematician or just a genius? ;-)


#56

I'm listening. If I get keen enough, I'll give this method a trial -- it will cost more bytes in flash but will be faster and likely at least as accurate. At the moment, I'm a bit burned out over coding statistical functions as keystroke programs.


My revised fix for this, using the guarded Newton solver used by the other distributions, produces:

-2.807033768343804117221810394712233

Slightly better but not perfect. I doubt we'll be able to get much better without carrying extra digits.


I'm not in favour of rounding results unnecessarily -- it will generally make them less accurate although predictably so.


- Pauli

#57

Quote:
We should get the normal distribution right because it's a cornerstone function. If we do get the same level of accuracy out of the other distributions, fine, if not, just force a rounding to 30 digits (in double precision mode only, of course!).

In any case, I think it will be wise to heed Dieter's warnings and be especially attentive to the behaviour around p = 0.5, where most digit loss is bound to occur due to that phenomenon of more than one n-digit quantile estimate associated with the same n-digit probability. Indeed, the distribution-formulas readme discusses this in one of the suggested approaches, and does something special when p is in the interval (.495, .505).

A savvy user who is really obsessed with high accuracy (like me, even though that serves little practical purpose :) ), will be aware of the relationship between the inverse normal and the inverse chi-square with one df. Basically, for p < .5:

Phi(-1)(p) = -Sqrt(Chi^2INV(1df)(1-2p))

For example, for p = 0.4999, the Phi(-1)(x) built-in function gives:

-2.506628300880350989206501054077314 e-4

In comparison, passing 0.0002 to Chi^2INV(x) with 1 in register J and taking the negative square root yields:

-2.506628300880350989206501054078343 e-4

The direct method rounds to 30 good digits. Using the inverse chi-square intermediary yields ALL 34 digits correctly.

About the worst example I have been able to find is p = .4999999, where the built-in function gives:

-2.506628274631026751765674822177285 e-7

The inverse chi-square path gives:

-2.506628274631026751765674822754539 e-7

Again, the latter gives full 34-digit accuracy. The former is only 28 digits to 1 ULP, or 29 digits to 6 ULP.
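The identity itself is easy to sanity-check in the forward direction, since the chi-square CDF with one df reduces to an error function. A double-precision Python sketch (function names are illustrative):

```python
import math

def chi2_cdf_1df(z):
    # CDF of the chi-square distribution with one degree of freedom:
    # P(X <= z) = P(Z^2 <= z) = P(|Z| <= sqrt(z)) = erf(sqrt(z/2))
    return math.erf(math.sqrt(z / 2.0))

def upper_tail_normal(x):
    # Q(x) = 1 - Phi(x) for the standard normal
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# With p = Q(x) = Phi(-x) for x > 0, the identity
#   Phi^(-1)(p) = -sqrt(Chi^2INV(1df)(1 - 2p))
# is equivalent to chi2_cdf_1df(x^2) = 1 - 2p.
```
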

Just food for thought.


#58

Hmhmhmhmhmmmm.... ;-)

I tried your examples with the emulator. It's a somewhat older version (3.0 2294). Here is what I got with the internal function, i.e. based on 39 digit internal precision:

p =  0,49
z = -2,5068 90825 87110 35762 36343 18346 9042 E-2 exact
 
p = 0,499
z = -2.5066 30899 57176 40053 58657 06701 9588 E-3 exact

p = 0,4999
z = -2,5066 28300 88035 09892 06501 05407 8343 E-4 exact

p = 0,49999
z = -2,5066 28274 89349 40015 68864 28261 7674 E-5 exact

p = 0,49999 9
z = -2,5066 28274 63362 54374 06724 79396 8633 E-6 +0,7 ULP

p = 0,49999 99
z = -2,5066 28274 63102 67517 65674 82275 4537 E-7 -2 ULP

p = 0,49999 999
z = -2,5066 28274 63100 07649 09264 38018 4708 E-8 -57 ULP

p = 0,49999 99999
z = -2,5066 28274 63100 05024 42014 63472 0583 E-10 exact

p = 0,49999 99999 99
z = -2,5066 28274 63100 05024 15767 90974 6036 E-12 exact
 
p = 0,49999 99999 9999
z = -2,5066 28274 63100 05024 15765 28507 3539 E-14 exact

p = 0,49999 99999 99999 9
z = -2,5066 28274 63100 05024 15765 28481 1072 E-16 exact

So the internal function seems to work quite well (with the one exception at 0,49999 999). I also compared the results to those obtained with the Chi-square method, but the latter often showed larger errors. For example:
  p  =  0,49999 999
Exact -2,5066 28274 63100 07649 09264 38018 4765 E-8
Normal -2,5066 28274 63100 07649 09264 38018 4708 E-8 -57 ULP
Chi^2 -2,5066 28274 63100 07649 09264 38018 2296 E-8 -2469 ULP
What's going on here? Why do we get different results? Is it because my version still uses the internal 39-digit precision and yours is based on 34-digit XROM code?

Dieter


Edited: 4 May 2012, 3:10 p.m.


#59

Most likely the extra digits being carried makes for the difference. Five bonus digits is a lot.

There is also a small change to the Newton search which might also make a difference. The older code did some range limiting on Newton steps within the known interval of the solution and reverted to bisection if things didn't seem to be converging fast enough, whereas the newer code doesn't do this. Bisection now only happens if the Newton step would go outside the known interval of convergence.

The idea was to try to trap cases when convergence via Newton steps is very slow because each step only moved a small amount.
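A generic sketch of such a guarded Newton solver (Python; this is an illustration of the idea, not the actual 34s code, and it assumes f is increasing with a sign change on the bracket):

```python
import math

def guarded_newton(f, df, lo, hi, tol=1e-15, max_iter=100):
    # Newton iteration confined to a known bracket [lo, hi]: the bracket is
    # shrunk with every evaluation, and a Newton step that would land
    # outside the current bracket is replaced by a bisection step.
    # Assumes f(lo) < 0 < f(hi) and f increasing on [lo, hi].
    x = 0.5 * (lo + hi)
    for _ in range(max_iter):
        fx = f(x)
        if fx == 0.0:
            return x
        if fx > 0.0:
            hi = x
        else:
            lo = x
        d = df(x)
        xn = x - fx / d if d != 0.0 else x
        if not (lo < xn < hi):
            xn = 0.5 * (lo + hi)      # fall back to bisection
        if abs(xn - x) <= tol * max(1.0, abs(xn)):
            return xn
        x = xn
    return x
```

Applied to Phi(x) - p = 0, the far-tail Newton step from the bracket midpoint overshoots wildly and is caught by the bisection fallback, after which plain Newton converges.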


- Pauli


#60

Are phi(x) and PHI(x) exact to 34 significant digits? If not, it might help to leave them in as C code to improve the accuracy. I've no plans to implement quad precision in XROM. ;-)


#61

phi(x) will be close. PHI(x) several digits off. PHI^-1(x) even less so.

Pushing these back into C is always an option.


- Pauli


#62

Pauli,

I've been thinking a bit about the possible accuracy of the Normal quantile function. The 39-digit C-coded internal function seems to work fine for most cases, but there are a few flaws here and there, especially close to p = 0.5, i.e. z = 0. In double precision user code (34 digits working precision) the result may even be wrong in the last six (!) digits.

But we can do better.

  • The easy cases are quantiles > 1. Varying these by one ULP leads to a change in several digits of the corresponding probability (assuming the same number of significant digits).

  • The crucial region is 0 < quantile < 1. Here, multiple n-digit quantiles exist that map to the same n-digit probability. This also leads to digit cancellation as the difference CDF(x) - p is evaluated. Here is an example:
    Probability p     = 0.49999 99
    Initial guess z = 2.5066 28274 63102 67517 65674 82217 7285 E-7
    Corresponding CDF = 0.49999 99000 00000 00000 00000 00000 0000
    Great - the initial guess seems to be exact. In fact, it is just the first two terms of the series expansion of the Normal quantile. The guess is exact to 34 digits if p is within 0.5 +/- 2E-9. However, p here is beyond this limit so the quantile cannot be exact. It just happens to return the desired probability. Actually, it has only 28 correct digits. But the present algorithm is not able to improve this result as the correction term will evaluate to zero.

    Another example:

    Probability p     = 0.49999
    Initial guess z = 2.5066 28274 89349 40015 11138 94721 5889 E-5
    Corresponding CDF = 0.49999 00000 00000 00000 00002 30290 7693
    So the CDF for the initial guess differs from p only in the last 10 digits. However, the guess has just 20 correct digits, so we cannot expect more than 30 valid digits in the final result.

  • The closer p gets to 0.5 the more digits get cancelled. Very close to p = 0.5 the initial guess already is exact (as explained above), so the mentioned digit cancellation does not matter. Also p < 0.4 usually has 32 or 33 correct digits. But between these points the effective accuracy can drop down to 28 digits. For instance, at p = 0.49995 the initial estimate already has 21 correct digits, but due to digit cancellation the correction term carries just 7 more.

  • Finally I tried a simple solution: the CDF for small x is evaluated by the well-known power series, the accumulated sum is multiplied by the PDF, and 0.5 is added to this product. So I wrote my own CDF routine on the 34s and simply omitted the final addition of 0.5. In fact, it is possible to write CDF(x) as well as CDF(x)-0,5 and the term (CDF(x)-p)/PDF(x) in a way that avoids excessive digit cancellation. In the first example this yields six significant digits in CDF(x)-p so that the final result is nearly exact (+1 ULP). This technique is possible both for low values of x (power series) as well as the continued fraction method that is applied for large x.

    I did it this way. Please note that a and b are global variables.

    function CDF(x)
      if x < 2.326 ' here p is 0.01
        a = -(sum of powers loop)
        b = 0.5
      else
        a = (continued fraction loop)
        b = 0
      end if
      return CDF = a*pdf(x) + b
    end function

    function quantile(p)
      x = guess(p)
      repeat this...
        call CDF routine
        t = (b - p)/pdf(x) + a ' this equals (CDF(x)-p)/pdf(x)
        evaluate correction term dx(t)
        x = x + dx
      ...two or three times
      return x
    end function

    Alternatively, the loops for the power series and the continued fraction may be coded as subroutines that can be called both by CDF(x) and quantile(p). In this case, the subroutine call has to return both a and b.

  • The method seems to work. I tried the list of p values close to 0.5 in one of the previous posts, where the internal 39-digit 34s function returned values that were mostly exact, but sometimes off by as much as 57 ULP. With just 34-digit working precision in the user code program I tested, I did not notice any errors larger than 1-2 ULP for all test cases. The largest error I observed with random test values was about 1 unit in the 33rd significant digit. Which essentially is as good as it gets when a 34-digit machine is all you have.
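A double-precision Python sketch of the a/b split (the 2.326 break point and the a, b convention follow the pseudocode above; the loop details and fixed term count are illustrative guesses):

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def pdf(x):
    return math.exp(-x * x / 2.0) / SQRT_2PI

def cdf_parts(x):
    # Returns (a, b) with upper-tail Q(x) = a*pdf(x) + b, for x >= 0.
    if x < 2.326:
        # power series for the central integral: S = x + x^3/3 + x^5/(3*5)+...
        term, s, k = x, x, 1.0
        while term > 1e-20 * (abs(s) + 1e-30):
            term *= x * x / (2.0 * k + 1.0)
            s += term
            k += 1.0
        return -s, 0.5                 # Q(x) = 0.5 - pdf(x)*S
    # continued fraction Q(x) = pdf(x)/(x + 1/(x + 2/(x + 3/(x + ...)))),
    # evaluated from the right with a fixed (generous) number of terms
    cf = 0.0
    for k in range(200, 0, -1):
        cf = k / (x + cf)
    return 1.0 / (x + cf), 0.0

def t_term(x, p):
    # (Q(x) - p)/pdf(x) computed as a + (b - p)/pdf(x): with b = 0.5 the
    # subtraction b - p stays exact near p = 0.5, avoiding the cancellation
    # hidden in Q(x) - p.
    a, b = cdf_parts(x)
    return a + (b - p) / pdf(x)
```
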
That's why I see two options for the desired 34-digit accuracy for the Normal quantile:
  1. If the function has to be written in user code (XROM), we need an additional function that returns CDF(x)-0.5. This method is good for about +/- 1 unit in the 33rd digit.

  2. "Push it back to C", as you said, while using the same approach, and we'll get 4-5 more digits. This sums up to roughly 37 valid digits.
Finally we should also take a look at the CDF routine. I think it has to be coded with 39-digit accuracy (i.e. in C).

Dieter


Edited: 8 May 2012, 7:40 p.m.


#63

Dieter, can you take a closer look? It seems that your results for "Corresponding CDF" actually show more 9s than specified in the desired probability, so I think there are typos here. If not, then I don't follow your analysis...


#64

Yes, sorry, the returned CDFs originally had a few nines too many here and there. #-)

I noticed this a few minutes before your reply, it should be correct now.

Dieter

#65

Never mind! According to the time stamps, you fixed the errors just minutes before I posted... ;)

#66

Dieter, this is excellent! I can't wait to find the time to code it myself.

Just to clarify things for myself--it seems you are actually dealing with the upper tail probabilities and confining your code to positive quantiles and probabilities less than 0.5, so according to convention we are talking about q, not p. I think this is really important to observe in the final code. If you want to make yourself cringe, compute, in double precision, inverse Phi for unity less 1 ULP (i.e., 1 ENTER X.FCN->ULP -). The result is good only to 3 or 4 digits. However, if you compute the quantile associated with the complement 1e-33, the result is almost exact to 34 digits (but with opposite sign, of course). Basically, the ultimate qf code really needs to take the complement of probabilities above 0.5, use that for the routine, and flip the sign.

I also really think that the refinement should go back to Halley's method here, since in this case it is just so simple. Using your variables, dx(t) = t/(1-xt/2). What could be easier?

Les

Edited: 8 May 2012, 10:59 p.m.


#67

Quote:
Dieter, this is excellent! I can't wait to find the time to code it myself.

Care to code it for xrom inclusion?


- Pauli

#68

Quote:
Basically, the ultimate qf code really needs to take the complement of probabilities above 0.5, use that for the routine, and flip the sign.

Or support both p and q for the normal and cross call between them as required.


- Pauli


#69

I thought this already got implemented a long time ago?! Did something change here?

To ensure the best possible accuracy, both the CDF and the QF should work the way I suggested in my reply to Les. Please see below.

Dieter

#70

Les, all algorithms I proposed for the Normal CDF and its inverse use the same convention: x >= 0 and p (or q) < 0,5. So, actually only the right half of the function (x is non-negative) is considered, and the upper tail integral (from x to infinity) is evaluated. This means:

  • The CDF routine generally uses abs(x) as its argument and finally returns q or 1-q, depending on the sign of x.
  • The QF works on min(p, 1-p) and calculates a positive quantile. Finally the sign of the result is set, depending on p > or < 0.5.
    This can be done by multiplying the result with sign(p-0.5).
Of course this also addresses the accuracy issue you mentioned. The (lower tail) CDF for e.g. x = +6 is calculated by evaluating the upper tail integral (usually designated Q) and then returning 1 minus this value. In other words: the routine first evaluates Q(6) = 9,86587645...E-10 to 16 (34, 39...) digits and then returns 1 minus that = 0,9999999990134123. So if Q has 34 valid digits, the returned Phi in this example could have 34 + 9 = 43 valid digits - if the 34s would allow it. ;-)

In other words: at least my own CDF and quantile routines already do what you suggested. So the example you gave (quantile for 1 - 1E-33) actually is evaluated as -quantile(1E-33). I also proposed this about a year ago when the distribution functions were discussed here. I assumed this went into the actual implementation, but to my astonishment this obviously is not the case. Pauli, did something change here? Coded correctly, quantile(p) and quantile(1-p) should only differ in their signs, while they otherwise agree in all digits. So, yes, this idea of "the ultimate QF code" has been proposed. I just wonder why it is not implemented, at least not in the current version.
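A minimal sketch of that reflection logic in Python (statistics.NormalDist.inv_cdf from the standard library stands in for the underlying positive-quantile routine; double precision only):

```python
import math
from statistics import NormalDist

def positive_quantile(q):
    # Stand-in for the QF core discussed here: returns x >= 0 with
    # upper-tail Q(x) = q, for q <= 0.5.
    return -NormalDist().inv_cdf(q)

def quantile(p):
    # Work on min(p, 1-p), then set the sign via sign(p - 0.5), so that
    # quantile(p) and quantile(1-p) agree in all digits except the sign.
    q = min(p, 1.0 - p)
    return math.copysign(positive_quantile(q), p - 0.5)
```
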

Regarding Halley's method: yes, it converges quite fast, but there is an even better approach (cf. Abramowitz & Stegun):

Halley:   dx = t / (1 - x*t/2)

For small x*t/2 this approaches

          dx ~ t * (1 + x*t/2)
             = t + t^2*x/2

Now the method I proposed adds one more term to this:

A & S:    dx = t + t^2*x/2 + t^3*(2*x^2+1)/6
This approach converges a bit faster, and there is another advantage as well: for x = 0 it is equivalent to the sum of the first terms of the Normal quantile's series expansion. I used this nice feature here and there to provide exact results close to x=0 with the initial guess set to zero.
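For completeness, here is a sketch of where those coefficients come from (my own derivation, with Q the right-hand integral, phi the PDF, and t = (Q(x)-p)/phi(x) as above):

```latex
Q' = -\varphi, \qquad Q'' = x\varphi, \qquad Q''' = (1 - x^2)\varphi .

\text{Expanding } Q(x + dx) = p \text{ and dividing by } \varphi(x):
\qquad t = dx - \tfrac{x}{2}\,dx^2 - \tfrac{1 - x^2}{6}\,dx^3 + \dots

\text{Reverting the series with } dx = t + a\,t^2 + b\,t^3 + \dots \text{ gives}
\qquad a = \tfrac{x}{2}, \qquad
b = \tfrac{x^2}{2} + \tfrac{1 - x^2}{6} = \tfrac{2x^2 + 1}{6},

\text{i.e. } dx = t + \tfrac{x}{2}\,t^2 + \tfrac{2x^2 + 1}{6}\,t^3 .
```

At x = 0 the correction reduces to t + t^3/6, which is why it reproduces the quantile's series expansion around the centre.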

By the way, my current double precision version for the 34s starts with the good old Normal estimate we discussed some time ago. So the rational approximation I recently suggested is not used here. This is because of the extremely wide working range in double precision mode that goes even below 1E-6000. The rational approximation works fine down to 1E-400, but farther out in the tails the usual three refinements would not suffice. That's why I reverted to the previous method, which is approximately asymptotic as p approaches zero. I just added a slight improvement: the term 0,2/u (i.e. 1/5u) was replaced by 1/4u. The switch between the center and tail approximation is between 0,232 and 0,233, limiting the maximum absolute error to slightly less than 0,01. :-)

Now I am looking at the CDF algorithm for large x. Currently I am using the internal function. The continued fraction loop is evaluated best (and fastest) from right to left, so the number of terms n has to be known beforehand. Currently this is done with an estimate for n I proposed last year. However, this has not been tested for 34+ digit precision. I also thought about the Lentz method, so that the continued fraction can be evaluated from left to right until convergence is reached, but I feel this is way too slow here.
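For comparison, a modified-Lentz (left-to-right) evaluation of that continued fraction looks like this in double precision (Python sketch; the tolerance and iteration cap are my own choices):

```python
import math

def mills_cf_lentz(x, tol=1e-15, max_terms=500):
    # Modified Lentz method: evaluate the continued fraction
    #   1/(x + 1/(x + 2/(x + 3/(x + ...))))
    # left to right, stopping when the running product stops changing,
    # so the number of terms need not be known in advance.
    tiny = 1e-300
    f, c, d = tiny, tiny, 0.0
    for j in range(1, max_terms + 1):
        a = 1.0 if j == 1 else float(j - 1)   # partial numerators 1,1,2,3,...
        d = x + a * d
        if d == 0.0:
            d = tiny
        c = x + a / c
        if c == 0.0:
            c = tiny
        d = 1.0 / d
        delta = c * d
        f *= delta
        if abs(delta - 1.0) < tol:
            break
    return f

def upper_tail(x):
    # Q(x) = pdf(x) * CF for large x
    pdf = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    return pdf * mills_cf_lentz(x)
```
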

Dieter


Edited: 9 May 2012, 7:37 a.m.


#71

I use the Lentz method in my own adaptation of the right-sided incomplete gamma, and I do agree that when I compute erfc or the upper-tail normal as a special case of incomplete gamma it seems slower than it needs to be. Also, because the arguments have to be manipulated before being passed to my incomplete gamma routine, that's another potential source of digit loss due to rounding.

You aspire to full internal 39-digit accuracy of the CDF (or, conversely, the upper-tail probability), but this is impossible with only 39 digits since, even if every single computation returned 39 perfect digits and there was no loss due to rounding, there is the matter of subtraction error at the threshold where one moves from the series expansion to the continued fraction. You have chosen 2.326 as your break point, and I think this is a good one, since that is where the upper tail probability is 0.01, but even as you approach that you will lose at least one digit to subtraction of the series result from 0.5 to yield the upper-tailed probability. If you make the threshold lower, digit loss is reduced, but the continued fraction gets much more sluggish as more--sometimes many more--terms are needed, and that means more computation and accumulated rounding errors. The existing code uses a threshold of roughly 3.15. The series does quite well up there, and beyond it the continued fraction is very fast. However, the subtraction error is at least two or three digits lost when we subtract the series result from 0.5 to get the upper tail probability.

If we push the normal CDF and upper-tail probability and PDF back to C code, 34 digits returned to the stack should be easy to attain, but if we insist on 39 digits internally the number type used will have to introduce some more guard digits. That subtraction error is a bugger, but your use of the 2.326 threshold is probably the best compromise between speed and accuracy. Whether due to subtraction or rounding, these calculations will not give 39 digits of internal precision across the whole range unless the working precision is greater. However, the 34 digits we see on the stack in DP should be easy.

Les


#72

Les, I know that it is not possible to get 39-digit accuracy on a 39-digit machine. This was already pointed out in several earlier posts, so on my 12-digit 35s I am happy with 11 valid digits (+/- 1 unit) in the final result. That's why, on the 34s, I would like to see the CDF (and, if possible, also the QF) implemented in an internal function with 39-digit working precision - to achieve a final result with 34-digit accuracy. ;-)

Yes, switching from the series expansion to the CF method at x = 2,326 will sacrifice one digit. The ideal break point would be x = 1,28155... where Q(x) is 0,1. But this would slow down the calculation significantly. That lost digit is also the reason why I think the CDF (and quantile) should be coded in C with 39-digit precision (again: in order to get at least 34 valid digits in the result - we will not get all 39 anyway). I think we agree that the current break point at Pi or 3,15 is not perfect if the goal is maximum accuracy. This point is even beyond x = 3,09023 (Q(x) = 0,001), so we will lose even four digits, if I get it right. Maybe Pauli can change the code to switch at x = 3 at least (which will save one digit) or, even better, at 2,326 (two more).

On a side note, choosing 2,326 as the break point was the obvious option for my earlier 35s programs, since near this value the execution times for the series expansion and the continued fraction were roughly the same. ;-) And losing one digit was okay for me since a 12-digit machine will not produce a 12-digit result anyway. #-) It was and is exactly what you said: a compromise between speed and accuracy.

In general, I think we both see the same problems as far as numeric accuracy is concerned. I suggested one method or another to overcome these. If we assume that on an n-digit machine we may achieve about n-2 valid digits in the computed quantile (provided the CDF is exact!), this again leads to my initial suggestion: let's have these two key functions coded in C so that the result will have 34 safe digits. I just see that's exactly what you wrote in your last sentence. ;-)

So we seem to agree in all major points on the numeric side. Now let's try to convince Pauli to move these two essential functions back to C. ;-)

Dieter


#73

Changing the break point in the xrom implementation is easy and cheap. Changing it in C is possible but requires more memory for the task. At the moment we've got the memory free to make some improvement to this function.


Quote:
So we seem to agree in all major points on the numeric side. Now let's try to convince Pauli to move these two essential functions back to C. ;-)

I'm happy to do some relocating. What functions exactly? The standard normal functions as they previously existed are 856 bytes in size -- minus the space they currently use. This is probably a bit on the large side when the current functions are returning 16 digit accurate values, albeit rather slowly.


I could do an internal normal CDF function that returns the pdf, the cdf and the 'b' value you mentioned previously (or the 'a' value instead of the cdf).

The QF could then be keystroke programmed.


- Pauli

#74

I can use more than 39 digits internally too -- some complex operations are performed carrying 72 digits, the modulo two pi carries a completely insane number of digits and the standard remainder operation uses higher precision due to a flaw in the decNumber library.


- Pauli

#75

I've moved the normal pdf back into C.
I've added a C normal cdf helper function (cdfqh).

It takes x in X and returns the pdf in X, 'a' in Y and 'b' in Z. Registers Y and Z are destroyed and last X is not set -- this is a very internal function.

After executing cdfqh, you should execute * + to get the lower tail CDF result.

Better is the sequence cdfqh 1 CPX_* DROP, which gets the same result but with an atomic multiply-add instruction that handles double precision results without loss of accuracy.

That's most of our flash headroom gone now :-(


Now for the quantile function done better.

- Pauli

Edited: 10 May 2012, 12:49 a.m.


#76

Pauli, is this code conditionally compiled? Not everybody needs 34 digits for his statistical calculations.


#77

No it isn't a conditional include. This replacement code will be significantly faster than the existing xrom code, and that justifies its place enough for me. The conversion to xrom slowed all the statistical code down a bit; the normal distribution suffered by far the worst. More importantly, I don't want to have to support dual versions of the same thing :-)

The quantile function won't be 34 digits accurate -- that simply isn't achievable from xrom.


- Pauli

#78

For Phi(x) and Phi-sub-u(x), the constituent parts from the cdfqh helper function are returned to the stack and lose their guard digits, so there still is indeed the likelihood that things will be off a little, often in the 34th digit--not the full 34-digit accuracy I know Dieter was hoping for. But, at least below the new threshold of |x| < 2.326, things seem fast and really good.

But there seems to be a more serious challenge. You have implemented the new cutoff as noted. The last four digits of the 34-digit Phi-upper(2.326 - 1ULP) result are 7055, whereas the true result, properly rounded, ends with 7050. Excellent 34-digit precision, considering that the last few steps of the computation are done in 34-digit working precision, not 39.

But when we get into continued fraction territory, there be wonkiness. The calculator's 34-digit output for Phi-upper-(2.326+1ULP) ends with 3078437, whereas the true last seven digits are 0377044.

This may be evidence that the formula used to compute the number of terms taken with the CF needs to be reconsidered, since in comparing a calc with the newest firmware to one I haven't yet re-flashed it seems this is old news. For example, when I compute the upper-tail Phi(x) for 3.2, an argument well past the OLD threshold of Pi, the 34-digit output is "wrong" in the last 8 digits in both builds 2965 and 2975. Moreover, that "wrongness" agrees in all of those digits except the last. On the other hand, my slower and less efficient routine, which computes the upper-tail probability for such larger arguments as a special case of the right-sided incomplete gamma function, gets 32 of the 34 digits correct BOTH with the threshold examples and the argument of 3.2.

I am sheepish I didn't pick up on this before, but I took for granted that the CDF and upper CDF were pretty solid in double precision whether in C-code or XROM. It may be worth revisiting whether a modified Lentz computation is worthwhile after all. Keep in mind the calculator's code already uses this algorithm for the betacf and gcf subroutines. Phi and Phi-upper are really just special cases of the left and right gamma function respectively, so maybe we should forgo a dedicated brute force right to left computation of the continued fraction and use the modified Lentz code that is there? I admit this would mean another byte-gobbling constant to pass to gcf() as the gln parameter (ln(sqrt(Pi)) = ln(Pi)/2 = 0.572364942924700087071713675676529355824), but if we ditch the dedicated CF code, would it be worth it? My user code of the right-sided incomplete gamma is perceptibly slower, but if I am getting 32 or 34 digits in a 34-digit environment, there is something right about this alternative.

Just my 3 euros...

Les

Edited: 10 May 2012, 3:12 a.m. after one or more responses were posted


#79

There are quite possibly errors in the code still :-(

I started using gammap for the normal cdf but cancellation limited the number of correct digits at times -- this is what Dieter first picked up on.

I really need to implement the right sides of the incomplete gamma and beta functions and use them for the other distributions (the upper tails for F, chi^2, binomial and Poisson are pathetic at present).

User code for both sides of the incomplete beta and gamma would probably make things space neutral -- xrom seems to be about half the size of C for this kind of thing. Speed has also been subservient to accuracy during the development of the 34S.


Like I said previously, my motivation for statistical distributions is low at the moment.


- Pauli

PS: the upper tail CDF's were introduced this past week -- so you couldn't have taken them for granted.


#80

Quote:

PS: the upper tail CDF's were introduced this past week -- so you couldn't have taken them for granted.


Ah, but in a fashion it has always been there for the normal CDF. I have simply flipped the sign on the argument to get the upper CDF.

Les

Edited: 10 May 2012, 4:24 a.m.

#81

Les wrote:

Quote:
For Phi(x) and Phi-sub-u(x), the constituent parts from the cdfqh helper function are returned to the stack and lose their guard digits, so there still is indeed the likelihood that things will be off a little, often in the 34th digit--not the full 34-digit accuracy I know Dieter was hoping for.

I have not tried the new version yet (all I have is the emulator) but if it's not worse than my user code solution, we should get close to 34 digits. I think an error within 1 unit of the 33rd digit is feasible.
Quote:
The last four digits of the 34-digit Phi-upper(2.326 - 1ULP) result are 7055, whereas the true result, properly rounded, ends with 7050. Excellent 34-digit precision, considering that the last few steps of the computation are done in 34-digit working precision, not 39.

But when we get into continued fraction territory, there be wonkiness. The calculator's 34-digit output for Phi-upper-(2.326+1ULP) ends with 3078437, whereas the true last seven digits are 0377044.


I tried these examples with an older emulator version (39-digit C) and in all cases the exact 34-digit value was returned:
2,326 [ENTER] ULP [-] => 0,010009 27534 08676 67260 76355 29037 7050
2,326 => 0,010009 27534 08676 67260 76355 29037 7047
2,326 [ENTER] ULP [+] => 0,010009 27534 08676 67260 76355 29037 7044
That's why I prefer the CDF in C.

Then I tried the same examples with my own 34s user code implementation, and this is what I got:

2,326 [ENTER] ULP [-] => 0,010009 27534 08676 67260 76355 29037 7049
2,326 => 0,010009 27534 08676 67260 76355 29037 7047
2,326 [ENTER] ULP [+] => 0,010009 27534 08676 67260 76355 29037 7044
The true values are
Phi(2,326 - 1E-33)  =    0,010009 27534 08676 67260 76355 29037 70496 885...
Phi(2,326) = 0,010009 27534 08676 67260 76355 29037 70470 211...
Phi(2,326 + 1E-33) = 0,010009 27534 08676 67260 76355 29037 70443 537...
So in all cases the results are correct within 1 unit of the 34th digit.
Quote:
This may be evidence that the formula used to compute the number of terms taken with the CF needs to be reconsidered

Yes, definitely. As I said before, the original estimate for the number of loops has been tested only for up to 16 digit accuracy. According to the original formula, the numerator in the estimate for the required number of loops would be something like 500 here, i.e. n = 500/(x-1) + a few to be safe. In fact it seems to me (I just tried a few examples) that this number of loops is good for 34 digits, but with some error in the last place due to roundoff errors. We should be aware that the number of iterations must be as high as required, but also as low as possible to prevent excessive roundoff.

So I tried n = 400/(x-1) + 8 (caveat - this has not been tested!) and I got the results you see above. I do not know how the number of iterations is determined currently, but I think I remember something like 200/(x-1), i.e. roughly half of what I tried. Which means that we will not get 30+ digits at all. And that's what you see in the results you posted.

But I would even go one step further. I agree with Marcus in that not everyone needs 34-digit accuracy here. My suggestion is: let's make the special functions that require quite a bit of execution time accuracy-dependent. In other words: they should be evaluated to 16+ digits if this is what is returned to the user, or full 39 digits if we are in double precision mode, or that function is called internally by another one that in turn has to return a 34+ digit result. There is no need to wait for a 34-digit result if the user only gets 16 digits returned. The former would require about 4x the time of the latter. Compared to the current implementation this will roughly double (!) its execution speed, since for 16 digits around 100/(x-1) loops (instead of 200/(x-1)) are sufficient.

Dieter


#82

Quote:
I tried these examples with an older emulator version (39-digit C) and in all cases the exact 34-digit value was returned:

Of course you would! In an older version you would be dealing with the series still for arguments around 2.325, since the cut-off was around Pi in that older code. I would expect the results would be pretty good, as you see. If you want to see how the continued fraction performed, you have to look at a larger argument. Try 3.15 or 3.2. I am sure you will see that there are not enough CF loops for 34 digits and you'll get 27 to 30 correct.

I got a nice email from Pauli asking for an actual code sample, and I will see what I can do. In the meantime, I am starting to wonder if my obsession with full double precision accuracy for these functions is reasonable or practical. Getting even sixteen digits, only 12 of which are routinely displayed, is pretty much all most reasonable humans need!

Les

Edited: 10 May 2012, 3:31 p.m. after one or more responses were posted


#83

Quote:
Is pretty much all most reasonable humans need!

I doubt we are reasonable humans. ;-)
#84

Les, of course I expected the internal function to be exact. The point here was that the user-coded function may also return a result with 33 or maybe even 34 digits - provided the number of terms in the CF loop is sufficient. Obviously I did not express this clearly enough - sorry for that.

In the meantime I tried to determine how many loops it will take for a relative error below 5E-35. So I wrote a short routine for Casio's Keisan service and set it to 38 digits working precision (sorry, no Mathematica or Maple here). Finally I tried to describe the relation between x and the required number of terms n, and here's the result:

  n >= 12 + 940 / (x - 1/3)^1,57
or alternatively
  n >= 12 + 1000 / (x - 0,3)^1,6
In both cases this gives a number of terms slightly higher than required (in a perfect calculator ;-)). The formula works down to x = 1 so that it can even be used if the break point is set to 1,28155. Which is what I now use on the emulator - it is fast enough for 1000 terms in virtually no time. ;-)

Edit: I also tried |x| > 3,14 with the emulator's function, and yes, Phi(-3,2) has only 27 correct digits (when rounded). The user code function, based on the CF method with the number of terms calculated with the formula above, agrees with the true value in all 34 digits. Well, OK, the last one is 1 while it actually should get rounded up to 2. ;-) I did some further tests and as far as I can tell, the error stays within a few units of the 34th digit. Until now, I did not notice any error exceeding 1 unit in the 33rd digit. So if all this gets coded in C with 39 digit precision, it looks like we can assume that all returned 34 digits should be exact (within 1 ULP).

Dieter

Edited: 10 May 2012, 1:54 p.m. after one or more responses were posted


#85

Dieter, I don't think we want to use exponentiation just to determine the number of loops so I implemented the old approach with 500/(|x|-1)+8 in DP. I reduced the numbers to 150 & 4 for single precision but I didn't do any tests.


#86

What? No exponents and powers ?-)

But fine - then please try this:

   n  >= 300/(x-1,5) + 8
This should be fine for 34 digits and x >= 2,326. It also does not overestimate the number of required loops as much as the 500/(x-1) method does.

For 16-digit accuracy something like 100/(x-1) + 8 should be okay. But I have not tested this yet.

Edit: Here's my suggestion for 16-digit accuracy and x >= 2,326:

   n  >= 85/(x-1,3) + 4

Finally: this does not have to be coded with high-precision DecNumber functions. Even standard C single precision will do. ;-)

Dieter


Edited: 10 May 2012, 5:45 p.m. after one or more responses were posted


#87

Quote:
What? No exponents and powers ?-)

They use LOGs, which are expensive. It's all about saving time, isn't it? I will try your formula but I cannot yet tell you when. Maybe Pauli is quicker. Can you try the latest incarnation and do some tests? It uses the 2.326 threshold. For SP, is 100/(x-1)+8 better than 150/(x-1)+4 ?

Edited: 10 May 2012, 2:38 p.m.


#88

Quote:
They use LOGs, which are expensive. It's all about saving time, isn't it?

Sure. But an optimized n may save many useless loops. ;-)

Quote:
For SP, is 100/(x-1)+8 better than 150/(x-1)+4 ?

The latter returns a loop count some 30-40% too high. Replace the 150 by 110 and you get a very good estimate that works fine down to x = 1,7 and is just a few counts above the required minimum.

I edited my last post to add another formula for 16-digit precision. According to my tests this should give no relative errors larger than 3E-17. Actually 110/(x-1) + 4 (and round up via CEIL) works just as well. #-)

You can try this yourself: Visit the Casio Keisan Website, go to the calculator and enter this:

n = -int(-(110/(x-1) + 4));
k = n;
summe = x;
do {
summe = k/summe + x;
k = k-1;
}
while (k>0);
cdf = normalpd(x)/summe;
exc = normalcd(x);
print(cdf, exc, (cdf-exc)/exc, n);
Press "Execute" and use "2 , 0.1 , 10" as your x-value. This will generate a table with the approximated and exact CDF as well as the relative error and, finally, the number of iterations. :-)

I will also try the latest version in the emulator. For me this is always a lot of guesswork since I am not sure which files need to be updated. #-) My emulator version is 3.0 2725. Could you provide a list which files need to be replaced?

Dieter


#89

Quote:
I will also try the latest version in the emulator. For me this is always a lot of guesswork since I am not sure which files need to be updated. #-) My emulator version is 3.0 2725. Could you provide a list which files need to be replaced?

Generally, wp34sgui.exe and wp34s-lib.dat from the windows bin directory should be fine. You may need to remove any older wp34s.dat file. If you want to use the assembler, update the tools directory contents, except for the .exe files which aren't up-to-date. You will need a working Perl installation.

The best way is to install SVN, check out the complete tree once and then just do an SVN update to become up-to-date.

EDIT: I've put up a new release file. This is the easiest option.

Edited: 11 May 2012, 4:45 a.m.

#90

Quote:
Yes, definitely. As I said before, the original estimate for the number of loops has been tested only for up to 16 digit accuracy. According to the original formula, the numerator in the estimate for the required number of loops would be something like 500 here, i.e. n = 500/(x-1) + a few to be safe. In fact it seems to me (I just tried a few examples) that this number of loops is good for 34 digits, but with some error in the last place due to roundoff errors. We should be aware that the number of iterations must be as high as required, but also as low as possible to prevent excessive roundoff.

If I read Pauli's code correctly, the loop count is fixed at 500, not depending on X. It breaks out of the loop if the iteration does not change the value.

Edit: I misread the code, the 500 loop is not for the continued fraction but for the series expansion below the threshold. The CF count is something like 256/(|x|-1). I might be able to make this dependent on the requested accuracy.

	if (dn_lt(&absx, &const_2_326)) {
		decNumberSquare(&x2, &absx);
		decNumberCopy(&t, &absx);
		decNumberCopy(&a, &absx);
		decNumberCopy(&d, &const_3);
		for (i = 0; i < 500; i++) {
			dn_multiply(&u, &t, &x2);
			dn_divide(&t, &u, &d);
			dn_add(&u, &a, &t);
			if (dn_eq(&u, &a))
				break;
			decNumberCopy(&a, &u);
			dn_p2(&d, &d);
		}
		decNumberCopy(&b, &const_0_5);
		if (decNumberIsNegative(&x))
			dn_minus(&a, &a);
	} else {
		//dn_minus(&x2, &absx);
		//n = ceil(5 + k / (|x| - 1))
		dn_m1(&b, &absx);
		dn_divide(&t, &const_256, &b);
		dn_add(&u, &t, &const_4);
		decNumberCeil(&b, &u);
		decNumberZero(&t);
		do {
			dn_add(&u, &x, &t);
			dn_divide(&t, &b, &u);
			dn_dec(&b);
		} while (! dn_eq0(&b));

		dn_add(&u, &t, &x);
		decNumberRecip(&a, &u);

		if (decNumberIsNegative(&a)) {
			dn_minus(&a, &a);
			decNumberZero(&b);
		} else {
			dn_1(&b);
			dn_minus(&a, &a);
		}
	}

Edited: 10 May 2012, 12:58 p.m.


#91

This is harder than you expect -- the routine doesn't know the requested accuracy. It is called from xrom so the machine will always be in double precision so that mode flag isn't helpful.

The flag indicating the original user's single/double setting isn't adequate either since this code could be called from the QF routine that really does require some extra precision even though the user is running in single precision.

We'll need an out of band way to indicate this. i.e. use one of the other xrom local flags to indicate what precision to use if the user's mode is set to single.


Sigh.


- Pauli


#92

You could just pass the required accuracy on the stack if necessary. We can always compute to an accuracy high enough even for nested calls in single precision. What is implemented now should be fine for all cases I can envision even if it may waste some cycles in single precision.

Edit: I've implemented Dieter's latest suggestions. Pauli, can you give an example when you need higher accuracy? This should be easy to implement. The numerator for SP is now 100 (instead of 83 as suggested by Dieter) which gives a little headroom.

Edited: 11 May 2012, 4:48 a.m.


#93

More accuracy will be required when solving for the quantile function.
Possibly for erfc as well -- I don't remember the transform from the normal cdf used there.


- Pauli

#94

I think we will need no extra headroom now. ;-)

It's 3:45 a.m. in central Europe now - and I did some tests on the continued fraction approach for the Normal CDF and x > 2,326. This time I tried the Lentz algorithm in order to determine the exact number of terms required for a specific accuracy. This led to a quite extensive table of exact data. And it turned out that the relation between x and the number of required terms can be described very precisely in the following way, so that we get the desired accuracy without significant overhead.

For single (16-digit) precision I think it's nice to have one or two guard digits to make sure that the last digit is rounded correctly. So I set the tolerance to 1E-18 to get 18 digits +/- 1 ULP. Which lead to...

   n >= 480 / x^2  +  9
This matches almost exactly the number of required terms for the given accuracy. Only here and there the result is 1 or 2 terms lower, so that the relative error might increase to 1,2 E-18 or something similar.

For double (34-digit) precision I think it makes sense to simply use the full 39 internal digits. The last one or two may be slightly off anyway. With a target accuracy of 1E-39 I got this:

   n >= 2200 / x^2  +  20
Only far out in the tails, where very few terms are required - and processed in virtually no time - is the calculated n a bit high, due to the constant added at the end.

So I would definitely suggest that the number of terms for the Normal CDF be set according to the respective formula above. They match the required number of terms for the desired accuracy almost exactly, while on the other hand no time is wasted on useless loops "just to be sure". ;-)

Dieter

#95

Quote:
Phi and Phi-upper are really just special cases of the left and right gamma function respectively, so maybe we should forgo a dedicated brute force right to left computation of the continued fraction and use the modified Lentz code that is there? I admit this would mean another byte-gobbling constant to pass to gcf() as the gln parameter (ln(sqrt(Pi)) = ln(Pi)/2 = 0.572364942924700087071713675676529355824), but if we ditch the dedicated CF code, would it be worth it? My user code of the right-sided incomplete gamma is perceptibly slower, but if I am getting 32 or 34 digits in a 34-digit environment, there is something right about this alternative.

I'm quite tempted to remove the special case for the standard normal distribution and just use gammap and gammaq for this. We'll lose speed in the quantile function but save lots of flash -- we lost almost three pages of flash special casing the normal cdf & pdf and doing the special purpose normal qf solver.


In a keystroke program a double precision constant is 17 bytes plus two for each access of it. I'd happily pay this price to save 600+ bytes of C.


Do you know the relationship between gammap/gammaq and the cumulative normals? I don't remember it and a quick google hasn't turned up anything viable.


- Pauli


#96

Quote:

Do you know the relationship between gammap/gammaq and the cumulative normals? I don't remember it and a quick google hasn't turned up anything viable.


For x > 0:

Lower Tail Normal CDF = 0.5*(1 + gammap(0.5*x^2, 0.5))
Upper Tail Normal CDF = 0.5*gammaq(0.5*x^2, 0.5)

The parameter order is as you implement it in the calculator, and the opposite of the convention in Maple and Mathematica--the "a" parameter second, the "x" parameter first.

See A&S 26.2.30.

Les


#97

Quote:
The parameter order is as you implement it in the calculator, and the opposite of the convention in Maple and Mathematica--the "a" parameter second, the "x" parameter first.

I switched the parameter order for gammap and ibeta this morning :-)


- Pauli


#98

I notice that in the documentation for the newly baptized P and Q the parameter order reflects what actually goes on the stack--the second parameter is entered first, so it ends up in Y while the first ends up in X, just as the manual indicates. Works for me. With that in mind, in the normal distribution special cases x^2/2 goes into y and .5 into x before calling P or Q. So maybe I should've written things the conventional way to match the manual and the soon-to-be compiled functions.

Les

#99

Quote:
Lower Tail Normal CDF = 0.5*(1 + gammap(0.5*x^2, 0.5))
Upper Tail Normal CDF = 0.5*gammaq(0.5*x^2, 0.5)

Thanks. I'll have a go with these two. I remember some difficulties with the lower tail cdf via the gammap function or rather via erf and then gammap. That's the way I did it originally.


- Pauli


I remember the problem with the first formula. For x<<0 you can't just invert using 1 - cdf, which causes cancellation. However, with both versions present, this isn't such an issue.

- Pauli


Exactly. With x < 0, things are just reflected--the lower CDF formula for negative argument is just the upper CDF formula with positive statistic, and the upper tail formula is just the lower tail formula with positive statistic. You couldn't do that reflection thing when you didn't have gammaq available.

Les

I think I am going to have to revisit the inverse normal function yet again. Any chance you could send the quick and dirty code please? Even better a commented version :-)

- Pauli


Pauli, Les et al. -

I'll try to summarize all this in one single post. ;-)

Yesterday I replaced the initial rational approximation from my previous suggestion with a dedicated version for p down to 1E-400. It now provides an estimate with an absolute error near +/- 0,002 and requires just two short real constants. The rest can be done with INC commands. ;-)

Estimate for 1E-400 =< p =< 0.5:

   r = sqrt(-2 ln p)
   x = r - (0,295*(r+1) + 2) / ((0,0511*r + 1) * r + 1)
An estimate x below 0.001 should get rounded to zero to ensure best results in the following correction step. The approximation was designed to give eight valid digits after the first iteration (max. error approx. +/- 2 units in the 9th digit), and with a second resp. third iteration we should be on the safe side for standard resp. double precision - as long as the working precision is sufficient. But that's another story - see my other remarks below.

Here's my implementation for the 34s.

001 **LBL A
002 STO 00 // save p in R00
003 ENTER[^]
004 DEC X
005 +/-
006 MIN
007 STO 01 // q = min(p, 1-p) in R01
008 LN
009 STO+ X
010 +/-
011 [sqrt] // r = sqrt(-2 ln q)
012 ENTER[^]
013 INC X // rational approximation starts here
014 .
015 2
016 9
017 5
018 [times]
019 INC X
020 INC X
021 .
022 0
023 5
024 1
025 1
026 RCL[times] Z
027 INC X
028 RCL[times] Z
029 INC X
030 /
031 - // first estimate x
032 ENTER[^]
033 SDL 003
034 x<1? // x < 0,001? resp. 1000 x < 1 ?
035 CLSTK // then x = 0
036 DROP
037 STO 02 // save x in R02
038 2
039 DBL?
040 INC X // 2 resp. 3 iterations will do
041 STO 03 // loop counter in R03
042 **LBL 000
043 RCL 02
044 +/-
045 [PHI](x) // Q(x) = right hand integral = Phi(-x)
046 RCL- 01
047 RCL 02
048 [phi](x)
049 / // t = (CDF(x) - q) / PDF(x)
050 FILL // save t on stack
051 x[^3]
052 6
053 / // least correction term is t^3 / 6...
054 RCL 02
055 x[^2]
056 STO+ X
057 INC X
058 [times] // ... * (2 x^2 + 1)
059 x[<->] Y
060 x[^2] // second last term is...
061 RCL[times] 02
062 2
063 / // ... t^2 * x / 2
064 + // add two smallest terms together
065 + // add this to t to get complete delta
066 STO+ 02 // correct x by delta
067 DSZ 03
068 GTO 000 // repeat 2 or 3 times
069 RCL 00
070 .
071 5
072 -
073 SIGN // finally adjust sign of x
074 RCL[times] 02
075 END

0,0025 [A] -2,8070 33768 34380 41172 21810 39471 2269 E+0 exact

This works fine at least for quantiles >= 1. Now let's look at p close to 0,5.

Note: "+1D34" = +1 unit in the 34th significant digit.
0,49               [A] -2,5068 90825 87110 35762 36343 18346 9043 E-2   +1D34
0,4999 [A] -2,5066 28300 88035 09892 06501 05407 8430 E-4 +1D32
0,4999999 [A] -2,5066 28274 63102 67517 65674 82243 4196 E-7 -3D29
0,4999999999 [A] -2,5066 28274 63100 05024 42013 42148 8860 E-10 -1D25
0,4999999999999999 [A] -2,5066 28274 63100 05029 49753 49666 4200 E-16 +5D20
This shows the problem Les and I have been talking about previously. Generally speaking, our basic approach (i.e. evaluate the CDF for the current approximation, see how well it matches p and apply a correction) is good for up to 34 decimals, but not for 34 significant digits. This is because close to zero there are numerous n-digit quantiles that return the desired p, so that no further correction is applied. In other words, a calculated quantile near zero with a magnitude of 10^-n may be off in its last n digits, so that we may expect something like 34-n digit accuracy. Yes, the given examples are one digit better. ;-)

What can we do? We will have to handle p close to 0,5 separately. This can be accomplished in several ways:

  • Use the Chi-square method Les suggested. We just have to find a limit where the program switches from one method to the other.
  • Use a direct and exact series expansion. Last year I proposed a suitable solution. Les mentioned it these days in one of his posts, having found it in the readme files. Here is the basic approach:
    let   u  =  sqrt(2*pi)*(p-0,5)
    then z = u + u^3 * 1/3! + u^5 * 7/5! + u^7 * 127/7! + u^9 * 4369/9! + ...
    The numerators (1, 1, 7, 127, 4369, ...) are given by integer sequence A002067. For 34 digits and p = 0,499...0,501 only a few terms up to u^11 should be sufficient. The equation should be evaluated from right to left, i.e. add the smallest terms first.
  • A quick and easy solution may be a special function that evaluates Phi(x)-0,5. In other words, a solution similar to the well-known e^x-1. During the Phi(x) routine this value is already calculated (integral from 0 to x), before finally 0,5 is added to get Phi(x), i.e. the integral from -infinity to x.

But there's also another way to address this issue: Forget about 34-digit accuracy. There simply is no way to ensure a valid n-digit result as long as all calculations carry just the same n digits and no further guard digits. I remember how I tried to get an exact 10-digit result for the Normal quantile on the 41C with the same 10 digits working precision. It's simply not possible (at least I do not know of a method to accomplish this). That's why I designed the algorithm I suggested a few days ago - it returns 11 good digits over the 41C's working range as long as the calculation is done with a few more guard digits, e.g. in machine code.

If I remember correctly, the original idea behind the double precision mode was this: enable the user to write user-code functions with lots of extended precision, so that the final result can be assumed to carry at least the usual 16 digits that are normally returned to the user. Once again: in many, if not most cases it's simply not possible to get a true 34-digit result unless the working precision exceeds these 34 digits. So we cannot expect 34 digits anyway. What is possible? 32? 30? 25? It depends. Round it to 16 digits and you are on the safe side. :-)

Dieter


Quote:
Yesterday I replaced the initial rational approximation from by previous suggestion by a dedicated version for p down to 1E-400. It now provides an estimate with an absolute error near +/- 0,002 and requires just two short real constants. The rest can be done with INC commands. ;-)

Thanks very much for this, I'll have a look at including it or something similar.

We're not so constant-averse any more. Constants in C occupy 36 bytes plus the code to load them; for keystroke programs they require nine bytes in single precision and 17 in double (plus two bytes to load them). Additionally, integers 0 - 255 are directly available, and with the decimal point shifts, many short constants take only two instructions (4 bytes) to get.


- Pauli

Try now :)

- Pauli

