▼
Posts: 1,368
Threads: 212
Joined: Dec 2006
The XROM based function is perceptibly slower, which is a fair trade given the huge gains made in extra flash memory by getting this and other functions out of the native code.
However, in testing numerous arguments, I stumbled across one that creates a hang, for no good reasonnamely, exactly p=0.0025, as well as its complement p=0.9975.
Running the function on either of these probabilities makes the calc hang, requiring a reset. However, adding even the smallest perturbationsay 1e16allows function to converge without difficulty.
Weird. Any idea why these particular values prove to be so disagreeable? Also, for what it's worth, I have written my own inverse normal routine that is at least as fast if not faster than the present builtin function, and which handles this quirky probability without problem.
I wonder if the inverse normal code got broken or at least bruised in its move to XROM?
Les
▼
Posts: 3,229
Threads: 42
Joined: Jul 2006
No idea what is wrong here but I can definitely reproduce it. Will look into it later.
I'd be interested in your inverse normal implementation if it can handle the full range of values and gives a sufficient number of digits of accurate result.
e.g. what does it do given an argument of 1e350 ? .999999999999 ? 0.5 ?
 Pauli
▼
Posts: 1,368
Threads: 212
Joined: Dec 2006
Quote:
No idea what is wrong here but I can definitely reproduce it. Will look into it later.
I'd be interested in your inverse normal implementation if it can handle the full range of values and gives a sufficient number of digits of accurate result.
e.g. what does it do given an argument of 1e350 ? .999999999999 ? 0.5 ?
No and yes. I haven't included a test that reflects the sign of the result for arguments above 0.5, but that is no problem for my purposes. It handles the first and third examples you give very well. The second I would reflect to 1e12 and switch the sign.
It is basically a stripped down version of your own routineI get the initial estimate using the formulae Dieter provided, and simply do three successive Halley refinements against the calc's Phi(x) whether I need them or not. No convergence tests. As you know, two Halley refinements give at least 18 digits (in theory), and three give at least 54 in an environment of adequate intermediate precision. More than enough for the calc's SP and DP. For SP it is overkill, but it is still fast enough for me. I suspect I lose a digit here and there due to rounding, as I can't take advantage of the full 39 internal digits, but it really was a fruitful programming exercise for me.
Here is my listingforgive me for not commenting on it due to being rushed:
0001 ****LBL'INN'
0002 LocR 005
0003 # [pi]
0004 STO+ X
0005 [sqrt]
0006 STO .00
0007 R[v]
0008 # 022
0009 # 100
0010 /
0011 x[<>] Y
0012 STO .02
0013 x>? Y
0014 GTO 001
0015 LN
0016 STO+ X
0017 +/
0018 STO .03
0019 DEC X
0020 [sqrt]
0021 RCL[times] .00
0022 RCL[times] .02
0023 LN
0024 STO+ X
0025 +/
0026 [sqrt]
0027 # 005
0028 RCL[times] .03
0029 1/x
0030 +
0031 GTO 002
0032 **LBL 001
0033 # 002
0034 1/x
0035 RCL Y
0036 RCL[times] .00
0037 FILL
0038 STO[times] X
0039 # 006
0040 /
0041 INC X
0042 [times]
0043 **LBL 002
0044 +/
0045 STO .04
0046 # 003
0047 STO .01
0048 R[v]
0049 **LBL 003
0050 [PHI](x)
0051 RCL .02
0052 RCL[times] .00
0053 RCL .04
0054 RCL[times] .04
0055 # 002
0056 /
0057 e[^x]
0058 [times]
0059 FILL
0060 RCL[times] .04
0061 # 002
0062 /
0063 INC X
0064 /
0065 STO .04
0066 RCL .04
0067 DSZ .01
0068 GTO 003
0069 RTN
0070 END
My sense is that you would do well in your XROM version just to do three Halley refinements and not bother with tests of convergence. It may seem overkill, but I bet it will speed things up.
Les
▼
Posts: 653
Threads: 26
Joined: Aug 2010
Ah, the Normal quantile again. :) I would like to add some remarks that might help a bit here.
The applied algorithm itself is quite stable  as long as there is sufficient precision both in the CDF and, even more important, while evaluating the term
CDF(x)  p
t = 
PDF(x)
This expression is used both in the Halley method as well in another approach I tend to prefer. The more x approaches the true quantile, i.e. the better CDF(x) matches p, the more important this point gets as the number of significant digits in the difference decreases. Another problem is the fact that, at least for x < 1, there are multiple ndigit values for x that will return the same ndigit CDF, so the nominator may become zero even if x is not yet exact.
My current solution (used in a 41 and 35s) works as follows:
 Provide a better initial estimate with an error order of 0,001.
 Improve this estimate with one single correction step, which leads to a result with about 11 valid digits for p > 1E100, i.e. for the whole working range of the classic 10digit machines.
 One more iteration should return more than 20 valid digits  I will yet have to try this in an environment with sufficient presicion. Maybe a Mathematica user can do this.
 Calculators with wider working range (down to 1E500) require one more iteration step  or a slight adjustment of the intial estimate.
And here are the details. As always, p < 0,5 and x > 0,
i.e. p = 0,1 => x = +1,28155...
Initial estimate, using a very simple rational approximation:
r = sqrt(2 * ln(p))
x = r  (0,394r + 2,366) / ((0,0728r + 1,108)r + 1)
if x < 0,003 then x = 0 ' improves numeric stability close to 0
Refinement:
t = (CDF(x)  p) / PDF(x) ' CDF here is to the right hand integral !
x = x + t + x/2 * t^2 + (2*x^2+1)/6 * t^3
The refinement is based on the method suggested in Abramovitz & Stegun, p. 954. I think it works very well. ;)
I originally used this method for the 41C with its 10digit precision and a range down to 1E99. With at least 12 digits working accuracy, one single iteration step should provide 11 valid digits +/ 1 ULP, i.e. it should be fine for a machine code routine that carries 13 internal digits before it returns an exact 10digit result to the user. During my tests in Excel (15digit binary arithmetics) I did not notice any error beyond +0 and 4 units in the 12th significant digit.
For calculators with a working range down to 1E500 a different rational approximation for the initial estimate may be used, or the latter gets a slight adjustment:
if x > 21,3 then x = x  exp(x/21,3  7,7)
Alternatively, one or two more iterations are fine, too  and very fast as well since the PDF is evaluated very quickly far out in the distribution tails.
The desired 16digit accuracy for the 34s should be no problem with just two iterations, or three in double precision mode. That's why I agree that a fixed number of iterations may avoid the problem of oscillating approximations while the desired accuracy is still achieved. On the other hand I would recommend a different way of calculating the quotient t. For instance, the ratio CDF(x)/PDF(x) usually shows up during the calculation of the CDF (by series or continued fraction), so this might be used for an alternative way to calculate t.
Dieter
Posts: 3,229
Threads: 42
Joined: Jul 2006
Quote: I suspect I lose a digit here and there due to rounding, as I can't take advantage of the full 39 internal digits, but it really was a fruitful programming exercise for me.
This is fundamentally the problem. Moving these functions to xrom means I've lost five digits of working precision, their accuracy must suffer because of this. It clearly suffered more than I'd thought :(
 Pauli
Posts: 3,229
Threads: 42
Joined: Jul 2006
I know the problem. The iterative process to refine the result is oscillating between two values:
2.807033768343804117221810394712332
2.807033768343804117221810394712293
The old inverse normal distribution code carried extra digits and this wouldn't have been an issue.
Now to figure out a fix.
 Pauli
▼
Posts: 1,368
Threads: 212
Joined: Dec 2006
Quote:
I know the problem. The iterative process to refine the result is oscillating between two values:
2.807033768343804117221810394712332
2.807033768343804117221810394712293
The last five digits of the true result, properly rounded, would be 12269.
Your fixed version goes with the former of the above two choices, 12332.
My code, as listed above, gets a little closer when run in DP, with the last five being 12308.
More than close enough for real life :)
Les
▼
Posts: 3,283
Threads: 104
Joined: Jul 2005
A probable solution (instead of just relaxing the convergence criteria) could be to check for a cycle (keep two or three guesses and check for duplicates) and if one is found, disturb the value by one or a few ULPs, and try again. Do any iteration only a finite number of times.
▼
Posts: 3,229
Threads: 42
Joined: Jul 2006
I'm thinking using the same solver as the other distributions might work better here. It handles these cycles cases (switching to a bisection instead) but is going to be a bit slower.
 Pauli
Posts: 653
Threads: 26
Joined: Aug 2010
Les, I think it's simply impossible to get a result with n exact digits as long as the working precision here is just the same n digits. ;) Try it with a simple 10digit machine  if you get 9 exact digits +/ 1 ULP you've achieved the best that can be done. That's why there are guard digits. ;)
So if the working precision is limited to 34 digits, a result with 3032 valid digits is fine. And that's what we get here  simply round the result to 32 digits and we're done:
x = 2.8070 33768 34380 41172 21810 39471 23
Dieter
Posts: 653
Threads: 26
Joined: Aug 2010
Just for the record: I did a quickanddirty implementation of the algorithm I proposed earlier in this thread, and I tried the given example p = 0,0025. Et voila  after just two iterations in double precision mode...
Quote:
The last five digits of the true result, properly rounded, would be 12269.
...this is exactly what I got:
x = 2,8070 33768 34380 41172 21810 39471 2269
This does not change in further iterations. And yes, I know, this example does not prove anything. #)
Dieter
▼
Posts: 3,283
Threads: 104
Joined: Jul 2005
Pauli will be listening, I'm sure!
We should get the normal distribution right because it's a cornerstone function. If we do get the same level of accuracy out of the other distributions, fine, if not, just force a rounding to 30 digits (in double precision mode only, of course!).
Dieter, your knowledge about the numerical algorithms for the probability functions is outstanding! Where does it come from? Are you a mathematician or just a genius? ;)
▼
Posts: 3,229
Threads: 42
Joined: Jul 2006
I'm listening. If I get keen enough, I'll give this method a trial  it will cost more bytes in flash but will be faster and likely at least as accurate. At the moment, I'm a bit burned out over coding statistical functions as keystroke programs.
My revised fix for this, using the guarded Newton solver used by the other distributions, produces:
2.807033768343804117221810394712233
Slightly better but not perfect. I doubt we'll be able to get much better without carrying extra digits.
I'm not in favour of rounding results unnecessarily  it will generally make them less accurate although predictably so.
 Pauli
Posts: 1,368
Threads: 212
Joined: Dec 2006
Quote:
We should get the normal distribution right because it's a cornerstone function. If we do get the same level of accuracy out of the other distributions, fine, if not, just force a rounding to 30 digits (in double precision mode only, of course!).
In any case, I think it will be wise to heed Dieter's warnings and be especially attentive to the behaviour around p = 0.5, where most digit loss is bound to occur due to that phenomenon of more than one ndigit quantile estimate associated with the same ndigit probability. Indeed, the distributionformulas readme accounts discusses this in one of the suggested approaches, and does something special when p is in the interval (.495, .505).
A savvy user who is really obsessed with high accuracy (like me, even though that serves little practical purpose :) ), will be aware of the relationship between the inverse normal and the inverse chisquare with one df. Basically, for p < .5:
Phi(1)(p) = Sqrt(Chi^2INV(1df)(12p))
For example, for p = 0.4999, the Phi(1)(x) builtin function gives:
2.506628300880350989206501054077314 e4
In comparison, passing 0.0002 to Chi^2INV(x) with 1 in register J and taking the negative square root yields:
2.506628300880350989206501054078343 e4
The direct method rounds to 30 good digits. Using the inverse chisquare intermediary yields ALL 34 digits correctly.
About the worst example I have been able to find is p = .4999999, where the builtin function gives:
2.506628274631026751765674822177285 e7
The inverse chisquare path gives:
2.506628274631026751765674822754539 e7
Again, the latter gives full 34digit accuracy. The former is only 28 digits to 1 ULP, or 29 digits to 6 ULP.
Just food for thought.
▼
Posts: 653
Threads: 26
Joined: Aug 2010
Hmhmhmhmhmmmm.... ;)
I tried your examples with the emulator. It's a somewhat older version (3.0 2294). Here is what I got with the internal function, i.e. based on 39 digit internal precision:
p = 0,49
z = 2,5068 90825 87110 35762 36343 18346 9042 E2 exact
p = 0,499
z = 2.5066 30899 57176 40053 58657 06701 9588 E3 exact
p = 0,4999
z = 2,5066 28300 88035 09892 06501 05407 8343 E4 exact
p = 0,49999
z = 2,5066 28274 89349 40015 68864 28261 7674 E5 exact
p = 0,49999 9
z = 2,5066 28274 63362 54374 06724 79396 8633 E6 +0,7 ULP
p = 0,49999 99
z = 2,5066 28274 63102 67517 65674 82275 4537 E7 2 ULP
p = 0,49999 999
z = 2,5066 28274 63100 07649 09264 38018 4708 E8 57 ULP
p = 0,49999 99999
z = 2,5066 28274 63100 05024 42014 63472 0583 E10 exact
p = 0,49999 99999 99
z = 2,5066 28274 63100 05024 15767 90974 6036 E12 exact
p = 0,49999 99999 9999
z = 2,5066 28274 63100 05024 15765 28507 3539 E14 exact
p = 0,49999 99999 99999 9
z = 2,5066 28274 63100 05024 15765 28481 1072 E16 exact
So the internal function seems to work quite well (with the one exception at 0,49999 999). I also compared the results to those obtained with the Chisquare method, but the latter often showed larger errors. For example:
p = 0,49999 999
Exact 2,5066 28274 63100 07649 09264 38018 4765 E8
Normal 2,5066 28274 63100 07649 09264 38018 4708 E8 57 ULP
Chi^2 2,5066 28274 63100 07649 09264 38018 2296 E8 2469 ULP
What's going on here? Why do we get different results? Is it because my version still uses the internal 39digit precision and yours is based on 34digit XROM code?
Dieter
Edited: 4 May 2012, 3:10 p.m.
▼
Posts: 3,229
Threads: 42
Joined: Jul 2006
Most likely the extra digits being carried makes for the difference. Five bonus digits is a lot.
There is also a small change to the Newton search which might also make a difference. The older code did some range limiting on Newton steps within the known interval of the solution) and reverted to bisection if things didn't seem to be converging fast enough, whereas the newer code doesn't do this. Bisection now only happens if the Newton step would go outside the known interval of convergence.
The idea was to try to trap cases when convergence via Newton steps is very slow because each step only moved a small amount.
 Pauli
▼
Posts: 3,283
Threads: 104
Joined: Jul 2005
Are phi(x) and PHI(x) exact to 34 significant digits? If not, it might help to leave them in as C code to improve the accuracy. I've no plans to implement quad precision in XROM. ;)
▼
Posts: 3,229
Threads: 42
Joined: Jul 2006
phi(x) will be close. PHI(x) several digits off. PHI^1(x) even less so.
Pushing these back into C is always an option.
 Pauli
▼
Posts: 653
Threads: 26
Joined: Aug 2010
Pauli,
I've been thinking a bit about the possible accuracy of the Normal quantile function. The 39digit Ccoded internal function seems to work fine for most cases, but there are a few flaws here and there, especially close to p = 0.5 resp. z = 0. In double precision user code (34 digits working precision) the result may even be wrong in the last six (!) digits.
But we can do better.
 The easier part are quantiles > 1. Varying these by one ULP leads to a change in several digits in the corresponding probability (assuming the same number of significant digits).
 The crucial region is 0 < quantile < 1. Here, multiple ndigit quantiles exist that map to the same ndigit probability. This also leads to digit cancellation as the difference between CDF(x)  p is evaluated. Here is an example:
Probability p = 0.49999 99
Initial guess z = 2.5066 28274 63102 67517 65674 82217 7285 E7
Corresponding CDF = 0.49999 99000 00000 00000 00000 00000 0000
Great  the initial guess seems to be exact. In fact, it is just the first two terms of the series expansion of the Normal quantile. The guess is exact to 34 digits if p is within 0.5 +/ 2E9. However, p here is beyond this limit so the quantile cannot be exact. It just happens to return the desired probability. Actually, it has only 28 correct digits. But the present algorithm is not able to improve this result as the correction term will evaluate to zero.
Another example:
Probability p = 0.49999
Initial guess z = 2.5066 28274 89349 40015 11138 94721 5889 E5
Corresponding CDF = 0.49999 00000 00000 00000 00002 30290 7693
So the CDF for the initial guess differs from p only in the last 10 digits. However, the guess has just 20 correct digits, so we cannot expect more than 30 valid digits in the final result.
 The closer p gets to 0.5 the more digits get cancelled. Very close to p = 0.5 the initial guess already is exact (as explained above), so the mentioned digit cancellation does not matter. Also p < 0.4 usually has 32 or 33 correct digits. But between these points the effective accuracy can drop down to 28 digits. For instance, at p = 0.49995 the initial estimate already has 21 correct digits, but due to digit cancellation the correction term carries just 7 more.
 Finally I tried a simple solution: the CDF for small x is evaluated by the wellknown power series. Finally, the accumulated sum is multiplied with the PDF and 0.5 added to this product. So I wrote my own CDF routine on the 34s and simply omitted the final addition of 0.5. In fact, it is possible to write CDF(x) as well as CDF(x)0,5 and the term (CDF(x)p)/PDF(x) in a way that avoids excessive digit cancellation. In the first example this yields six significant digits in CDF(x)p so that the final result is nearly exact (+1 ULP). This technique is possible both for low values of x (power series) as well as the continued fraction method that is applied for large x.
I did it this way. Please note that a and b are global variables.
function CDF(x)
if x < 2.326 'here p is 0.01
a = (sum of powers loop)
b = 0.5
else
a = (continued fraction loop)
b = 0
end if
return CDF = a*pdf(x) + b
end function
function quantile(p)
x = guess(p)
repeat this...
call CDF routine
t = (b  p)/pdf(x) + a ' this equals (CDF(x)p)/pdf(x)
evaluate correction term dx(t)
x = x + dx
...two or three times
return x
end function
Alternatively, the loops for the power series and the continued fraction may be coded as subroutines that can be called both by CDF(x) and quantile(p). In this case, the subroutine call has to return both a and b.
 The method seems to work. I tried the list of p values close to 0.5 in one of the previous posts, where the internal 39digit 34s function returned values that were mostly exact, but sometimes off by as much as 57 ULP. With just 34 digit working precision in the user code program I tested, I did not notice any errors larger than 12 ULP for all test cases. The largest error I observed with random test values was about 1 unit in the 33th significant digit. Which essentially is as good as it gets when a 34digit machine is all you have.
That's why I see two options for the desired 34digit accuracy for the Normal quantile:
 If the function has to be written in user code (XROM), we need an additional function that returns CDF(x)0.5. This method is good for about +/ 1 unit in the 33th digit.
 "Push it back to C", as you said, while using the same approach, and we'll get 45 more digits. This sums up to roughly 37 valid digits.
Finally we should also take a look at the CDF routine. I think it has to be coded with 39digit accuracy (i.e. in C).
Dieter
Edited: 8 May 2012, 7:40 p.m.
▼
Posts: 1,368
Threads: 212
Joined: Dec 2006
Dieter, can you take a closer look? It seems that your results for "Corresponding CDF" actually give more 9s than that specified in the desired probability, so I think there are typos here. If there are not, then I don't follow your analysis...
▼
Posts: 653
Threads: 26
Joined: Aug 2010
Yes, sorry, the returned CDFs originally had a few nines too much here and there. #)
I noticed this a few minutes before your reply, it should be correct now.
Dieter
Posts: 1,368
Threads: 212
Joined: Dec 2006
Never mind! According to the time stamps, you fixed the errors just minutes before I posted... ;)
Posts: 1,368
Threads: 212
Joined: Dec 2006
Dieter, this is excellent! I can't wait to find the time to code it myself.
Just to clarify things for myselfit seems you are dealing actually with the upper tail probabilities and confining your code to positive quantiles and probabilities less than 0.5, so according to convention we are talking about q, not p. I think this is really important to observe in the final code. If you want to make yourself cringe, compute, in double precision, inverse Phi for unity less 1 ULP (i.e, 1 ENTER X.FCN>ULP ). The result is good only to 3 or 4 digits. However, if you compute the quantile associated with the complement 1e33, the result is almost exact to 34 digits (but with opposite sign, of course). Basically, the ultimate qf code really needs to take the complement of probabilities above 0.5, use that for the routine, and flip the sign.
I also really think that the refinement should go back to Halley's method here, since in this case it is just so simple. Using your variables, dx(t) = t/(1xt/2). What could be easier?
Les
Edited: 8 May 2012, 10:59 p.m.
▼
Posts: 3,229
Threads: 42
Joined: Jul 2006
Quote: Dieter, this is excellent! I can't wait to find the time to code it myself.
Care to code it for xrom inclusion?
 Pauli
Posts: 3,229
Threads: 42
Joined: Jul 2006
Quote: Basically, the ultimate qf code really needs to take the complement of probabilities above 0.5, use that for the routine, and flip the sign.
Or support both p and q for the normal and cross call between them as required.
 Pauli
▼
Posts: 653
Threads: 26
Joined: Aug 2010
I thought this already got implemented a long time ago?! Did something change here?
To ensure the best possible accuracy, both the CDF and the QF should work the way I suggested in my reply to Les. Please see below.
Dieter
Posts: 653
Threads: 26
Joined: Aug 2010
Les, all algorithms I proposed for the Normal CDF and its inverse use the same convention: x >= 0 and p (or q) < 0,5. So, actually only the right half of the function (x is nonnegative) is considered, and the upper tail integral (from x to infinity) is evaluated. This means:
 The CDF routine generally uses abs(x) as its argument and finally returns q or 1q, depending on the sign of x.
 The QF works on min(p, 1p) and calculates a positive quantile. Finally the sign of the result is set, depending on p > or < 0.5.
This can be done by multiplying the result with sign(p0.5).
Of course this also addresses the accuracy issue you mentioned. The (lower tail) CDF for e.g. x = +6 is calculated by evaluating the upper tail integral (usually designated Q) and then returning 1 minus this value. In other words: the routine first evaluates Q(6)= 9,86587645...E10 to 16 (34, 39...) digits and then returns 1 minus that = 0,9999999990134123. So if Q has 34 valid digits, the returned P(hi) in this example could have 34 + 9 = 43 valid digits  if the 34s would allow it. ;)
In other words: at least my own CDF and quantile routines already do what you suggested. So the example you gave (quantile for 1  1E33) actually is evaluated as quantile(1E33). I also proposed this about a year ago when the distribution functions were discussed here. I assumed this went into the actual implementation, but to my astonishment this obviously is not the case. Pauli, did something change here? Coded correctly, quantile(p) and quantile(1p) should only differ in their signs, while they otherwise agree in all digits. So, yes, this idea of "the ultimate QF code" has been proposed. I just wonder why it is not implemented, at least not in the current version.
Regarding Halley's method: yes, it converges quite fast, but there is an even better approach (cf. Abramowitz & Stegun):
Halley: dx = t / (1  x*t/2)
For small x*t/2 this approaches
dx ~ t * (1 + x*t/2)
= t + t^2*x/2
Now the method I proposed adds one more term to this:
A & S: dx = t + t^2*x/2 + t^3*(2*x^2+1)/6
This approach converges a bit faster, and there is another advantage as well: for x = 0 it is equivalent to the sum of the first terms of the Normal quantile's series expansion. I used this nice feature here and there to provide exact results close to x=0 with the initial guess set to zero.
By the way, my current double precision version for the 34s starts with the good old Normal estimate we discussed some time ago. So the rational approximation I recently suggested is not used here. This is because of the extremely wide working range in double precision mode that goes even below 1E6000. The rational approximation works fine down to 1E400, but farther out in the tails the usual three refinements would not suffice. That's why I reverted to the previous method, which is approximately asymptotic as p approaches zero. I just added a slight improvement: the term 0,2/u resp. 1/5u was replaced by 1/4u. The switch between the center and tail approximation is between 0,232 and 0,233, limiting the maximum absolute error to slightly less than 0,01. :)
Now I am looking at the CDF algorithm for large x. Currently I am using the internal function. The continued fraction loop is evaluated best (and fastest) from the right to the left, so the number of loops n has to be known beforehand. Currently this is done by an estimate for n I proposed last year. However, this has not been tested for 34+ digit precision. I also thought about the Lentz method so that the continued fraction can be evaluated from left to right until convergence is reached, but I feel this is way too slow here.
Dieter
Edited: 9 May 2012, 7:37 a.m.
▼
Posts: 1,368
Threads: 212
Joined: Dec 2006
I use the Lentz method in my own adaptation of the rightsided incomplete gamma, and I do agree that when I compute erfc or the uppertail normal as a special case of incomplete gamma it seems slower than it needs to be. Also, because the arguments have to be manipulated before being passed to my incomplete gamma routine, that's another potential source of digit loss due to rounding.
You aspire to full internal 39digit accuracy of the CDF (or, conversely, the uppertail probability), but this is impossible with only 39 digits since, even if every single computation returned 39 perfect digits and there was no loss due to rounding there is the matter of subtraction error at the threshold where one moves from the series expansion to the continued fraction. You have chosen 2.326 as your break point, and I think this is a good one, since that is where the upper tail probability is is 0.01, but even as you approach that you will lose at least one digit to subtraction of the series result from 0.5 to yield the uppertailed probability. If you make the threshold lower, digit loss is reduced, but the continued fraction gets much more sluggish as moresometimes many moreterms are needed and that means more computation and accumulated rounding errors. The existing code uses a threshold of roughly 3.15. The series does quite well up there, and beyond it the continued fraction is very fast. However, the subtraction error is at least two or three digits lost when we subtract the series result from 0.5 to get the upper tail probability.
If we push the normal CDF and uppertail probability and PDF back to C code, 34 digits returned to the stack should be easy to attain, but if we insist on 39 digits internally the number type used will have to introduce some more guard digits. That subtraction error is a bugger, but your use of the 2.326 threshold is probably the best compromise between speed and accuracy. Whether due to subtraction or rounding these calculations will not across the range give 39 digits of internal precision all the time unless working precision is more. However, the 34 digits we see on the stack in DP should be easy.
Les
Les
▼
Posts: 653
Threads: 26
Joined: Aug 2010
Les, I know that it is not possible to get 39digit accuracy on a 39digit machine. This was already pointed out in several earlier posts, so on my 12digit 35s I am happy with 11 valid digits (+/ 1 unit) in the final result. That's why, on the 34s, I would like to see the CDF (and, if possible, also the QF) implemented in an internal function with 39digit working precision  to achieve a final result with 34digit accuracy. ;)
Yes, switching from the series expansion to the CF method at x = 2,326 will sacrifice one digit. The ideal break point would be x = 1,28155... where Q(x) is 0,1. But this would slow down the calculation significantly. That lost digit is also the reason why I think the CDF (and quantile) should be coded in C with 39digit precision (again: in order to get at least 34 valid digits in the result  we will not get all 39 anyway). I think we agree that the current break point at Pi or 3,15 is not perfect if the goal is maximum accuracy. This point is even beyond x = 3,09023 (Q(x) = 0,001) so we will lose even four digits, if I get it right. Maybe Pauli can change the code to switch at least x = 3 (will save one digit) or, even better, 2,326 (two more).
On a side note, choosing 2,326 as the break point was the obvious option for my earlier 35s programs, since near this value the execution times for the series expansion and the continued fraction were roughly the same. ;) And losing one digit was okay for me since a 12digit machine will not produce a 12digit result anyway. #) It was and is exactly what you said: a compromise between speed and accuracy.
In general, I think we both see the same problems as far as numeric accuracy is concerned. I suggested the one or other method to overcome these. If we assume that on an ndigit machine we may achieve about n2 valid digits in the computed quantile (provided the CDF is exact!) this again leads to my initial suggestion: let's have these two key functions coded in C so that the result will have 34 safe digits. I just see that's exactly what you wrote in your last sentence. ;)
So we seem to agree in all major points on the numeric side. Now let's try to convince Pauli to move these two essential functions back to C. ;)
Dieter
▼
Posts: 3,229
Threads: 42
Joined: Jul 2006
Changing the break point in the xrom implementation is easy and cheap. Changing it in C is possible but requires more memory for the task. At the moment we've got the memory free to make some improvement to this function.
Quote: So we seem to agree in all major points on the numeric side. Now let's try to convince Pauli to move these two essential functions back to C. ;)
I'm happy to do some relocating. What functions exactly? The standard normal functions as they previously existed are 856 bytes in size  minus the space they currently use. This is probably a bit on the large size when the current functions are returning 16 digit accurate values albeit rather slowly.
I could do an internal normal CDF function that returns the pdf, the cdf and the 'b' value you mentioned previous (or the 'a' value instead of the cdf).
The QF could then be keystroke programmed.
 Pauli
Posts: 3,229
Threads: 42
Joined: Jul 2006
I can use more than 39 digits internally too  some complex operations are performed carrying 72 digits, the modulo two pi carries a completely insane number of digits and the standard remainder operation uses higher precision due to a flaw in the decNumber library.
 Pauli
Posts: 3,229
Threads: 42
Joined: Jul 2006
I've moved the normal pdf back into C.
I've added a C normal cdf helper function (cdfqh).
It takes x in X and returns the pdf in X, 'a' in Y and 'b' in Z. Registers Y and Z are destroyed and last X is not set  this is a very internal function.
After executing cdfqh, you should execute * + to get the lower tail CDF result.
Better is the sequence cdfqh 1 CPX_* DROP gets the same result but with an atomic multiplyadd instruction that handles double precision results without loss of accuracy.
That's most of our flash headroom gone now :(
Now for the quantile function done better.
 Pauli
Edited: 10 May 2012, 12:49 a.m.
▼
Posts: 3,283
Threads: 104
Joined: Jul 2005
Pauli, is this code conditionally compiled? Not everybody needs 34 digits for his statistical calculations.
▼
Posts: 3,229
Threads: 42
Joined: Jul 2006
No it isn't a conditional include. This replacement code will be significantly faster than the existing xrom code and that justifies it place enough for me. The conversion to xrom slowed all the statistical code down a bit, the normal distribution suffered this by far the worst. More importantly, I don't want to have to support dual versions of the same thing :)
The quantile function won't be 34 digits accurate  that simply isn't achievable from xrom.
 Pauli
Posts: 1,368
Threads: 212
Joined: Dec 2006
For Phi(x) and Phisubu(x), the constituent parts from the cdfqh helper function are returned to the stack and lose their guard digits, so there still is indeed the likelihood that things will be off a little often in the 34th digitnot the full 34digit accuracy I know Dieter was hoping for. But, at least below the new threshold of x < 2.326, things seem fast and really good.
But there seems to be a more serious challenge. You have implemented the new cutoff as noted. The last four digits of of the 34digit Phiupper(2.326  1ULP) result are 7055, whereas the true result properly rounded is ends with 7050. Excellent 34digit precision considering that the last few steps of the computation are done in 34digit working precision, not 39.
But when we get into continued fraction territory, there be wonkiness. The calculator's 34digit output for Phiupper(2.326+1ULP) ends with 3078437, whereas the true last seven digits are 0377044.
The may be evidence that the formula used to compute the number of terms taken with the CF needs to be reconsidered, since in comparing a calc with the newest firmware and one I haven't yet reflashed it seems this is old news. For example, when I compute the uppertail Phi(x) for 3.2, an argument well past the OLD threshold of Pi, the 34digit output is "wrong" in the last 8 digits in both 2965 and 2975. Moreover, that "wrongness" agrees in all of those digits except the last. On the other hand, my slower and less efficient routine that computes the uppertail probability for such larger arguments as a special case of the rightsided incomplete gamma function (my own routine) gets 32 of the 34 digits correct BOTH with the threshold examples and the argument of 3.2.
I am sheepish I didn't pick up on this before, but I took for granted that the CDF and upper CDF were pretty solid in double precision whether in Ccode or XROM. It may be worth revisiting whether a modified Lentz computation is worthwhile after all. Keep in mind the calculator's code already uses this algorithm for the betacf and gcf subroutines. Phi and Phiupper are really just special cases of the left and right gamma function respectively, so maybe we should forgo a dedicated brute force right to left computation of the continued fraction and use the modified Lentz code that is there? I admit this would mean another bytegobbling constant to pass to gcf() as the gln parameter (ln(sqrt(Pi)) = ln(Pi)/2 = 0.572364942924700087071713675676529355824), but if we ditch the dedicated CF code, would it be worth it? My user code of the rightsided incomplete gamma is perceptibly slower, but if I am getting 32 or 34 digits in a 34digit environment, there is something right about this alternative.
Just my 3 euros...
Les
Edited: 10 May 2012, 3:12 a.m. after one or more responses were posted
▼
Posts: 3,229
Threads: 42
Joined: Jul 2006
There are quite possibly errors in the code still :(
I started using gammap for the normal cdf but cancellation limited the number of correct digits at times  this is what Dieter first picked up on.
I really need to implement the right sides of the incomplete gamma and beta functions and use them for the other distributions (the upper tails for F, chi^2, binomial and Poisson are pathetic at present).
User code for both sides of the incomplete beta and gamma would probably make things space neutral  xrom seems to be about half the size of C for this kind of thing. Speed has also been subservient to accuracy during the development of the 34S.
Like I said previously, my motivation for statistical distributions is low at the moment.
 Pauli
PS: the upper tail CDF's were introduced this past week  so you couldn't have taken them for granted.
▼
Posts: 1,368
Threads: 212
Joined: Dec 2006
Quote:
PS: the upper tail CDF's were introduced this past week  so you couldn't have taken them for granted.
Ah, but in a fashion it has always been there for the normal CDF. I have simply flipped the sign on the argument to get the upper CDF.
Les
Edited: 10 May 2012, 4:24 a.m.
Posts: 653
Threads: 26
Joined: Aug 2010
Les wrote:
Quote:
For Phi(x) and Phisubu(x), the constituent parts from the cdfqh helper function are returned to the stack and lose their guard digits, so there still is indeed the likelihood that things will be off a little often in the 34th digitnot the full 34digit accuracy I know Dieter was hoping for.
I have not tried the new version yet (all I have is the emulator) but if it's not worse than my user code solution, we should get close to 34 digits. I think an error within 1 unit of the 33th digit is feasible.
Quote:
The last four digits of of the 34digit Phiupper(2.326  1ULP) result are 7055, whereas the true result properly rounded is ends with 7050. Excellent 34digit precision considering that the last few steps of the computation are done in 34digit working precision, not 39.
But when we get into continued fraction territory, there be wonkiness. The calculator's 34digit output for Phiupper(2.326+1ULP) ends with 3078437, whereas the true last seven digits are 0377044.
I tried these examples with an older emulator version (39digit C) and in all cases the exact 34digit value was returned:
2,326 [ENTER] ULP [] => 0,010009 27534 08676 67260 76355 29037 7050
2,326 => 0,010009 27534 08676 67260 76355 29037 7047
2,326 [ENTER] ULP [+] => 0,010009 27534 08676 67260 76355 29037 7044
That's why I prefer the CDF in C.
Then I tried the same values examples with my own 34s user code implementation, and this is what I got:
2,326 [ENTER] ULP [] => 0,010009 27534 08676 67260 76355 29037 7049
2,326 => 0,010009 27534 08676 67260 76355 29037 7047
2,326 [ENTER] ULP [+] => 0,010009 27534 08676 67260 76355 29037 7044
The true values are
Phi(2,326  1E33) = 0,010009 27534 08676 67260 76355 29037 70496 885...
Phi(2,326) = 0,010009 27534 08676 67260 76355 29037 70470 211...
Phi(2,326 + 1E33) = 0,010009 27534 08676 67260 76355 29037 70443 537...
So in all cases the results are correct within 1 unit of the 34th digit.
Quote:
The may be evidence that the formula used to compute the number of terms taken with the CF needs to be reconsidered
Yes, definitely. As I said before, the original estimate for the number of loops has been tested only for up to 16 digit accuracy. According to the original formula, the nominator in the estimate for the required number of loops would be something like 500 here, i.e. n = 500/(x1) + a few to be safe. In fact it seems to me (just tried a few examples) that this number of loops is good for 34 digits, but with some error in the last place due to roundoff errors. We should be aware that the number of iterations must be as high as required, but it should be as low as possible to prevent excessive roundoff.
So I tried n = 400/(x1) + 8 (caveat  this has not been tested!) and I got the results you see above. I do not know how the number of iterations it determined currently, but I think I remember something like 200/(x1), i.e. roughly half of what I tried. Which means that we will not get 30+ digits at all. And that's what you see in the results you posted.
But I would even go one step further. I agree with Marcus in that not everyone needs 34digit accuracy here. My suggestion is: let's make the special functions that require quite a bit of execution time accuracydependent. In other words: they should be evaluated to 16+ digits if this is what is returned to the user, or full 39 digits if we are in double precision mode, or that function is called internally by another one that in turn has to return a 34+ digit result. There is no need to wait for a 34digit result if the user only gets 16 digits returned. The former would require about 4x the time of the latter. Compared to the current implementation this will roughly double (!) its execution speed, since for 16 digits around 100/(x1) loops (instead of 200/(x1)) are sufficent.
Dieter
▼
Posts: 1,368
Threads: 212
Joined: Dec 2006
Quote:
I tried these examples with an older emulator version (39digit C) and in all cases the exact 34digit value was returned:
Of course you would! In an older version you would be dealing with the series still for arguments around 2.325, since the cutoff was around Pi in that older code. I would expect the results would be pretty good, as you see. If you want to see how the continued fraction performed, you have to look at a larger argument. Try 3.15 or 3.2. I am sure you will see that there are not enough CF loops for 34 digits and you'll get 27 to 30 correct.
I got a nice email from Pauli asking for an actual code sample, and I will see what I can do. In the meantime, I am starting to wonder if my obsession with full double precision accuracy for these functions is reasonable or practical. Getting even sixteen digits, only 12 of which are routinely displayed is pretty much all most reasonable humans need!
Les
Edited: 10 May 2012, 3:31 p.m. after one or more responses were posted
▼
Posts: 3,283
Threads: 104
Joined: Jul 2005
Quote:
Is pretty much all most reasonable humans need!
I doubt we are reasonable humans. ;)
Posts: 653
Threads: 26
Joined: Aug 2010
Les, of course I expected the internal function to be exact. The point here was that the usercoded function also may return a result with 33 or maybe even 34 digits  provided the number of terms in the CF loop is sufficient. Obviously I did not express this clearly enough  sorry for that.
In the meantime I tried to determine how many loops it will take for a relative error below 5E35. So I wrote a short routine for Casio's Keisan service and set it to 38 digits working precision (sorry, no Mathematica or Maple here). Finally I tried to describe the relation between x and the required number of terms n, and here's the result:
n >= 12 + 940 / (x  1/3)^1,57
or alternatively
n >= 12 + 1000 / (x  0,3)^1,6
In both cases this gives a number of terms slightly higher than required (in a perfect calculator ;)). The formula works down to x = 1 so that it can even be used if the break point is set to 1,28155. Which is what I now use on the emulator  it is fast enough for 1000 terms in virtually no time. ;)
Edit: I also tried x > 3,14 with the emulator's function, and yes, Phi(3,2) has only 27 correct digits (when rounded). The user code function, based on the CF method with the number of terms calculated with the formula above, agrees with the true value in all 34 digits. Well, OK, the last one is 1 while it actually should get rounded up to 2. ;) I did some further tests and as far as I can tell, the error stays with a few units in the 34th digit. Until now, I did not notice any error exceeding 1 unit in the 33th digit. So if all this gets coded in C and 39 digit precision it looks like we can assume that all returned 34 digits should be exakt (within 1 ULP).
Dieter
Edited: 10 May 2012, 1:54 p.m. after one or more responses were posted
▼
Posts: 3,283
Threads: 104
Joined: Jul 2005
Dieter, I don't think we want to use exponentiation just to determine the number of loops so I implemented the old approach with 500/(x1)+8 in DP. I reduced the numbers to 150 & 4 for single precision but I didn't do any tests.
▼
Posts: 653
Threads: 26
Joined: Aug 2010
What? No exponents and powers ?)
But fine  then please try this:
n >= 300/(x1,5) + 8
This should be fine for 34 digits and x >= 2,326. It also does not overestimate the number of required loops as much as the 500/(x1) method does.
For 16digit accuracy something like 100/(x1) + 8 should be okay. But I have not tested this yet.
Edit: Here's my suggestion for 16digit accuracy and x >= 2,326:
n >= 85/(x1,3) + 4
Finally: this does not have to be coded with highprecision DecNumber functions. Even standard C single precision will do. ;)
Dieter
Edited: 10 May 2012, 5:45 p.m. after one or more responses were posted
▼
Posts: 3,283
Threads: 104
Joined: Jul 2005
Quote:
What? No exponents and powers ?)
They use LOGs which are are expensive. It's all about saving time, isn't it? I will try your formula but I cannot yet tell you when. Maybe Pauli is quicker. Can you try the latest incarnation and do some tests? It uses the 2.326 threshold. For SP, is 100/(x1)+8 better than 150/(x1)+4 ?
Edited: 10 May 2012, 2:38 p.m.
▼
Posts: 653
Threads: 26
Joined: Aug 2010
Quote:
They use LOGs which are are expensive. It's all about saving time, isn't it?
Sure. But an optimized n may save many useless loops. ;)
Quote:
For SP, is 100/(x1)+8 better than 150/(x1)+4 ?
The latter returns a loop count about 3040% high. Replace the 150 by 110 and you get a very good estimate that works fine down to x = 1,7 and is just a few counts higher than the required minimum.
I edited my last post to add another formula for 16digit precision. According to my tests this should give no relative errors larger than 3E17. Actually 110/(x1) + 4 (and round up via CEIL) works just as well. #)
You can try this yourself: Visit the Casio Keisan Website, go to the calculator and enter this:
n = int((110/(x1) + 4));
k = n;
summe = x;
do {
summe = k/summe + x;
k = k1;
}
while (k>0);
cdf = normalpd(x)/summe;
exc = normalcd(x);
print(cdf, exc, (cdfexc)/exc, n);
Press "Execute" and use "2 , 0.1 , 10" as your xvalue. This will generate a table with the approximated and exact PDF as well as the relative error and finally the number of iterations. :)
I will also try the latest version in the emulator. For me this is always a lot of guesswork since I am not sure which files need to be updated. #) My emulator version is 3.0 2725. Could you provide a list which files need to be replaced?
Dieter
▼
Posts: 3,283
Threads: 104
Joined: Jul 2005
Quote:
I will also try the latest version in the emulator. For me this is always a lot of guesswork since I am not sure which files need to be updated. #) My emulator version is 3.0 2725. Could you provide a list which files need to be replaced?
Generally, wp34sgui.exe and wp34slib.dat from the windows bin directory should be fine. You may need to remove any older wp34s.dat file. If you want to use the assembler, update the tools directory contents, except for the .exe files which aren't uptodate. You will need a working Perl installation.
The best way is to install SVN, check out the complete tree once and then just do an SVN update to become uptodate.
EDIT: I've put up a new release file. This is the easiest option.
Edited: 11 May 2012, 4:45 a.m.
Posts: 3,283
Threads: 104
Joined: Jul 2005
Quote:
Yes, definitely. As I said before, the original estimate for the number of loops has been tested only for up to 16 digit accuracy. According to the original formula, the nominator in the estimate for the required number of loops would be something like 500 here, i.e. n = 500/(x1) + a few to be safe. In fact it seems to me (just tried a few examples) that this number of loops is good for 34 digits, but with some error in the last place due to roundoff errors. We should be aware that the number of iterations must be as high as required, but it should be as low as possible to prevent excessive roundoff.
If I read Pauli's code correctly, the loop count is fixed at 500, not depending on X. It breaks out of the loop if the iteration does not change the value.
Edit: I misread the code, the 500 loop is not for the continued fraction but for the series expansion below the threshold. The CF count is something like 256/(x1). I might be able to make this dependent on the requested accuracy.
if (dn_lt(&absx, &const_2_326)) {
decNumberSquare(&x2, &absx);
decNumberCopy(&t, &absx);
decNumberCopy(&a, &absx);
decNumberCopy(&d, &const_3);
for (i=0;i<500; i++) {
dn_multiply(&u, &t, &x2);
dn_divide(&t, &u, &d);
dn_add(&u, &a, &t);
if (dn_eq(&u, &a))
break;
decNumberCopy(&a, &u);
dn_p2(&d, &d);
}
decNumberCopy(&b, &const_0_5);
if (decNumberIsNegative(&x))
dn_minus(&a, &a);
} else {
//dn_minus(&x2, &absx);
//n = ceil(5 + k / (x  1))
dn_m1(&b, &absx);
dn_divide(&t, &const_256, &b);
dn_add(&u, &t, &const_4);
decNumberCeil(&b, &u);
decNumberZero(&t);
do {
dn_add(&u, &x, &t);
dn_divide(&t, &b, &u);
dn_dec(&b);
} while (! dn_eq0(&b));
dn_add(&u, &t, &x);
decNumberRecip(&a, &u);
if (decNumberIsNegative(&a)) {
dn_minus(&a, &a);
decNumberZero(&b);
} else {
dn_1(&b);
dn_minus(&a, &a);
}
}
Edited: 10 May 2012, 12:58 p.m.
▼
Posts: 3,229
Threads: 42
Joined: Jul 2006
This is harder than you expect  the routine doesn't know the requested accuracy. It is called from xrom so the machine will always be in double precision so that mode flag isn't helpful.
The flag indicating the original user's single/double setting isn't adequate either since this code could be called from the QF routine that really does require some extra precision even though the user is running in single precision.
We'll need an out of band way to indicate this. i.e. use one of the other xrom local flags to indicate what precision to use if the user's mode is set to single.
Sigh.
 Pauli
▼
Posts: 3,283
Threads: 104
Joined: Jul 2005
You could just pass the required accuracy on the stack if necessary. We can always compute to an accuracy high enough even for nested calls in single precision. What is implemented now should be fine for all cases I can envision even if it may waste some cycles in single precision.
Edit: I've implemented Dieter's latest suggestions. Pauli, can you give an example when you need higher accuracy? This should be easy to implement. The nominator for SP is now 100 (instead of 83 as suggested by Dieter) which gives a little headroom.
Edited: 11 May 2012, 4:48 a.m.
▼
Posts: 3,229
Threads: 42
Joined: Jul 2006
More accuracy will be required when solving for the quantile function.
Possibly for erfc as well  I don't remember the transform from the normal cdf used there.
 Pauli
Posts: 653
Threads: 26
Joined: Aug 2010
I think we will need no extra headroom now. ;)
It's 3:45 a.m. in central Europe now  and I did some tests on the continued fraction approach for the Normal CDF and x > 2,326. This time I tried the Lentz algorithm in order to determine the exact number of terms required for a specific accuracy. This lead to a quite extensive table with exact data. And it turned out that the relation between x and the number of required terms can be described very precisely the following way, so that that we get the desired accuracy without significant overhead.
For single (16digit) precision I think it's nice to have one or two guard digits to make sure that the last digit is rounded correctly. So I set the tolerance to 1E18 to get 18 digits +/ 1 ULP. Which lead to...
n >= 480 / x^2 + 9
This matches almost exactly the number of required terms for the given accuracy. Only here and there the result is 1 or 2 terms lower, so that the relative error might increase to 1,2 E18 or something similar.
For double (34digit) precision I think it makes sense to simply use the full 39 internal digits. The last one or two may be slightly off anyway. With a target accuracy of 1E39 I got this:
n >= 2200 / x^2 + 20
Just far out in the tails where only very few terms are required  and processed in virtually no time  the calculated n is a bit high due to the finally added constant.
So I would definitely suggest that the number of terms for the Normal CDF should be set according to the respective formula above. They very exactly match the number of required terms for the desired accuracy while on the other hand no time is wasted for useless loops "just to be sure". ;)
Dieter
Posts: 3,229
Threads: 42
Joined: Jul 2006
Quote: Phi and Phiupper are really just special cases of the left and right gamma function respectively, so maybe we should forgo a dedicated brute force right to left computation of the continued fraction and use the modified Lentz code that is there? I admit this would mean another bytegobbling constant to pass to gcf() as the gln parameter (ln(sqrt(Pi)) = ln(Pi)/2 = 0.572364942924700087071713675676529355824), but if we ditch the dedicated CF code, would it be worth it? My user code of the rightsided incomplete gamma is perceptibly slower, but if I am getting 32 or 34 digits in a 34digit environment, there is something right about this alternative.
I'm quite tempted to remove the special case for the standard normal distribution and just use gammap and gammaq for this. We'll lose speed in the quantile function but save lots of flash  we lost almost three pages of flash special casing the normal cdf & pdf and doing the special purpose normal qf solver.
In a keystroke program a double precision constant is 17 bytes plus two for each access of it. I'd happily pay this price to save 600+ bytes of C.
Do you know the relationship between gammap/gammaq and the cumulative normals? I don't remember it and a quick google hasn't turned up anything viable.
 Pauli
▼
Posts: 1,368
Threads: 212
Joined: Dec 2006
Quote:
Do you know the relationship between gammap/gammaq and the cumulative normals? I don't remember it and a quick google hasn't turned up anything viable.
For x > 0:
Lower Tail Normal CDF = 0.5*(1 + gammap(0.5*x^2, 0.5))
Upper Tail Normal CDF = 0.5*gammaq(0.5*x^2, 0.5)
The parameter order is as you implement it in the calculator, and the opposite of the convention in Maple and Mathematicathe "a" parameter second, the "x" parameter first.
See A&S 26.2.30.
Les
▼
Posts: 3,229
Threads: 42
Joined: Jul 2006
Quote: The parameter order is as you implement it in the calculator, and the opposite of the convention in Maple and Mathematicathe "a" parameter second, the "x" parameter first.
I switched the parameter order for gammap and ibeta this morning :)
 Pauli
▼
Posts: 1,368
Threads: 212
Joined: Dec 2006
I notice that in the documentation for the newly baptized P and Q the parameter order reflects what actually goes on the stacksecond goes first so second is in y and the first is in x, just as the manual indicates. Works for me. With that in mind, in the normal distribution special cases x^2/2 goes into y and .5 into x before calling P or Q. So maybe I should've written things the conventional way to match the manual and the soontobe compiled functions.
Les
Posts: 3,229
Threads: 42
Joined: Jul 2006
Quote:
Lower Tail Normal CDF = 0.5*(1 + gammap(0.5*x^2, 0.5))
Upper Tail Normal CDF = 0.5*gammaq(0.5*x^2, 0.5)
Thanks. I'll have a go with these two. I remember some difficulties with the lower tail cdf via the gammap function or rather via erf and then gammap. That's the way I did it originally.
 Pauli
▼
Posts: 3,229
Threads: 42
Joined: Jul 2006
I remember the problem with the first formula. For x<<0 you can't just invert using 1  cdf which cancellation. However, with both versions present, this isn't such an issue.
 Pauli
▼
Posts: 1,368
Threads: 212
Joined: Dec 2006
Exactly. With x < 0, things are just reflectedthe lower CDF formula for negative argument is just the upper CDF formula with positive statistic, and the upper tail formula is just the lower tail formula with positive statistic. You couldn't do that reflection thing when you didn't have gammaq available.
Les
Posts: 3,229
Threads: 42
Joined: Jul 2006
I think I am going to have to revisit the inverse normal function yet again. Any chance you could send the quick and dirty code please? Even better a commented version :)
 Pauli
▼
Posts: 653
Threads: 26
Joined: Aug 2010
Pauli, Les et al. 
I'll try to summarize all this in one single post. ;)
Yesterday I replaced the initial rational approximation from by previous suggestion by a dedicated version for p down to 1E400. It now provides an estimate with an absolute error near +/ 0,002 and requires just two short real constants. The rest can be done with INC commands. ;)
Estimate for 1E400 =< p =< 0.5:
r = sqrt(2 ln p)
x = r  (0,295*(r+1) + 2) / ((0,0511*r + 1) * r) + 1)
An estimate x below 0.001 should get rounded to zero to ensure best results in the following correction step. The approximation was designed to give eight valid digits after the first iteration (max. error approx. +/ 2 units in the 9th digit), and with a second resp. third iteration we should be on the safe side for standard resp. double precision  as long as the working precision is sufficient. But that's another story  see my other remarks below.
Here's my implementation for the 34s.
001 **LBL A
002 STO 00 // save p in R00
003 ENTER[^]
004 DEC X
005 +/
006 MIN
007 STO 01 // q = min(p, 1p) in R01
008 LN
009 STO+ X
010 +/
011 [sqrt] // r = sqrt(2 ln q)
012 ENTER[^]
013 INC X // rational approximation starts here
014 .
015 2
016 9
017 5
018 [times]
019 INC X
020 INC X
021 .
022 0
023 5
024 1
025 1
026 RCL[times] Z
027 INC X
028 RCL[times] Z
029 INC X
030 /
031  // first extimate x
032 ENTER[^]
033 SDL 003
034 x<1? // x < 0,001? resp. 1000 x < 1 ?
035 CLSTK // then x = 0
036 DROP
037 STO 02 // save x in R02
038 2
039 DBL?
040 INC X // 2 resp. 3 iterations will do
041 STO 03 // loop counter in R03
042 **LBL 000
043 RCL 02
044 +/
045 [PHI](x) // Q(x) = right hand integral = Phi(x)
046 RCL 01
047 RCL 02
048 [phi](x)
049 / // t = (CDF(x)  q) / PDF(x)
050 FILL // save t on stack
051 x[^3]
052 6
053 / // least correction term is t^3 / 6...
054 RCL 02
055 x[^2]
056 STO+ X
057 INC X
058 [times] // ... * (2 x^2 + 1)
059 x[<>] Y
060 x[^2] // second last term is...
061 RCL[times] 02
062 2
063 / // ... t^2 * x / 2
064 + // add two smallest terms together
065 + // add this to t to get complete delta
066 STO+ 02 // correct x by delta
067 DSZ 03
068 GTO 000 // repeat 2 or 3 times
069 RCL 00
070 .
071 5
072 
073 SIGN // finally adjust sign of x
074 RCL[times] 02
075 END
0,0025 [A] 2,8070 33768 34380 41172 21810 39471 2269 E+0 exact
This works fine at least for quantiles >= 1. Now let's look at p close to 0,5.
Note: "+1D34" = +1 unit in the 34th significant digit.
0,49 [A] 2,5068 90825 87110 35762 36343 18346 9043 E2 +1D34
0,4999 [A] 2,5066 28300 88035 09892 06501 05407 8430 E4 +1D32
0,4999999 [A] 2,5066 28274 63102 67517 65674 82243 4196 E7 3D29
0,4999999999 [A] 2,5066 28274 63100 05024 42013 42148 8860 E10 1D25
0,4999999999999999 [A] 2,5066 28274 63100 05029 49753 49666 4200 E16 +5D20
This shows the problem Les and I have been talking about previously. Generally spoken, our basic approach (i.e. evaluate the CDF for the current approximation, see how good it matches p and apply a correction) is good for up to 34 decimals, but not for 34 significant digits. This is because close to zero there are numerous ndigit quantiles that return the desired p so that no further correction is applied. In other words, a calculated quantile near zero with a magnitude of 10^n may be off in its last n digits, so that we may expect something like 34n digit accuracy. Yes, the given examples are one digit better. ;)
What can we do? We will have to handle p close to 0,5 separately. This can be accomplished in several ways:
But there's also another way to address this issue: Forget about 34digit accuracy. There simply is no way to ensure a valid ndigit result as long as all calculations carry just the same n digits and no further guard digits. I remember how I tried to get an exact 10digit result for the Normal quantile on the 41C with the same 10 digits working precision. It's simply not possible (at least I do not know of a method to accomplish this). That's why I designed the algorithm I suggested a few days ago  it returns 11 good digits over the 41C's working range as long as the calculation is done with a few more guard digits, e.g. in machine code.
If I remember correctly, the original idea behind the double precision mode was this: enable the user to write usercode functions with lots of extended precision, so that the final result can be assumed to carry at least the usual 16 digits that are normally returned to the user. Once again: in many, if not most cases it's simply not possible to get a true 34digit result unless the working precision exceeds these 34 digits. So we cannot expect 34 digits anyway. What is possible? 32? 30? 25? It depends. Round it to 16 digits and you are on the safe side. :)
Dieter
▼
Posts: 3,229
Threads: 42
Joined: Jul 2006
Quote: Yesterday I replaced the initial rational approximation from by previous suggestion by a dedicated version for p down to 1E400. It now provides an estimate with an absolute error near +/ 0,002 and requires just two short real constants. The rest can be done with INC commands. ;)
Thanks very much for this, I'll have a look at including it or something similar.
We've not so constant adverse any more. Constants for C occupy 36 bytes plus the code to load them, for keystroke programs they require nine bytes for single precision and 17 for double (plus two bytes to load them). Additionally, integers 0  255 are directly available and with the decimal point shifts, many short constants are only two instructions (4 bytes) to get.
 Pauli
Posts: 3,229
Threads: 42
Joined: Jul 2006
