[WP34S] Improving the Incomplete Gammas



#2

I welcomed the non-regularized incomplete gammas, and was eager to see how they would compare with my own user code, ported from my HP41 routine, itself ported from the very Numerical Recipes routines that inspired your code.

I must admit that I have found the performance of the upper non-regularized incomplete gamma underwhelming, and I have been puzzled as to why, given that the whole thing is computed in native code to 39 digits before being returned to the stack.

Take the upper non-regularized gamma for the parameter pair (5,7). My own routine gives ALL 34 digits correct. The result is roughly 4.15, and the last four digits are 5464.

However, the newly minted GAMMAxy gives 5493 in the last four digits--off by a whopping 27 ULP. I say "whopping" since all of this function is computed natively with 39 digits, none of it in XROM with less working precision.

At first I thought it had something to do with taking logarithms and then re-exponentiating at the end of the routine. However, when I replicated this in my user code I lost a few ULP, but was still a lot closer than the built-in routine.

Then I looked at your implementation of the Lentz algorithm, and I wondered about your choice for the "tiny" parameter. You chose 1e-32, which is small enough for machine double precision of roughly 16 digits, but may be somewhat too large when working with 39-digit decNumbers. In my own analogue of gcf(), I use 1e-256, computed on the fly as 8 2^x 10^x 1/x. Even on the HP41, I somewhat arbitrarily use 1e-25, given that we have only 10 digits to deal with.

Running with this hypothesis, I changed my code to use 1e-32, and guess what? This time I get a result with 5484 in the last 4 digits--off by 20 ULP.

In the modified Lentz algorithm, the purpose of the "tiny" parameter is to let the routine proceed should a division-by-zero error threaten to derail things. I think choosing its size is a matter of art, not science, but I understand it should be small enough that it simply rescues one from a division-by-zero error without, in the end, throwing off the computation of the continued fraction.

I would be interested to see how gcf() performs if you substitute, say, 1e-400, a number already among the available constants, as the tiny parameter. But if you do decide to keep the existing 1e-32, you need to correct line 2122 of decn.c to read "decNumberCopy(&d, &const_1e_32);". The underscore is missing in the original. However, based on the experiment with my own user code, I opt for moving to a much tinier tiny parameter (and, correspondingly, a much bigger "big" parameter where it is used at the start of the gcf routine).

I hope this is a worthwhile petition. If I am right, we will see improved accuracy for the cumulative normal, erf, chi-square, and Poisson.

Les


#3

Quote:
I would be interested to see how gcf() performs if you substitute, say, 1e-400, a number already among the available constants, as the tiny parameter. But if you do decide to keep the existing 1e-32, you need to correct line 2122 of decn.c to read "decNumberCopy(&d, &const_1e_32);". The underscore is missing in the original. However, based on the experiment with my own user code, I opt for moving to a much tinier tiny parameter (and, correspondingly, a much bigger "big" parameter where it is used at the start of the gcf routine).

Oops, that bug has been there forever :-(

The reason I chose 1e-32 was simply that it was the smallest constant I had available at the time I wrote this code. Double precision mode didn't come until later, necessitating constants of smaller magnitude. I use 1e-32 as a termination criterion for several series.

The 1e-400 constant isn't actually used and doesn't get included in the image. It used to be used in the C code for the chi^2 distribution but that lives in xrom now.

I've tried 1e-400, 1e-256 and the existing 1e-32. The result is the same regardless. This isn't the source of the problem in this piece of code.

The problem lies in the initialisation of c on line 2110. That value needs to be larger. Making this infinite and leaving the rest as 1e-32, I get the proper solution.

I'll push the 1e-32 down to 1e-37 for now (that is the smallest constant that is used elsewhere). If this proves insufficient, I can always add a new constant and use that.


- Pauli


#4

Quote:

The problem lies in the initialisation of c on line 2110. That value needs to be larger. Making this infinite and leaving the rest as 1e-32, I get the proper solution.


Oops! I should have mentioned that in my experiment with my user code I also reduced "BIG" from 1e256 to 1e32. (Indeed, that is how I get my "tiny"--compute "big" and store it, then take the reciprocal and store that as "tiny".) THAT is probably what led to my replicating the lower-accuracy issue.

Look forward to playing with the new function in the next build.

Les


#5

It is built already :-)


- Pauli


#6

And the accuracy is vastly improved, as predicted.

This improved accuracy works its way into the normal CDF. For example, I note that with the incomplete gamma, the internal breakpoint between gser and gcf is lower than we have used before--namely, sqrt(3). If we go a little above this (in absolute terms), Phi(-1.733) returns a 34-digit result off by just 1 ULP. This impresses me, given that this is the territory where the continued fraction goes through the most iterations, and we multiply through by 1/sqrt(Pi) in XROM, not native code, introducing the possibility of rounding error. Not bad.

Now when we go just below the sqrt(3) cut-off (in absolute terms), Phi(-1.732) is given exactly to 34 digits--again impressive, considering this is the very territory where we can get subtraction error, as well as rounding error, since this is where the most series terms are computed. Thankfully, the 5 extra digits in the native-code routine seem to absorb things.

Fine work!

Les


#7

Vastly improved????? You jest. 32 correct digits improving to 33 or 34 digits in some cases? Hardly a vast change. The old result already had ten or more digits in excess of what is actually required by the device. Remember we make no accuracy guarantees for double precision.


Importantly, a long resident bug was identified and corrected and, less importantly, a flash based constant was removed as unnecessary.


- Pauli


#8

Maybe not vastly. But I think it is a fine refinement.

In terms of the inverse normal, the Achilles' heel for speed seems to be around p = .0416, where the quantile is roughly -sqrt(3)--right at the incomplete gamma break-point between the series and continued fraction, where each subroutine computes the most terms for convergence. It speeds up on either side of that.

The accuracy Achilles' heels of the inverse include probabilities close to .5, as has been discussed ad nauseam, but even still we are getting at least 27 digits there without fancy treatment. The fixable accuracy issue, however, is the treatment of probabilities close to 1. For example, in DP, unity less 1 ULP returns a quantile of 12.0477119..., whereas the true first nine digits are 12.0474677.... Yet the existing routine is quite capable of returning the quantile associated with the complementary probability of 1e-33 to a full 34 digits, albeit with the sign flipped. The code doesn't capitalize on the symmetry. This happens with the symmetric t distribution as well.

In the inverse normal and inverse t, I vote for testing if the probability argument exceeds .5. If so, set a flag indicating such and compute the complementary probability. Find the quantile given the probability less than 0.5, and when that flag tests "true" simply flip the sign. That's basically what I do manually--I don't compute the quantile for .99999999; I compute it for .00000001 and flip the sign of the result.

More gristle on which to chew...

Les


#9

Cauchy would also fall into the same category, I suspect, being symmetric as well.

Feel free to submit a patch with these changes. You've got access to the source code after all :-)


I'm happy that the correct result is returned for unity less 1 ULP in single precision mode. As I've said on many occasions, accuracy not guaranteed in double precision.


- Pauli


#10

Quote:
You've got access to the source code after all :-)

Then I'd guess we'll soon have 64-128 digit precision for these distribution functions. ;-)

Franz


Edited: 15 May 2012, 5:24 a.m.


#11

Not from me :-)

- Pauli


#12

Quote:
Not from me :-)

I know, I rather meant your new 'high-precision' team member Les. ;-)


#13

Please don't exaggerate. It may be interesting from an academic point of view, but I've never seen any real-world application of such distributions needing more than a very small number of significant digits. We have already gone far beyond that.


#14

Quote:
Please don't exaggerate.

No sense of humor? ;-)

In fact I already had to smile while reading those discussions about 34 or 39 digits precision ...


#15

Quote:

No sense of humor? ;-)


I didn't mean you this time ;-)

#16

This Message was deleted. This empty message preserves the threading when a post with followup(s) is deleted. If all followups have been removed, the original poster may delete this post again to make this placeholder disappear.


#17

I wasn't implying STFU in any manner or sense.

I was saying that since you've coded the required changes already and have validated them, perhaps it would be better if you submitted a patch set rather than hinting that I should go through the same process coding essentially the same thing you've done.

I really see no point in repeating effort you've already expended. Especially when it is a keystroke program, not C -- not everyone can build and test changes to the C version of the firmware, but everyone can play with keystroke programs. I'm quite willing to add you as an official developer on the project if you want some autonomy in making changes -- your input has been good and helpful thus far.


Since the function is meeting our accuracy goals already, such changes are just extra icing -- I'm certainly not going to object to having them included, but I'm not going to rate them as such a high immediate priority that I spend lots of time (re)implementing and testing them. I have a finite amount of time I can dedicate to the 34S, and I already spend more than I should on it.


- Pauli

#18

Well, the "real world" seems to work with not more than about three significant digits. ;-) But that's not the point here. "A ship in harbor is safe - but that is not what ships are built for."

We are talking about a device that returns 16 or 34 digits to the user. With an internal 39-digit precision it is no problem to have the functions (at least those we are discussing here) return results with this accuracy. Doing so means just one simple thing: the user can trust the results he sees. Yes, in most cases 16 digits are more than enough. That's why I proposed to speed up especially the more complicated and slow functions by limiting the internal routines to, say, 18 valid digits if (and only if) the 34s is set to standard precision.

Deep down somewhere in a closet I have this Novus RPN calculator, back in the Seventies sold by Quelle (a major German mail order store back then) under their "Privileg" brand. It displays 12 nice red LED digits. But don't you expect all of them to be exact. ;-)

This is not what I'd accept from a device like the 34s, especially when realizing the goal of having all displayed digits correct requires no major effort.

Dieter


#19

Just to avoid misunderstandings:

Nothing against each and every result being accurate to 16 digits - that's an undisputed target of our WP 34S efforts. Before going beyond, however, please check CONST. I don't see any fundamental real world constant known to that precision besides the ones set "per definition".

Pure mathematics is another world, though. So I don't object to efforts to do matrix operations with some 30 digits of precision, for example. Statistical distributions are just mathematical models of real-world events, thus I don't think it will pay to extend precision there. YMMV, of course, but as usual I'm posting my opinion here :-)

Edit: Corrected a spelling error in a foreign language.


Edited: 17 May 2012, 3:59 p.m.

#20

There has been a lot of discussion around the Normal CDF and the quantile function. Les, I read that you use the Gamma function and the Lentz algorithm for the continued fraction routine. My own 34s program, on the other hand, is based on the previously discussed series expansion and a directly evaluated CF method - with the number of terms computed in advance. I now would like to know a bit about the performance of both methods. Or, as the English say: "the proof of the pudding is in the eating". :-)

  • Assuming a program in 34s user code and 34-digit working precision - how exact are the results with your method?

  • If the quantile function is based on the well-known initial guess with two refinements for 16-digit accuracy, or three for double precision: what are the largest errors you get? For instance with p = 0.5 - 1E-7, 0.1, 0.01 and 1E-10.

  • Finally: how fast is it on actual hardware? How long does it take to calculate the CDF for x = 1, 2 and 3, for both 16-digit and full (32..34-digit) accuracy? Likewise the quantile for the four p values mentioned.

Just for comparison: my version usually is within -2...+2 ULP for the Normal CDF; I have only observed one case that was off by 5 ULP. So 33 digits should be fine. The quantile usually is within "a few" units of the 34th digit; here and there an error of 1 - 2 units of the 33rd digit may occur.

Assuming your approach will give similar results, we can expect 32 correct digits. Or 36+ with 39 digit precision in a C-coded version. So again I would like to suggest that the Normal CDF and its quantile should be internal functions. With careful coding (including the known improvements) we could get full 34-digit accuracy easily.

BTW - I would also suggest a function for the central (two-sided symmetric) CDF, i.e. the Normal integral from -x to +x. This is not only convenient for the usual confidence tests, it also ensures maximum accuracy for x close to 0. There is also a quantile function for this case that ensures the best possible accuracy for p close to both 1 and 0.

Quote:
The accuracy Achilles' heels of the inverse include probabilities close to .5, as has been discussed ad nauseam, but even still we are getting at least 27 digits there without fancy treatment.

No problems here with p close to 0.5. The accuracy is the same as in other regions.
Quote:
I vote for testing if the probability argument exceeds .5. If so, set a flag indicating such and compute the complementary probability. Find the quantile given the probability less than 0.5, and when that flag tests "true" simply flip the sign.

Sure - that's a method so basic and obvious that I wonder why it still has not been implemented (or has it?). For me, this is an absolute must. Instead of the sign flip you may also simply multiply the result by sgn(x-0.5). Pauli, are you listening? ;-)

Dieter

Edited: 15 May 2012, 5:51 p.m.


#21

I see some points here:

1. We can get 34 digits from these functions in all cases if we code it in C and use all the recommendations from Dieter and/or Les.

2. We can get somewhat lower accuracy if we do the same in user code.

The questions are:

Is (1) worth the space cost?

Who is implementing (2)?

(1) is worth a discussion because flash space is a scarce resource, and if we can live with somewhat lower accuracy we might find a better use for the saved flash space.

(2) is an open question to the community. Whoever has working code in this area and wants to see it incorporated in the software is asked to publish it in one way or another. Just explaining the algorithms leaves too much work for us (especially Pauli). So if you have code: contribute it or don't complain!

