I welcomed the non-regularized incomplete gammas, and was eager to see how they would compare with my own user code, ported from my HP41 routine, itself ported from the very Numerical Recipes routines that inspired your code.
I must admit that I have found the performance of the upper non-regularized incomplete gamma underwhelming, and I have been baffled as to why, given that the whole thing is computed in native code to 39 digits before being returned to the stack.
Take the upper non-regularized gamma for the parameter pair (5,7). My own routine gives ALL 34 digits correct. The result is roughly 4.15, and the last four digits are 5464.
However, the newly-minted GAMMAxy gives 5493 in the last four digits--off by a whopping 27 ULP. I say "whopping" since all of this function is computed natively with 39 digits, none of it in XROM with less working precision.
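For anyone who wants to check that target independently: for integer a, the upper incomplete gamma reduces to a finite sum, Gamma(a,x) = (a-1)! * exp(-x) * (1 + x + x^2/2! + ... + x^(a-1)/(a-1)!), so Gamma(5,7) collapses to exactly 4553*exp(-7). A quick double-precision check of my own (not firmware code, and of course only good to 16 digits):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* sum x^k / k! for k = 0..4, then scale by (5-1)! * exp(-x) */
        double x = 7.0, sum = 0.0, term = 1.0;
        for (int k = 0; k < 5; k++) {
            sum += term;
            term *= x / (k + 1);
        }
        printf("%.15g\n", 24.0 * exp(-x) * sum);  /* ~4.1518 */
        return 0;
    }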
At first I thought it had something to do with taking logarithms and then re-exponentiating at the end of the routine. However, when I replicated this in my user code I lost a few ULP, but was still a lot closer than the built-in routine.
Then I looked at your implementation of the Lentz algorithm, and I wondered about your choice of the "tiny" parameter. You chose 1e-32, which is small enough for machine double precision of roughly 16 digits, but may be somewhat too large when working with 39-digit decNumbers. In my own analogue of gcf(), I use 1e-256, computed on the fly as 8 2^x 10^x 1/x. Even on the HP41 I somewhat arbitrarily use 1e-25, given that we have only 10 digits to deal with.
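In case those keystrokes look cryptic, 8 2^x 10^x 1/x just builds 1/(10^(2^8)) = 1e-256. The C equivalent, for what it's worth (my illustration only):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* 8 2^x 10^x 1/x: 1/(10^(2^8)) = 1e-256, still well inside
           a double's range (DBL_MIN is about 2.2e-308) */
        double tiny = 1.0 / pow(10.0, pow(2.0, 8.0));
        printf("%g\n", tiny);
        return 0;
    }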
Running with this hypothesis, I changed my code to use 1e-32, and guess what? This time I get a result with 5484 in the last four digits--off by 20 ULP.
In the modified Lentz algorithm, the purpose of the "tiny" parameter is to let the routine proceed when a division by zero threatens to derail things. I think choosing its size is a matter of art, not science, but my understanding is that it should be small enough to rescue one from the division by zero without, in the end, throwing off the computed value of the continued fraction.
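To make the mechanics concrete, here is a double-precision sketch of the NR-style modified Lentz loop for this continued fraction, with "tiny" passed in as a parameter. This is my own illustration, not the firmware's actual decNumber code, and at 16 digits it obviously cannot reproduce the 34-digit discrepancies; it just shows where the guard enters:

    #include <math.h>
    #include <stdio.h>

    /* Sketch of the modified Lentz evaluation of the continued fraction
       for the upper incomplete gamma, after Numerical Recipes' gcf().
       The real routine works in 39-digit decNumbers; this only shows
       where "tiny" enters the computation. */
    static double upper_gamma_cf(double a, double x, double tiny)
    {
        double b = x + 1.0 - a;   /* first denominator of the fraction */
        double c = 1.0 / tiny;    /* Lentz's C_0, started "big"        */
        double d = 1.0 / b;       /* Lentz's D_1                       */
        double h = d;             /* running value of the fraction     */

        for (int i = 1; i <= 300; i++) {
            double an = -i * (i - a);      /* i-th numerator coefficient */
            b += 2.0;
            d = an * d + b;
            if (fabs(d) < tiny) d = tiny;  /* rescue a vanishing denominator */
            c = b + an / c;
            if (fabs(c) < tiny) c = tiny;  /* ...and likewise for C */
            d = 1.0 / d;
            double del = c * d;
            h *= del;
            if (fabs(del - 1.0) < 1e-15) break;  /* converged */
        }
        /* Non-regularized:  Gamma(a,x) = x^a * exp(-x) * h */
        return exp(a * log(x) - x) * h;
    }

    int main(void)
    {
        /* Gamma(5,7) = 4553*exp(-7), roughly 4.1518 */
        printf("%.15g\n", upper_gamma_cf(5.0, 7.0, 1e-30));
        return 0;
    }

Note, by the way, that "tiny" leaks into the result even when neither rescue branch fires: the very first pass computes c = b + an/C_0 = b + an*tiny, so a guard of 1e-32 injects a perturbation near the 32nd digit of c. If I read the algebra right, that alone is consistent with drifts of a few tens of ULP in the 34th digit.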
I would be interested to see how gcf() performs if you substitute, say, 1e-400, a number already among the available constants, as the tiny parameter. But if you do decide to keep the existing 1e-32, you need to correct line 2122 of decn.c to read "decNumberCopy(&d, &const_1e_32);"; the underscore is missing in the original. However, based on the experiment with my own user code, I would opt for moving to a much tinier tiny parameter (and, correspondingly, a much bigger "big" parameter where it is used at the start of the gcf routine).
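For what it is worth, the swap might look like the following one-liner. I am guessing the identifier const_1e_400 from the naming pattern of const_1e_32; I have not checked the actual name in the constants table:

    /* Hypothetical: use the 1e-400 constant as the Lentz guard.
       The identifier is my guess from the const_1e_32 pattern. */
    decNumberCopy(&d, &const_1e_400);   /* was: &const_1e_32 */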
I hope this is a worthwhile petition. If I am right, we will see improved accuracy for the cumulative normal, erf, chi-square, and Poisson.
Les