# HP Forums

Full Version: [WP34s] Parallel function
You're currently viewing a stripped down version of our content. View the full version with proper formatting.

As far as I can see the current implementation of the 34s parallel function (cf. sourceforge.net) works this way:

```                x*y
par(x,y)  =  -----
x+y
```
This may lead to overflow as soon as the product in the nominator becomes too large, even if the result falls within the working range. And for (almost) any positive x, y it will.

The current code also checks whether x*y is zero in order to test if either x or y (or both) are zero -- in this case the returned result is zero as well. For very small values of x and/or y this may lead to underflow, returning a plain zero where the true result actually is very small, but not zero.

Example:

```   x          y         34s ||      true
-----------------------------------------
1 E+4000   1 E+4000   infinity   5 E+3999
1 E-4000   1 E-4000     0        5 E-4001
```
There are numerous ways to compute the parallel function, and there usually is one best way to do it for a certain combination of the two arguments. Not all, but most overflow/underflow problems can be solved if the function would work this way:
```                   1
par(x,y)  =  ---------
1/x + 1/y
001 LBL"PAR"
002 x=0?
003 RTN
004 1/x
005 x<>y
006 x=0?
007 RTN
008 1/x
009 +
010 1/x
011 RTN
```
That's why I would suggest an update of the parallel function. The code above is just a first example -- I am sure it can be done better (e.g. without slight rounding errors in the last digit). What do you think?

Dieter

Edited: 4 Nov 2012, 6:42 a.m.

The problem with this implementation is that none of the arguments may be zero (which is valid and should return zero).

Please take a look at the suggested code, especially line 002/003 and 006/007. Of course any of the two arguments may be zero -- in this case zero is returned. That's the same behaviour as in the current implementation and how you say it is supposed to work ("should return zero").

Dieter

Yes, it returns zero, but the stack (and LastX) is not correctly set.

Franz

The suggested code is not intended to replace the current implementation literally. It's a suggestion for a different way to compute the parallel function, here in the way it could be done in a standard 34s user program. An XROM routine with a similar approach would use the respective standard commands on entry (XIN DYADIC)
and exit (xOUT xOUT_NORMAL) that also take care of the stack and last X.

Once again: the idea here is just the suggestion of a different way to compute the parallel function, thus overcoming the current limitations. Maybe (probably) there is an even better solution - what's your idea here ?-)

Dieter

Franz, an XROM routine following the line Dieter suggested would return a proper stack and Lastx AFAIUI.

Agreed, I just did look at the formula, not the suggested implementation.

I'm pretty sure that Pauli had a specific reason to implement || the way it is now. XROM executes in double precision with an extended exponent range so exponent overflow isn't really an issue here.

Marcus,

Quote:
XROM executes in double precision with an extended exponent range so exponent overflow isn't really an issue here.

That's only true if the 34s is set to standard precision. Yes, for XROM routines in double precision the working range exceeds 1E+6000, which is more than enough to handle every input in standard precision, i.e. up to 1E+384. However, the user may just as well have the device set to DP, et voila...

Please take a look at the two examples I posted: values like 1E+4000 can be easily entered in DP mode - and will lead to overflow resp. underflow, just as shown there.

```  [DBLON]
4000 [10x] [ENTER] [||]  =>  infinity
-4000 [10x] [ENTER] [||]  =>  0
```
Dieter

When I implemented these two in xrom, I didn't consider the edge cases in double precision -- the code as is will work fine for single precision which was my main concern at the time. We've always said double precision accuracy is not guaranteed. This is still the case. Double precision was intended for internal implementation and exposed at the request of this community.

The previous C version would have worked in double precision too -- it has a much larger exponent range again.

Still, I don't mind improving our double precision performance if the cost is low which it seems to be here.

- Pauli

I doubt that 10^4000 is a meaningful physical dimension...

Exponents up to 999 can be entered directly in DP and they will not lead to an overflow here.

I'm still not sure why Pauli decided to use the present algorithm but I assume he had a cancellation issue in mind.

As usual, it's a tradeoff: the current method is potentially a bit more accurate, at least in the last digit. Try x = y = 15 and the method I posted will be +2 ULP off (7.500....002 instead of 7.5). So the best possible implementation would use (at least) two different ways to compute the result, for instance depending on x*y returning "infinity" or not. ;-)

Dieter

It would also be necessary to check x*y under flowing to zero.

- Pauli

Performance actually -- x*y / (x+y) is faster to calculate than 1 / (1/x+1/y) and given that full accuracy in double precision wasn't a requirement, I chose the former.

- Pauli

Ok, what about this one:

``` 001 LBL"PAR"
002 RCL* Y
003 x!=0?
004 infinity?
005 SKIP 004
006 X<>Y
007 RCL+ L
008 /
009 RTN
010 X<> L
011 X=0?
012 RTN
013 1/x
014 X<>Y
015 X=0?
016 RTN
017 1/x
018 +
019 1/x
020 RTN
```
Just a thought, without any further testing. ;-)

Dieter

It is worse than needing a comparison against zero -- if x*y goes subnormal it is possible to have as few as one significant digit in the product -- better to switch formulas before this point I think.

It is starting to seem not worth the effort...

- Pauli

Or, the best of both worlds:

```*LBL"PAR"
X=0?
GTO 00
ENTER
RCL+ ST Z
/
*LBL 00
*
RTN
```

(code 42S-style, as I don't own a 34S. Sadly, perhaps ;-)

This would suffer the same problems as my original code -- it uses the same formula.

42S style isn't far removed from 34S -- the 42S was a guiding light for the project in many ways.

- Pauli

No, it doesn't. Try it.

Werner

perhaps too short a reply: it does use the same formula, but a different order of operations, and avoids overflow. Try it

Edited: 5 Nov 2012, 2:52 a.m.

Yeah, it avoids the overflow, not thinking straight a the moment.
It also seems like it avoids the underflow as well.

It probably needs a test for x = -y to produce a -infinity result but this is easy.

Thanks.

- Pauli

Great - I knew there was a better way to do this. ;-) However, the code may behave differently depending on the order of the two arguments, i.e. whether X or Y is the larger value. Since par(x,y) = par(y,x) I think this should be avoided. I am not quite sure, but maybe the code works best if X is the larger value (avoids underflow here and there). A simple x<y?  x<>y could do the trick. At least it returns consistent results for par(x,y) and par(y,x).

BTW - during these tests with very large numbers like 106000 I noticed a quite ..."special behaviour" of the 10x function. Usually integer arguments should return exact powers of ten, but somewhere there is a point where - in DP mode - this is no longer guaranteed:

```  500 [10x] => 1.00000...0 E+500
999 [10x] => 1.00000...0 E+999
2000 [10x] => 9.99999...9 E+1999
6000 [10x] => 9.99999...8 E+5999
```

Dieter

I feared such academic problems would arise. We should have left DP closed to the public - but we didn't. Perhaps we shall rethink that decision ...

We do not special case 10integer. The error you mention is 2 ULPs. I think we can live with that.

Let's get back to the facts.

1. The "problem" here is the error. Not the error report.

2. Without DP mode there was no way to design and test XROM functions in user code, this way replacing former internal routines like, for instance, my humble contributions, the Normal quantile or Lambert's W. If you don't like the average user to access this mode, simply have it mapped to a special key sequence and do not mention it in the manual. Maybe it can be shown on the last page in small print and ROT13-encoded. #-)

3. The error in the exponential functions can be explained with a closer look at the present code. Functions like 2x or 10x are implemented by calling the general power function yx, using the relation yx = ex * ln y. All this is done with 39-digit internal precision. It can be shown that during the evaluation of 106000 an error of 1 ULP in ln 10 will cause a relative error of 10-34 or 1 unit in the 34th digit. The shown errors are larger, so obviously ln 10 has a slightly larger error.

4. The current (internal) constants catalogue includes exact values for ln 2 and ln 10. Using these instead of evaluating the logs at runtime would yield two significant advantages:
• First of all there is a substantial gain in speed for 2x and 10x, as the 34s log functions are generally on the slow side.
• Second, the mentioned cases 102000 and 106000 should return more accurate results: the 39-digit value of ln 10 has a roundoff error of merely 1,1 E-39 which means that the relative error of the final result should be within 1 E-35. Even with an additional error of 1 ULP in 6000 * ln 10 it's still 1 E-34.
That's why I think this error can be fixed easily.

Dieter

You're correct I'm not using the builtin log2 and log10 constants for 2x and 10x :-( I missed this optimisation unfortunately. I do use them for log2 and log10 at least.

Now to see if the code can be easily changed to pass in the log of base....

- Pauli

The next build will include this optimisation :)

- Pauli

Thanks for the details. As you see, this detailed report triggered some improvement, especially points 3 and 4 :-) No doubt the original error report wasn't the problem, but it wasn't leading to a solution yet - the second report was far better in that aspect. Thanks again.

BTW, DP mode is covered in an appendix of the manual - no easy way to hide it even more ;-)