[WP34s] Parallel function « Next Oldest | Next Newest »

 ▼ Dieter Unregistered Posts: 653 Threads: 26 Joined: Aug 2010 11-04-2012, 06:39 AM As far as I can see the current implementation of the 34s parallel function (cf. sourceforge.net) works this way: ``` x*y par(x,y) = ----- x+y ``` This may lead to overflow as soon as the product in the nominator becomes too large, even if the result falls within the working range. And for (almost) any positive x, y it will. The current code also checks whether x*y is zero in order to test if either x or y (or both) are zero -- in this case the returned result is zero as well. For very small values of x and/or y this may lead to underflow, returning a plain zero where the true result actually is very small, but not zero. Example: ``` x y 34s || true ----------------------------------------- 1 E+4000 1 E+4000 infinity 5 E+3999 1 E-4000 1 E-4000 0 5 E-4001 ``` There are numerous ways to compute the parallel function, and there usually is one best way to do it for a certain combination of the two arguments. Not all, but most overflow/underflow problems can be solved if the function would work this way: ``` 1 par(x,y) = --------- 1/x + 1/y 001 LBL"PAR" 002 x=0? 003 RTN 004 1/x 005 x<>y 006 x=0? 007 RTN 008 1/x 009 + 010 1/x 011 RTN ``` That's why I would suggest an update of the parallel function. The code above is just a first example -- I am sure it can be done better (e.g. without slight rounding errors in the last digit). What do you think? Dieter Edited: 4 Nov 2012, 6:42 a.m. ▼ Marcus von Cube, Germany Unregistered Posts: 3,283 Threads: 104 Joined: Jul 2005 11-04-2012, 11:00 AM The problem with this implementation is that none of the arguments may be zero (which is valid and should return zero). ▼ Dieter Unregistered Posts: 653 Threads: 26 Joined: Aug 2010 11-04-2012, 12:48 PM Please take a look at the suggested code, especially line 002/003 and 006/007. Of course any of the two arguments may be zero -- in this case zero is returned. That's the same behaviour as in the current implementation and how you say it is supposed to work ("should return zero"). Dieter ▼ fhub Unregistered Posts: 1,216 Threads: 75 Joined: Jun 2011 11-04-2012, 01:53 PM Yes, it returns zero, but the stack (and LastX) is not correctly set. Franz ▼ Dieter Unregistered Posts: 653 Threads: 26 Joined: Aug 2010 11-04-2012, 02:22 PM The suggested code is not intended to replace the current implementation literally. It's a suggestion for a different way to compute the parallel function, here in the way it could be done in a standard 34s user program. An XROM routine with a similar approach would use the respective standard commands on entry (XIN DYADIC) and exit (xOUT xOUT_NORMAL) that also take care of the stack and last X. Once again: the idea here is just the suggestion of a different way to compute the parallel function, thus overcoming the current limitations. Maybe (probably) there is an even better solution - what's your idea here ?-) Dieter ▼ Walter B Unregistered Posts: 4,587 Threads: 105 Joined: Jul 2005 11-04-2012, 02:38 PM Franz, an XROM routine following the line Dieter suggested would return a proper stack and Lastx AFAIUI. Marcus von Cube, Germany Unregistered Posts: 3,283 Threads: 104 Joined: Jul 2005 11-04-2012, 03:33 PM Agreed, I just did look at the formula, not the suggested implementation. I'm pretty sure that Pauli had a specific reason to implement || the way it is now. XROM executes in double precision with an extended exponent range so exponent overflow isn't really an issue here. ▼ Dieter Unregistered Posts: 653 Threads: 26 Joined: Aug 2010 11-04-2012, 04:26 PM Marcus, Quote: XROM executes in double precision with an extended exponent range so exponent overflow isn't really an issue here. That's only true if the 34s is set to standard precision. Yes, for XROM routines in double precision the working range exceeds 1E+6000, which is more than enough to handle every input in standard precision, i.e. up to 1E+384. However, the user may just as well have the device set to DP, et voila... Please take a look at the two examples I posted: values like 1E+4000 can be easily entered in DP mode - and will lead to overflow resp. underflow, just as shown there. ``` [DBLON] 4000 [10x] [ENTER] [||] => infinity -4000 [10x] [ENTER] [||] => 0 ``` Dieter ▼ Marcus von Cube, Germany Unregistered Posts: 3,283 Threads: 104 Joined: Jul 2005 11-04-2012, 04:34 PM I doubt that 10^4000 is a meaningful physical dimension... Exponents up to 999 can be entered directly in DP and they will not lead to an overflow here. I'm still not sure why Pauli decided to use the present algorithm but I assume he had a cancellation issue in mind. ▼ Paul Dale Unregistered Posts: 3,229 Threads: 42 Joined: Jul 2006 11-04-2012, 04:50 PM Performance actually -- x*y / (x+y) is faster to calculate than 1 / (1/x+1/y) and given that full accuracy in double precision wasn't a requirement, I chose the former. - Pauli Paul Dale Unregistered Posts: 3,229 Threads: 42 Joined: Jul 2006 11-04-2012, 04:30 PM When I implemented these two in xrom, I didn't consider the edge cases in double precision -- the code as is will work fine for single precision which was my main concern at the time. We've always said double precision accuracy is not guaranteed. This is still the case. Double precision was intended for internal implementation and exposed at the request of this community. The previous C version would have worked in double precision too -- it has a much larger exponent range again. Still, I don't mind improving our double precision performance if the cost is low which it seems to be here. - Pauli ▼ Dieter Unregistered Posts: 653 Threads: 26 Joined: Aug 2010 11-04-2012, 04:42 PM As usual, it's a tradeoff: the current method is potentially a bit more accurate, at least in the last digit. Try x = y = 15 and the method I posted will be +2 ULP off (7.500....002 instead of 7.5). So the best possible implementation would use (at least) two different ways to compute the result, for instance depending on x*y returning "infinity" or not. ;-) Dieter ▼ Paul Dale Unregistered Posts: 3,229 Threads: 42 Joined: Jul 2006 11-04-2012, 04:48 PM It would also be necessary to check x*y under flowing to zero. - Pauli ▼ Dieter Unregistered Posts: 653 Threads: 26 Joined: Aug 2010 11-04-2012, 05:09 PM Ok, what about this one: ``` 001 LBL"PAR" 002 RCL* Y 003 x!=0? 004 infinity? 005 SKIP 004 006 X<>Y 007 RCL+ L 008 / 009 RTN 010 X<> L 011 X=0? 012 RTN 013 1/x 014 X<>Y 015 X=0? 016 RTN 017 1/x 018 + 019 1/x 020 RTN ``` Just a thought, without any further testing. ;-) Dieter ▼ Paul Dale Unregistered Posts: 3,229 Threads: 42 Joined: Jul 2006 11-04-2012, 06:36 PM It is worse than needing a comparison against zero -- if x*y goes subnormal it is possible to have as few as one significant digit in the product -- better to switch formulas before this point I think. It is starting to seem not worth the effort... - Pauli Werner Unregistered Posts: 163 Threads: 7 Joined: Jul 2007 11-05-2012, 02:41 AM Or, the best of both worlds: ```*LBL"PAR" X=0? GTO 00 ENTER RCL+ ST Z / *LBL 00 * RTN ``` (code 42S-style, as I don't own a 34S. Sadly, perhaps ;-) ▼ Paul Dale Unregistered Posts: 3,229 Threads: 42 Joined: Jul 2006 11-05-2012, 02:45 AM This would suffer the same problems as my original code -- it uses the same formula. 42S style isn't far removed from 34S -- the 42S was a guiding light for the project in many ways. - Pauli ▼ Werner Unregistered Posts: 163 Threads: 7 Joined: Jul 2007 11-05-2012, 02:50 AM No, it doesn't. Try it. Werner perhaps too short a reply: it does use the same formula, but a different order of operations, and avoids overflow. Try it Edited: 5 Nov 2012, 2:52 a.m. ▼ Paul Dale Unregistered Posts: 3,229 Threads: 42 Joined: Jul 2006 11-05-2012, 03:04 AM Yeah, it avoids the overflow, not thinking straight a the moment. It also seems like it avoids the underflow as well. It probably needs a test for x = -y to produce a -infinity result but this is easy. Thanks. - Pauli Dieter Unregistered Posts: 653 Threads: 26 Joined: Aug 2010 11-05-2012, 03:09 PM Great - I knew there was a better way to do this. ;-) However, the code may behave differently depending on the order of the two arguments, i.e. whether X or Y is the larger value. Since par(x,y) = par(y,x) I think this should be avoided. I am not quite sure, but maybe the code works best if X is the larger value (avoids underflow here and there). A simple xy could do the trick. At least it returns consistent results for par(x,y) and par(y,x). BTW - during these tests with very large numbers like 106000 I noticed a quite ..."special behaviour" of the 10x function. Usually integer arguments should return exact powers of ten, but somewhere there is a point where - in DP mode - this is no longer guaranteed: ``` 500 [10x] => 1.00000...0 E+500 999 [10x] => 1.00000...0 E+999 2000 [10x] => 9.99999...9 E+1999 6000 [10x] => 9.99999...8 E+5999 ``` Dieter ▼ Walter B Unregistered Posts: 4,587 Threads: 105 Joined: Jul 2005 11-05-2012, 03:38 PM I feared such academic problems would arise. We should have left DP closed to the public - but we didn't. Perhaps we shall rethink that decision ... ▼ Dieter Unregistered Posts: 653 Threads: 26 Joined: Aug 2010 11-08-2012, 09:02 AM Let's get back to the facts. The "problem" here is the error. Not the error report. Without DP mode there was no way to design and test XROM functions in user code, this way replacing former internal routines like, for instance, my humble contributions, the Normal quantile or Lambert's W. If you don't like the average user to access this mode, simply have it mapped to a special key sequence and do not mention it in the manual. Maybe it can be shown on the last page in small print and ROT13-encoded. #-) The error in the exponential functions can be explained with a closer look at the present code. Functions like 2x or 10x are implemented by calling the general power function yx, using the relation yx = ex * ln y. All this is done with 39-digit internal precision. It can be shown that during the evaluation of 106000 an error of 1 ULP in ln 10 will cause a relative error of 10-34 or 1 unit in the 34th digit. The shown errors are larger, so obviously ln 10 has a slightly larger error. The current (internal) constants catalogue includes exact values for ln 2 and ln 10. Using these instead of evaluating the logs at runtime would yield two significant advantages: First of all there is a substantial gain in speed for 2x and 10x, as the 34s log functions are generally on the slow side. Second, the mentioned cases 102000 and 106000 should return more accurate results: the 39-digit value of ln 10 has a roundoff error of merely 1,1 E-39 which means that the relative error of the final result should be within 1 E-35. Even with an additional error of 1 ULP in 6000 * ln 10 it's still 1 E-34. That's why I think this error can be fixed easily. Dieter ▼ Paul Dale Unregistered Posts: 3,229 Threads: 42 Joined: Jul 2006 11-08-2012, 03:52 PM You're correct I'm not using the builtin log2 and log10 constants for 2x and 10x :-( I missed this optimisation unfortunately. I do use them for log2 and log10 at least. Now to see if the code can be easily changed to pass in the log of base.... - Pauli ▼ Paul Dale Unregistered Posts: 3,229 Threads: 42 Joined: Jul 2006 11-08-2012, 03:58 PM The next build will include this optimisation :) - Pauli Walter B Unregistered Posts: 4,587 Threads: 105 Joined: Jul 2005 11-09-2012, 06:19 PM Thanks for the details. As you see, this detailed report triggered some improvement, especially points 3 and 4 :-) No doubt the original error report wasn't the problem, but it wasn't leading to a solution yet - the second report was far better in that aspect. Thanks again. BTW, DP mode is covered in an appendix of the manual - no easy way to hide it even more ;-) Marcus von Cube, Germany Unregistered Posts: 3,283 Threads: 104 Joined: Jul 2005 11-05-2012, 04:04 PM We do not special case 10integer. The error you mention is 2 ULPs. I think we can live with that.

 Possibly Related Threads… Thread Author Replies Views Last Post HP50g: Writing a function that returns a function Chris de Castro 2 2,082 12-10-2013, 06:49 PM Last Post: Han HP 35S: Resistors in Series or Parallel (Portred from HP-33E) Eddie W. Shore 2 1,364 10-14-2013, 05:29 AM Last Post: R. Pienne [HP-prime] Parallel Processing, vectors, matrices as a only data type =( CompSystems 1 1,174 08-08-2013, 04:48 PM Last Post: peacecalc [WP34s] RSD function Dieter 11 3,229 01-29-2013, 08:58 AM Last Post: Walter B [WP34S] WP34S firmware on the AT91SAM7L-STK dev kit? jerome ibanes 1 1,189 10-04-2012, 04:59 PM Last Post: Paul Dale wp34s INC s function Jeff O. 18 4,459 10-03-2012, 02:20 AM Last Post: Paul Dale WP34S Fibonacci function wildpig 3 1,299 09-02-2012, 02:25 AM Last Post: Walter B HP 39gII and parallel list processing Gilles Carpentier 12 3,088 07-25-2012, 08:49 AM Last Post: Wes Loewer [WP34S] LocRm allocation function missing? Chris Tvergard 10 3,156 05-14-2012, 10:14 AM Last Post: Chris Tvergard [WP34S] Curious Bug in Inverse Normal Function Les Wright 61 12,534 05-01-2012, 02:44 AM Last Post: Paul Dale

Forum Jump: