Lambert W (HP-41)



#2

This is an implementation of Crawl's suggestion (message #7 in this old thread). The convergence is fast, but because successive approximations oscillate as x approaches -1/e, the comparison test is made between yn+1 and yn-1 instead of yn+1 and yn. Perhaps using a flag would be a better solution than incrementing the register 03 and doing MOD 2 to save every other approximation into the register 04 as I did, but that's the only idea I have had. The running times on my HP-41CV are quite reasonable all through the valid ranges, but some optimization is possible, in case someone wants to try:

- using the stack to save one or two registers;

- avoiding numerical constants inside the loop to improve the speed;

- etc.

A WP 34S port should be also interesting, at it appears Wm takes too long on it and even on the emulator when x approaches -1/e, at least in my outdated version.

001*LBL "LWM"
002 STO M
003 CHS
004 LN
005 GTO 00
006*LBL "LWP"
007 STO 01
008 2
009 +
010 LN
011 ENTER^
012 LN
013 -
014*LBL 00
015 STO 02
016 STO 04
017 0
018 STO 03
019*LBL 01
020 1
021 ST+ 03
022 RCL 02
023 +
024 LASTX
025 X^2
026 RCL 01
027 LASTX
028 E^X
029 /
030 +
031 X<>Y
032 /
033 RCL 02
034 X<>Y
035 STO 02
036 RCL 03
037 2
038 MOD
039 X=0?
040 GTO 02
041 RDN
042 X<>Y
043 STO 04
044*LBL 02
045 RCL 02
046 RCL 04
047 X#Y?
048 GTO 01
049 RCL 02
050 RTN
051 .END.

actual time
x LWP(x) LWP(x) ( s )

9.999999e99 224.8431064 224.8431063 4.8
99.999999990 3.385630140 3.385630140 6.6
2.000000000 0.852605502 0.852605502 7.6
1.000000000 0.567143290 0.567143290 11.5
0.5 0.351733711 0.351733711 10.1
0.000000000 0.000000000 0.000000000 14.0
-0.367879441 -0.999969978 -0.999969471 22.6
-0.36787944 -0.999918898 -0.999920199 20.5
-0.3678794 -0.999526969 -0.999526967 20.4
-0.367 -0.932399183 -0.932399185 14.5
-0.36 -0.806084317 -0.806084316 12.6
-0.32 -0.560489483 -0.560489483 12.7
-0.3 -0.489402227 -0.489402227 10.5
-0.2 -0.259171102 -0.259171102 9.8
-0.000100000 -0.000100010 -0.000100010 9.2
-1e-10 -1e-10 -1e-10 9.1
-1e-20 -2e-20 -2e-20 9.1
-9e-90 -9e-90 -9e-90 10.7

actual time
x LWM(x) LWM(x) ( s )

-0.36787944 -1.000086314 -1.000079806 15.8
-0.3678794 -1.000047374 -1.000047318 15.8
-0.367 -1.070791895 -1.070791887 14.0
-0.36 -1.222770133 -1.222770134 10.4
-0.32 -1.624849446 -1.624849446 7.7
-0.3 -1.781337024 -1.781337023 5.8
-0.2 -2.542641358 -2.542641358 5.7
-0.000000001 -23.89701959 -23.89701959 9.4
-1e-20 -49.96298427 -49.96298428 9.4
-2e-60 -142.4207440 -142.4207440 11.5
-9e-90 -210.3843700 -210.3843700 11.3


#3

Hi Gerson,

those results sure are impressive! My MCODE "WL0" and "WL1" functions don't quite match the same accuray for the very large/small arguments!

Worth revising the code, I'm afraid.

Thanks for the vector, I missed the original thread altogether.

'AM


#4

Quote:
those results sure are impressive! My MCODE "WL0" and "WL1" functions don't quite match the same accuray for the very large/small arguments!

Thanks, but let's thank Crawl for his insights :-)

I've just tested "WL0" and "WL1" on the emulator (I'd forgotten to check my results upon yours). The results near -1/e appear to be better than the ones here. Perhaps I should test the algorithm on a 12-digit calculator, like the HP-42S, and see how the 10-digit results will look like.

Best regards,

Gerson.


#5

Yes I think the better results in the vicinity of -1/e are due to the 13-digit math routines I use in the MCODE functions.

For the mods, I think I'd need to change the convergence factor for large arguments, perhaps losing accuracy but at least completing the execution when it now starts oscillating.

Cheers,
'AM

#6

Quote:
The running times on my HP-41CV are quite reasonable all through the valid ranges, but some optimization is possible, in case someone wants to try:

Times are also reasonable on HP-41C. I have just follow your suggestions :

- using the stack to save three registers; only x is memorized into R00.

- History of previous estimates (y" and y') of actual y estimate are all store in stack.

- ...





                  t:          z:          y:                  x:      L:        R00

001 LBL "LWM x
002 STO 00 x
003 CHS -x x
004 LN LN(-x) -x
005 GTO 02
006 LBL "LWP x
007 STO 00 x
008 2 x 2
009 + x+2 2
010 LN LN(x+2) x+2
011 ENTER^ LN(x+2) LN(x+2)
012 LN LN(x+2) LN(LN(x+2)) LN(x+2)
013 - LN(x+2)-LN(LN(x+2))
014 LBL 02 y0
015 0 y0 0
016 x:y 0 y0
017 ENTER^ 0 y0 y0
018 LBL 01 y' y y"
019 RDN y' y
020 x=y?
021 STOP
022 RCL 00 y' y x x
023 RCL Y y' y x y
024 E^X y' y x exp(y) y
025 / y' y x/exp(y) exp(y)
026 RCL Y y' y x/exp(y) y
027 x^2 y' y x/exp(y) y² y
028 + y' y x/exp(y)+y² y²
029 x:y y' x/exp(y)+y² y
030 1 y' x/exp(y)+y² y 1
031 x:y y' x/exp(y)+y² 1 y
032 + y' x/exp(y)+y² 1+y y
033 Lastx y' x/exp(y)+y² 1+y y
034 RDN y y' x/exp(y)+y² 1+y
035 / y y' (x/exp(y)+y²)/(1+y) 1+y
036 x:y y' y y"
037 x#y? y' y y"
038 GTO 01
039 END
y


Edited: 3 Sept 2012, 5:22 a.m.


#7

Quote:
I have just follow your suggestions :

- using the stack to save three registers; only x is memorized into R00.

- History of previous estimates (y" and y') of actual y estimate are all store in stack.


C'est formidable ! Thanks for taking the time to optimize it.


Quote:
Times are also reasonable on HP-41C.

Even more reasonable after your changes, as I can check on my HP-41C:

                                   actual        time     
x LWP(x) LWP(x) ( s )

9.999999e99 224.8431064 224.8431063 4.1
99.999999990 3.385630140 3.385630140 5.4
2.000000000 0.852605502 0.852605502 5.5
1.000000000 0.567143291 0.567143290 6.4
0.5 0.351733711 0.351733711 5.5
0.000000000 0.000000000 0.000000000 7.4
-0.367879441 -0.999969978 -0.999969471 17.9
-0.36787944 -0.999918898 -0.999920199 16.2
-0.3678794 -0.999526969 -0.999526967 16.0
-0.367 -0.932399183 -0.932399185 11.6
-0.36 -0.806084317 -0.806084316 10.1
-0.32 -0.560489483 -0.560489483 9.6
-0.3 -0.489402227 -0.489402227 8.5
-0.2 -0.259171102 -0.259171102 7.3
-0.0001 -0.000100010 -0.000100010 6.6
-1e-10 -1e-10 -1e-10 6.6
-1e-20 -2e-20 -2e-20 7.0
-9e-90 -9e-90 -9e-90 8.0

actual time
x LWM(x) LWM(x) ( s )

-0.36787944 -1.000086314 -1.000079806 12.0
-0.3678794 -1.000473741(*) -1.000473183 12.2
-0.367 -1.070791895 -1.070791887 8.1
-0.36 -1.222770133 -1.222770134 7.5
-0.32 -1.624849446 -1.624849446 5.4
-0.3 -1.781337024 -1.781337023 4.7
-0.2 -2.542641358 -2.542641358 4.6
-0.000000001 -23.89701959 -23.89701959 6.6
-1e-20 -49.96298427 -49.96298428 7.4
-2e-60 -142.4207440 -142.4207440 8.2
-9e-90 -210.3843700 -210.3843700 8.1

(*) The result in my first table is incorrect, this is the correct one.

Two results are slightly different, but I think I know the reason. If I replace the '0' in my step 017 with '1' I get the same results. Not an issue.


Regards,

Gerson.


#8

On the WP 34S you can replace the steps 30 to 34 by two instructions: STO T, INC X, however LWP is not converging for x > 0.89591 : y, y' & y'' are staying different by a few ULPs.

Edited: 3 Sept 2012, 12:19 p.m.


#9

Would truncating the mantissas to 15 significant digits fix that? Thanks!


#10

Truncating or rounding to 15 digits would be bad. Not just because the 34S carries 16 digits in dingle precision. Double precision? Extra guard digits to attempt to get correct rounding?

Better would be to use the CNVG? instruction to check for convergence. It can determine a level of accuracy based on the current mode.


- Pauli


#11

I've just entered C. Ret's code with no change into my WP 34S. It has some issues when x is around 0.9 as Didier has mentioned, but it is very fast for the values that work. I've inserted the instructions ROUND x<>y ROUND x<>y between the steps 019 and 020. Obviously this is not a solution as the results are limited to 12 digits, but both WP and WL work for all examples in my table.

Gerson.


#12

What about using the approximately equals test on step 020?

This does a round internally but doesn't disturb the stack.


- Pauli


#13

It works!

So that's what x~?Y is for :-)

Gerson.

#14

MAy I suggest to use this "is converging" test to replace the y#x? test at the end of the code. The x=y? test at step 020 is a short-cut stop in case there is no "oscillation" at the end of the "convergence" (or process in case not converging to the expected value !).


#15

So your steps 020 and 021 are kind of optional. They allow for faster results (about half a second) when there is no oscillation (most of the practical cases). This makes sense on the HP-41, but perhaps doesn't make any difference in faster calculators, like the WP 34S. By the way, do you have a 28S version already?



actual time
x LWP(x) LWP(x) ( s )

w/o steps (with steps
020 & 021 020 & 021)

9.999999e99 224.8431064 224.8431063 3.9 (4.1)
99.999999990 3.385630140 3.385630140 5.2 (5.4)
2.000000000 0.852605502 0.852605502 5.9 (5.5)
1.000000000 0.567143291 0.567143290 6.1 (6.4)
0.5 0.351733711 0.351733711 5.9 (5.5)
0.000000000 0.000000000 0.000000000 7.9 (7.4)
-0.367879441 -0.999969978 -0.999969471 17.4 (17.9)
-0.36787944 -0.999918898 -0.999920199 15.7 (16.2)
-0.3678794 -0.999526969 -0.999526967 15.6 (16.0)
-0.367 -0.932399183 -0.932399185 11.2 (11.6)
-0.36 -0.806084317 -0.806084316 9.8 (10.1)
-0.32 -0.560489483 -0.560489483 9.3 (9.6)
-0.3 -0.489402227 -0.489402227 8.3 (8.5)
-0.2 -0.259171102 -0.259171102 7.8 (7.3)
-0.0001 -0.000100010 -0.000100010 7.0 (6.6)
-1e-10 -1e-10 -1e-10 6.9 (6.6)
-1e-20 -2e-20 -2e-20 7.4 (7.0)
-9e-90 -9e-90 -9e-90 8.4 (8.0)
actual time
x LWM(x) LWM(x) ( s )

-0.36787944 -1.000086314 -1.000079806 11.8 (12.0)
-0.3678794 -1.000473741 -1.000473183 11.9 (12.2)
-0.367 -1.070791895 -1.070791887 8.0 (8.1)
-0.36 -1.222770133 -1.222770134 7.3 (7.5)
-0.32 -1.624849446 -1.624849446 6.0 (5.4)
-0.3 -1.781337024 -1.781337023 5.3 (4.7)
-0.2 -2.542641358 -2.542641358 5.3 (4.6)
-0.000000001 -23.89701959 -23.89701959 7.2 (6.6)
-1e-20 -49.96298427 -49.96298428 7.9 (7.4)
-2e-60 -142.4207440 -142.4207440 8.9 (8.2)
-9e-90 -210.3843700 -210.3843700 8.6 (8.1)


#16

Hi.

Yes, I already have type down a version for my HP28S, but it use ROOT finding capabilities such as the HP50g version may. The algorithm is quite different.

For the given x, the HP-28S first looks the solution W of the equation x=W.x.e(W.x). Then the W values are multiplied by x in order to get LWP(x) and LWM(x).

For positive x one unique solution W1 is possible. The ROOT sinking is initiated with 0.



Graph horizontal axe is W( independent) where x=+.3 (set value).

For negative xonly the second solution W2 is searched. The ROOT sinking is initiated with -e/x.



Graph horizontal axe is W(independent) where x=-.3 (set value).

« -> x 
« IF x e ->NUM INV NEG >= THEN
x 0 LWROOT
IF x 0 < THEN
x DUP NEG INV 1 EXP * LWROOT
2
ELSE
1
END
->ARRY
DUP x * ARRY-> DROP
ELSE
'n/a'
END
»
»
'LWPM'

« -> x y.
« x DUP 'w' * DUP EXP * =
'w' y. ROOT
»
»
'LWROOT'

The result obtained by this method give no clue about convergence of your method.
That why, I make a second HP-28s version, using strictly the same algorithm as for the HP-41C:

« DUP 2 + LN DUP LN - LWSLVR » 'LWP'
« DUP NEG LN LWSLVR » 'LWM'
« -> x y. « 0 y. y.
DO
DROP
x OVER EXP / OVER SQ + OVER 1 + /
ROT
UNTIL DUP2 == END » » 'LWSLVR'

Frist attempt was with 2 : convergence is fast, but the final value is not exactly the expected one. But, my code leave the three successive estimates in the stack. It is easy to obtained a better final value by computing the mean of the 3 last estimate :

x                w               x.w              y”             y’             y             mLWP(x)=(y"+y’+y)/3

5. .265344933049 1.32672466525 1.32672466525 1.32672466524 1.32672466524 1.32672466524
2. .426302751007 .852605502014 .852605502012 .852605502017 .852605502017 .852605502013
1. .567143290411 .567143290411 .567143290409 .567143290409 .567143290409 .567143290410
.5 .703467422500 .351733711250 .351733711249 .351733711249 .351733711249 .351733711250
.3 .789184369295 .236755310789 .236755310787 .236755310787 .236755310787 .236755310787
.1 .912765271606 .0912765271606 .0912765271607 .0912765271607 .0912765271607 .0912765271607
0. 1. 0. 0. 0. 0. 0.
-.1 1.11832559159 -.111832559159 -.111832559159 -.111832559159 -.111832559159 -.111832559159
-.2 1.29585550909 -.259171101818 -.259171101819 -.259171101819 -.259171101819 -.259171101819
-.3 1.63134075727 -.489402227181 -.489402227179 -.489402227179 -.489402227179 -.489402227180
-.3678794 2.71699629690 -.999526967506 -.999526968054 -.999526968523 -.999526968523 -.999526968567 (*)
-.367879441 2.71819882847 -.999969465544 -.999969486184 -.999969489231 -.999969489231 -.999969488213 (*)
-.3678794411 2.71822804307 -.999980213267 -.999980244950 -.999980308832 -.999980308832 -.999980287537 (*)
-.36787944117 2.71827501627 -.999997493932 -.999997147213 -.999996845190 -.999996845190 -.999996945863
-.367879441171 2.71827636186 -.999997988949 -.999997981622 -.999998513658 -.999998513658 -.999998336313 (*)
-.367879441172 extremum nan Undefined Results

x w2 x.w2 y” y’ y mLWM(x)=(y"+y’+y)/3

-.1 35.7715206396 -3.57715206396 -3.57715206395 -3.57715206395 -3.57715206395 -3.57715206393 (***)
-.2 12.7132067889 -2.54264135778 -2.54264135777 -2.54264135777 -2.54264135777 -2.54264135777(**)
-.3 5.93779007804 -1.78133702341 -1.78133702343 -1.78133702343 -1.78133702343 -1.78133702343(**)
-.3678794 2.71956838514 -1.00047318578 -1.00047317903 -1.00047318242 -1.00047318242(*) -1.00047318129(***)
-.367879441 2.71836484166 -1.00003053833 -1.00003048465 -1.00003050716 -1.00003050716 -1.00003049966
-.3678794411 2.71833513146 -1.00001960888 -1.00001969150 -1.00001980550 -1.00001980550 -1.00001976750(**)
-.36787944117 2.71827346703 -.999996923998 Undefined Results
-.367879441171 2.71828594961 -1.00000151609 Undefined Results
-.367879441172 extremum nan Undefined Results


(*)-.999526967506*EXP(-.999526967506) = -.999526968567*EXP(-.999526968567) = -.3678794
Look’s like the precision of the result (in both methods) have no or few insight on accuracy; w.x.e^(w.x) “loose” precision, so having all the digits of the final result have no meaning!
(**) w.x.EXP(w.x)= x+d and mLWM(x)*EXP(mLWM(x))= x-d with d=1E-12 precision error accumulate the wrong way!
(***) (y”+y’+y)/3 introduce cutoof rounding error on lasts digits. HP28S lacks guard digits!!


Edited: 5 Sept 2012, 5:29 a.m.

#17

A better solution is to place only one RND (or ROUND) at the end of the actual estimation (step 035).

Edited: 4 Sept 2012, 2:19 a.m.


#18

What about stopping after, say, 20 iterations if the other tests fail? This is necessary for LWP(0.91) on my HP-42S or for LWP(1) on Free42 Decimal, for instance.

00 { 70-Byte Prgm }
01*LBL "LWM"
02 STO "Z"
03 +/-
04 LN
05 GTO 02
06*LBL "LWP"
07 STO "Z"
08 2
09 +
10 LN
11 ENTER
12 LN
13 -
14*LBL 02
15 20
16 STO 01
17 Rv
18 0
19 X<>Y
20 ENTER
21*LBL 01
22 Rv
23 DSE 01
24 X=Y?
25 RTN
26 RCL "Z"
27 RCL ST Y
28 E^X
29 /
30 RCL ST Y
31 X^2
32 +
33 X<>Y
34 1
35 X<>Y
36 +
37 LASTX
38 Rv
39 /
40 X<>Y
41 X#Y?
42 GTO 01
43 .END.

The HP-42S will accept complex arguments and will return complex results. This feature requires more tests though.

Examples:

1) LWM(2 + 3i)

2 ENTER 3 shift COMPLEX XEQ LWM --> -3.15828083899e-2 - i3.72110798664

2) LWP(-3)

3 +/- XEQ LWP --> 4.66997857926e-1 + i1.82173982301
LWP(-2) returns Invalid Data (it should return 0.17281600284 + i1.67368641374

3) Solve x^x = -0.5

0.52 +/- LN XEQ WLP e^x --> 1.39318341581 + i2.01002623336
ENTER y^x --> -4.99999999998e-1 -i2.62338076867e-12

0.5 +/- LN XEQ WLM e^x --> -1.069647624470 -i2.03857976038e-1
ENTER y^x --> -4.99999999999e-1 -i1.76183807687e-11

Edited: 4 Sept 2012, 11:44 p.m.


#19

Please excuse my ignorance, but did anybody read Namir's article in Solve 28 and applied Ostrowski's method to this problem? Just an idea ...


#20

Walter,

The problem doesn't lie on the solve method, but in floating point arithmetic. For instance, LWP(0.91) converges to a 11-digit result in only six iterations. The problem is the 12th digit which keeps oscillation among three different values from this point on:

0.5336689331425
0.5336689331427
0.5336689331422
0.5336689331425
0.5336689331427
.
.
.
Since the estimations will always be different the program will loop forever. One solution might be stopping when the difference between two consecutive estimations is less than an arbitrary epsilon. This has the disadvantage of being slower, also care has to be taken to scale epsilon when needed. In most cases the estimations will converge to one unique 12-digit result, in some cases they will oscillate between two slightly different values. That's why the program keeps track of the latest three estimations and tests every other one. However, in a few rare situations the estimations will oscillate between three distinct values, as above, and this scheme will fail. In this case, I would suggest stopping after an arbitrary number of iterations is performed. It will take longer than it should be, but at least it will stop and the result will be accurate enough. Also, just an idea...

Gerson.


#21

Yes the maximum number of loops is the ultimate safety measure - question however is how many are enough / too many / too few?

Also I find it interesting that this situation occurs for 0.91 is there any theoretical reason for it?

Best,
ÁM


#22

Hello Ángel,

Quote:
question however is how many are enough / too many / too few?

For the WP 34S program, 30 appears to be just right. Probably more data than what you might want or need:

  Number of iterations until convergence is reached (WP 34S)

LWP LWM
x DBLOFF DBLON DBLOFF DBLON

1,00000E+100 5 7
1,000000E+80 5 8
1,000000E+60 6 7
1,000000E+50 6 7
1,000000E+20 6 7
1,000000E+10 6 8
1,000000E+08 6 7
1,000000E+06 7 8
9,999970E+05 6 9
1,000000E+04 7 8
1,000000E+03 8 8
9,999999E+01 10 8
5,000000E+00 6 7
1,999990E+00 8 9
9,999700E-01 8 10
9,900000E-01 7 10
9,799997E-01 7 10
9,600000E-01 7 11
9,509999E-01 8 11
9,500000E-01 11 9
9,449999E-01 9 10
9,300000E-01 9 10
9,199993E-01 7 10
9,100000E-01 7 11
8,896000E-01 8 9
8,800000E-01 8 11
7,000000E-01 8 9
5,000000E-01 8 11
2,000000E-01 9 10
1,000000E-01 10 13
1,000000E-02 10 11
1,000000E-20 10 11
1,000000E-40 11 11
1,000000E-80 11 12
1,000000E-90 12 12
0,000000E+00 14 17
-1,00000000000E-90 12 12 12 13
-1,00000000000E-40 11 11 13 13
-1,00000000000E-30 10 11 11 12
-1,00000000000E-10 10 11 11 11
-9,99997000000E-02 10 11 8 9
-3,00000000000E-01 10 12 7 11
-3,10000000000E-01 11 13 6 9
-3,60000000000E-01 12 14 9 10
-3,67000000000E-01 15 21 10 11
-3,67800000000E-01 15 17 12 13
-3,67870000000E-01 20 22 13 16
-3,67879000000E-01 20 21 15 17
-3,67879439996E-01 22 24 19 22
-3,67879441100E-01 24 28 21 23
-3,67879441171E-01 27 30 24 27

Quote:
Also I find it interesting that this situation occurs for 0.91 is there any theoretical reason for it?

None that I am aware of. It occurs also for x=1 (See Omega constant), for x=2, for x=100 and many other points as you can see in the table (when I couldn't choose round numbers).

The equivalent WP 34S program appears to be very accurate. For instance,

.5 SQRT LN XEQ LWP E^X
returns 0.5, exactly, and

.5 SQRT LN XEQ LWM E^X
returns 0.25, exactly, in double precision. The latter is significantly faster than the built-in function (Wm).

Best regards,

Gerson.

--------------------

P.S.:

Quote:
the maximum number of loops is the ultimate safety measure

In order to minimize the number of extra loops, we can have separate max loops for positive and negative arguments. For instance 15 and 30, respectively:


00 { 79-Byte Prgm }
01*LBL "LWM"
02 STO "Z"
03 +/-
04 LN
05 GTO 02
06*LBL "LWP"
07 STO "Z"
08 2
09 +
10 LN
11 ENTER
12 LN
13 -
14*LBL 02
15 RCL "Z"
16 SIGN
17 X<0?
18 STO+ ST X
19 ABS
20 15
21 *
22 STO 01
23 Rv
24 0
25 X<>Y
26 ENTER
27*LBL 01
28 Rv
29 DSE 01
30 X=Y?
31 RTN
32 RCL "Z"
33 RCL ST Y
34 E^X
35 /
36 RCL ST Y
37 X^2
38 +
39 X<>Y
40 1
41 X<>Y
42 +
43 LASTX
44 Rv
45 /
46 X<>Y
47 X#Y?
48 GTO 01
49 .END.
C. Ret's HP-41 program has to be modified as well: LWP(0.911) loops forever.


Edited: 5 Sept 2012, 6:31 p.m.

#23

Here is a working HP-41 version and timings. I wonder how fast these would run in MCODE :-)

001*LBL 'LWM 
002 RCL X
003 CHS
004 LN
005 GTO 00
006*LBL 'LWP
007 RCL X
008 2
009 +
010 LN
011 ENTER^
012 LN
013 -
014*LBL 00
015 RCL Y
016 SIGN
017 X<0
018 ST+ X
019 ABS
020 10
021 *
022 STO 00
023 X<>Y
024*LBL 01
025 STO Y
026 RDN
027 E^X
028 /
029 R^
030 X^2
031 +
032 1
033 R^
034 +
035 LASTX
036 X<> Z
037 X<>Y
038 /
039 DSE 00
040 X=Y
041 RTN
042 GTO 01
043 END


x LWP(x) LWP(x) ( s )

9.999999e99 224.8431064 224.8431063 8.2
99.999999990 3.385630140 3.385630140 8.0
2.000000000 0.852605502 0.852605502 5.4
1.000000000 0.567143290 0.567143290 8.4
0.5 0.351733711 0.351733711 5.5
0.000000000 0.000000000 0.000000000 7.2
-0.367879441 -0.999969978 -0.999969471 16.6
-0.36787944 -0.999918898 -0.999920199 16.3
-0.3678794 -0.999526969 -0.999526967 16.1
-0.367 -0.932399183 -0.932399185 16.1
-0.36 -0.806084317 -0.806084316 15.9
-0.32 -0.560489483 -0.560489483 16.4
-0.3 -0.489402227 -0.489402227 15.7
-0.2 -0.259171102 -0.259171102 7.4
-0.0001 -0.000100010 -0.000100010 6.7
-1e-10 -1e-10 -1e-10 6.7
-1e-20 -2e-20 -2e-20 7.1
-9e-90 -9e-90 -9e-90 8.2


actual time
x LWM(x) LWM(x) ( s )


-0.36787944 -1.000086314 -1.000079806 14.9
-0.3678794 -1.000473741 -1.000473183 15.2
-0.367 -1.070791895 -1.070791887 15.5
-0.36 -1.222770133 -1.222770134 15.1
-0.32 -1.624849446 -1.624849446 5.5
-0.3 -1.781337024 -1.781337023 4.8
-0.2 -2.542641358 -2.542641358 4.8
-0.000000001 -23.89701959 -23.89701959 6.8
-1e-20 -49.96298427 -49.96298428 7.5
-2e-60 -142.4207440 -142.4207440 8.5
-9e-90 -210.3843700 -210.3843700 8.2


#24

I added the max loop-counts safety to the WL0 code. The longest time for the calculation (with 30 loops for oscillating cases) is 10.9 s (vs. the 16.6 in your table below).

x= -0.36787944 ; W= -0.999920190

This is further reduced to 1.45 s using the CL in TURBO50 mode.

Cheers,
Ángel

Edited: 7 Sept 2012, 5:30 a.m.


#25

Hello Ángel,

Have you tried other max loops values? For WL0 15 suffices in the WP 34S, even in double precision (34 digits). In the HP-41 program above, I've used only 10 (and 20 for WLM), but MCODE might require about 12 iterations, considering the calculations are carried out with 13 digits. The HP-41 program above doesn't keep track of the three latest estimations, as in C. Ret's programs, since it will eventually stop anyway. However, there will occur more situations the max iteration limit will be used. Perhaps this is not a good idea for the HP-41. C. Ret's code with max loops might be better on it.

Cheers,

Gerson.


#26

Gerson, good point - but it's not just the sheer number of loops but which one of the oscillations is closer to the actual value - which is never known in a predictable manner, if I'm not mistaken.

For instance, here's the results for W(-0.36787944) using different maximum numbers. You see it's not always "the more loops, the better" - look for instance at the worse error for n=25 over the result for n=20 - a true sweet spot!

loops	Result   	Error 
31 -0.999920192 -6.4846E-09
30 -0.99992019 -8.48476E-09
29 -0.999920188 -1.04849E-08
25 -0.999920188 -1.04849E-08
20 -0.999920191 -7.48468E-09
15 -0.999771104 -0.000149106
10 -0.993013658 -0.006907092

So we're somehow back to where we were before implementing the safety bailout, as we don't know exactly *when* to bail out to get the better accuracy.

Regards,
ÁM.

Edited: 7 Sept 2012, 9:26 a.m.


#27

"Better accuracy" is something that is easier said than done. If I get it right, the previous algorithms essentially try to solve the equation w ew = x. With this approach, a 10-digit result on a 10-digit machine sure is possible for x > 0, but as x gets negative a problem arises: There are numerous 10-digit solutions that satisfy this equation.

Example:

W(-0,367) = -0,93239 91847 479...
 
w w ew
---------------------------------
-0,9323991829 -0,36699 99999 51
-0,9323991847 -0,36699 99999 99
-0,9323991857 -0,36700 00000 25
-0,9323991865 -0,36700 00000 47
All this rounds to -0,367 on a 10-digit machine like the HP41.

In fact the value of w here may differ from the true result by
          5 E-11
dw = ---------
(1+w) ew
 
= 1,879 E-9
In other words: 8 valid digits is all we can expect. To ensure a result with 10-digit accuracy we would need two more digits, i.e. 12 digits working precision, so that w ew becomes 0,36700 00000 00. And in fact my W-routine for the 12-digit 35s returns 10 correct digits.

Here are some more results for other values of x and the required precision for a result that is at least within 1 ULP of the true value on a 10-digit calculator:

                                   potential       required
x true W(x) error in w precision
-------------------------------------------------------------
-0,1 -0,1118325592 +/- 6,3 E-11 10 digits
-0,2 -0,2591711018 +/- 8,8 E-11 10 digits
-0,3 -0,4894022272 +/- 1,6 E-10 11 digits
-0,36 -0,8060843160 +/- 5,8 E-10 11 digits
-0,367 -0,9323991847 +/- 1,9 E-9 12 digits
-0,3678 -0,9793607150 +/- 6,5 E-9 12 digits
-0,36787 -0,9928527298 +/- 1,9 E-8 13 digits
-0,367879 -0,9984521038 +/- 8,8 E-8 13 digits
-0,3678794 -0,9995269666 +/- 2,9 E-7 14 digits
-0,36787944 -0,9999201985 +/- 1,7 E-6 15 digits
-0,367879441 -0,9999694707 +/- 4,5 E-6 15 digits
-0,3678794411 -0,9999802922 +/- 6,9 E-6 15 digits
So even 13-digit precision is not sufficient in all cases if we want to be sure to get all 10 digits right. Time to think about a different approach?

Dieter

Edited: 7 Sept 2012, 3:19 p.m.


#28

In my last post I suggested using a different approach for the W function at least for x close to -1/e. A possible solution here is a series that converges quite well for such values:

For |x| < sqrt(2) let
p = sqrt(2 * (e*x + 1))
 
Then
W(x) = -1 + p - 1/3 p2 + 11/72 p3 - 43/540 p4 + ...
The first numeric challenge here is the evaluation of e*x + 1 for x close to -1/e. This problem can be solved easily:
     e*x + 1
= e * (x + 1/e)
We just have to provide a sufficiently exact value for 1/e and add this to x. This can be done in two steps - on a 12-digit calculator for instance like this:
   ...
e
1/x // 1/e or exp(-1) is slightly low in 12-digit precision
+
4,42321595524E-13 // next digits of 1/e
+
e
*
STO+ X
SQRT // result is p
...
10-digit devices like the HP-41 should use a slightly different way to calculate this value since e-1 is 1 ULP high and 2e is 1 ULP low.

I tried this method on a HP-35s for x < -0,3655. Larger x use an iterative approach based on the Halley-method. Here the iteration exits if the relative error is approx. 1E-10 (usually after not more than 5...7 loops). A final Newton step then adjusts the last 1-2 digits. These two methods combined should provide results with 11 correct digits. Which essentially is as good as it gets with 12-digit precision.

Some examples:

  x              W(x)            error [ULP]
--------------------------------------------
5 1,32672466524 0
2 0,852605502013 -1
1 0,567143290410 0
0,5 0,351733711249 0
0,2 0,168915973499 0
0,1 0,0912765271609 0
1E-10 9,999999999900 E-11 0
0 0 0
-0,1 -0,111832559159 0
-0,2 -0,259171101820 +1
-0,3 -0,489402227183 +3
-0,36 -0,806084315971 0
-0,3655 -0,890363279764 -1
-0,367 -0,932399184748 0
-0,3678 -0,979360714958 0
-0,36787 -0,992852729822 0
-0,367879 -0,998452103781 0
-0,3678794 -0,999526966608 0
-0,36787944 -0,999920198484 0
-0,367879441 -0,999969470701 0
-0,3678794411 -0,999980292245 0
-0,36787944117 -0,999997199775 0
-0,367879441171 -0,999998449288 0
This method also is quite fast. For x > -0,3655 only a few iterations are required (initial guess was ln(1+x)), and results for x closer to -1/e are returned in about 2 seconds on a 35s even when the series is evaluated up to p9.

An implementation for 13-digit M-Code precision on an HP-41 would require even less effort and should get all ten digits of the returned result right.

Dieter


Edited: 9 Sept 2012, 3:48 p.m.

#29

I just came across some... er... "special behaviour" of the (inverse) W-function when evaluated numerically. This may be related to the mentioned oscillation, so I decided to post it here. The expression w ew looks quite straightforward and simple, but numerically it isn't. The function obviously is continuous and monotonic, but in a real-world calculator application it isn't.

For instance take a look at W(-0,363). The true 12-digit value is w = -0,845361432987 (492...). As expected (cf. W'(x)), there is a 15 ULP wide interval where every w within that range returns -0,363 for w ew and the given precision. So we would expect all w from -0,845361432980 to ...995 to return -0,363 exactly.

Now see what actually happens:

      w                   w ew  
-----------------------------------------
-0,845361432973 -0,362999999999 ^
-0,845361432974 -0,362999999999 |
-0,845361432975 -0,362999999999 |
-0,845361432976 -0,362999999999 should be ...999
-0,845361432977 -0,362999999999 |
-0,845361432978 -0,363000000000 |
-0,845361432979 -0,362999999999 |
-0,845361432980 -0,363000000000 -------<
-0,845361432981 -0,362999999999 |
-0,845361432982 -0,363000000000 |
-0,845361432983 -0,362999999999 |
-0,845361432984 -0,363000000000 |
-0,845361432985 -0,363000000000 |
-0,845361432986 -0,363000000000 |
-0,845361432987 -0,363000000000 should be ...000
-0,845361432988 -0,363000000000
-0,845361432989 -0,363000000000 |
-0,845361432990 -0,363000000000 |
-0,845361432991 -0,363000000000 |
-0,845361432992 -0,363000000001 |
-0,845361432993 -0,363000000000 |
-0,845361432994 -0,363000000001 |
-0,845361432995 -0,363000000000 -------<
-0,845361432996 -0,363000000001 |
-0,845361432997 -0,363000000000 |
-0,845361432998 -0,363000000001 |
-0,845361432999 -0,363000000000 should be ...001
-0,845361433001 -0,363000000001 |
-0,845361433002 -0,363000000001 |
-0,845361433003 -0,363000000001 v
Looking at these results it becomes obvious why the last digit of w is something that is not easy to get:

There are cases where +1 ULP in w returns -1 ULP in w ew and vice versa. The positive slope of that function has become negative.

Dieter


Edited: 19 Sept 2012, 9:00 a.m.

#30

Double precision can be essential to get correct single precision (!) results close to -1/e:

 W(-0,36787 94411 71442 32159 55237 70161 4608)
= -0,99999 99999 99999 98085 12808 36402 42755 05091 27811 ...
The result returned by firmware 3.1 3225 has an error near 5,5E-18. Which was to be expected since all W within +/- 1E-17 of the true W return the same result for w*ew. In other words: if the argument is the lowest possible DP-value, the function result has only 16-17 valid digits.

In single precision the value closest to -1/e is

 W(-0,36787 94411 71442 3)
= -0,99999 99891 64620 96...
If evaluated with 16-digit precision, W has a potential error of approx. 1E-8. But since the internal function uses 34 digits, about 25 digits can be trusted so that the single precision result is fine.

The iterative approach works fine for x > 0, but it is not the method of choice as x gets (very) close to -1/e.

Dieter


#31

In single precision mode, the convergence criteria is a relative error of 1e-24 not 1e-34.

I'm waiting for the dust to settle here so that the firmware W function can be appropriately updated and improved.


- Pauli


#32

Yesterday I tried this program, but it is not accurate near -1/e:

  x                Wp(x)                                    Actual Wp(x)       
--------------------------------------------------------------------------------------------
5 1,3267246652422002236350992297758080 1,32672466524220022363509929775807966
2 0,8526055020137254913464724146953176 0,85260550201372549134647241469531747
1 0,5671432904097838729999686622103554 0,56714329040978387299996866221035555
0,5 0,3517337112491958260249093009299512 0,35173371124919577956902116966375615
0,2 0,1689159734991095651164749037058184 0,16891597349910958403285121676162817
0,1 0,0912765271608622642998957214231796 0,09127652716086226414304860554693732
1e-10 9,9999999990000000001499999999733330e-10 9,9999999990000000001499999999733333e-10
0 0,0000000000000000000000000000000000 0,00000000000000000000000000000000000
-0,1 -0,1118325591589629658335694568202658 -0,1118325591589629658395566025319567
-0,2 -0,2591711018190737450566519502154066 -0,2591711018190737125799216755694943
-0,3 -0,4894022271802149690362312519962932 -0,4894022271802149215425004058488412
-0,36 -0,8060843159708177782855213616209925 -0,8060843159708176441924365462909918
-0,3655 -0,8903632797650515808736815032956873 -0,8903632797650511410125773181789555
-0,367 -0,9323991847479284837164661908793559 -0,9323991847479277117471951896732207
-0,3678 -0,9793607149578284774761844434886448 -0,9793607149578289483571325035882182
-0,36787 -0,9928527298215372443479364374257017 -0,9928527298215230612044024383067153
-0,367879 -0,9984521037807272593182949803064360 -0,9984521037807256860219240479636937
-0,3678794 -0,9995269666075681262606395538798757 -0,9995269666076291859013736029737629
-0,36787944 -0,9999201984851193714121109223924577 -0,9999201984851193714121109223924577
-0,367879441 -0,9999694707005488274010434842119125 -0,9999694706971019586561055803031195
-0,3678794411 -0,9999802922445170178816356178098493 -0,9999802922403187022126758165541105
-0,36787944117 -0,9999971997752715886211339967727831 -0,9999971997513460442164046071411576
-0,367879441171 -0,9999984492882199170281330426992986 -0,9999984493035449784059665034874342
-e^(-1) -0,9999999999999998752093661285259866 -1,0000000000000000000000000000000000


x Wm(x) Actual Wm(x)
-------------------------------------------------------------------------------------------------
-1e-10 -26,295238819246925694110128821854920 -26,2952388192469256941101288218549182
-0,1 -3,577152063957297218409391963511995 -3,5771520639572971234088072378654033
-0,2 -2,542641357773526424293806156661848 -2,5426413577735260673762240912765265
-0,3 -1,781337023421627611974170281512745 -1,7813370234216276966066061504534446
-0,36 -1,222770133978505953142938073423862 -1,2227701339785062017995187488850206
-0,3655 -1,118287472596987280228640736301493 -1,1182874725969877971465393784455955
-0,367 -1,070791886768051874079258725213193 -1,0707918867680525920604850398376584
-0,3678 -1,020927239409427553738750824109443 -1,0209272394094270897113574392278679
-0,36787 -1,007181488951068775696115597981675 -1,0071814889510830592911361236474477
-0,367879 -1,001549495191275425771845523319594 -1,0015494951912768950563759062788449
-0,3678794 -1,000473182613217871681553793568512 -1,0004731826131567640203456903691404
-0,36787944 -1,000079805761663777082544792855552 -1,0000798057620190473215870952117257
-0,367879441 -1,000030529920822569299901895542387 -1,0000305299242695511452438950072974
-0,3678794411 -1,000019708014416801577108775092207 -1,0000197080186152831515755678992718
-0,36787944117 -1,000002800229955926824084223846546 -1,0000028002538814408950429424294271
-0,367879441171 -1,000001550713383222645797579566477 -1,0000015506980581836415922225569375
-e^(-1) -1,000000000000000000000000000000000 -1,0000000000000000000000000000000000


#33

Gerson,

Quote:
Yesterday I tried this program, but it is not accurate near -1/e:

Of course it isn't. Any iterative approach trying to solve w*e^w = x simply cannot return results with full accuracy close to -1/e because of the reasons mentioned in my other posts.

Just consider these examples:

   x                           true 16-digit W
---------------------------------------------------
-0,36787 94411 71442 3 -0,99999 99891 64621 0
 
But any W between Wmin = -1,00000 00000 00000
and Wmax = -0,99999 99802 70995 6
will also return W^-1 = -0,36787 94411 71442 3
in 16-digit precision.
So a result obtained by solving w e^w = x has only 7-8 digits that can be trusted. A reliable 16-digit result would require 25 digits working precision.
   x                           true 16-digit W
---------------------------------------------------
-0,36787 94411 71000 0 -0,99999 84492 88219 9
 
But any W between Wmin = -0,99999 84493 75868 5
and Wmax = -0,99999 84492 00576 3
will also return W^-1 = -0,36787 94411 71000 0
in 16-digit precision
So a result obtained by solving w e^w = x has only 9-10 digits that can be trusted. A reliable 16-digit result would require 23 digits working precision.
   x                           true 16-digit W
---------------------------------------------------
-0,36787 94411 00000 0 -0,99998 02922 44517 0
 
But any W between Wmin = -0,99998 02922 51413 3
and Wmax = -0,99998 02922 37620 7
will also return W^-1 = -0,36787 94411 71000 0
in 16-digit precision
So a result obtained by solving w e^w = x has only 10-11 digits that can be trusted. A reliable 16-digit result would require 22 digits working precision.

For x > 0 the first derivative of x(w) (or W-1, if you prefer) is greater than 1. This means that changing w by 1E-16 will change x by more than that, so that w*e^w will change significantly and the true root (=W) can be found easily. So the results in your table for 0,1 < x < 1 must have a different reason, probably the algorithm itself.

This changes at x < 0. Here, numerous 16- or 34-digit values of x will return the same value for w*e^w, so that finding the true value for w becomes virtually impossible. Take a look at the second example above: the last 5 or 6 digits simply make no difference. If the goal is a 34-digit result for x close to -1/e, about 50 digits working precision would be required. With the present 34 digits the best we can achieve is about 25 digit accuracy. So limiting the allowed error to 1E-24 actually is a good idea: it will not get better anyway.

That's why I would suggest an alternative solution for x close to -1/e. The series mentioned in my other post is such an idea. My 35s implementation usually returns 11 or 12 out of 12 correct digits and runs in roughly two seconds. I think I will try to get a different polynomial approximation for x close to -1/e that will return even better results with less effort.

BTW - even on the 34s the value of W(-1/e) cannot be computed exactly. Simply because -1/e cannot be given exactly. This is as close as we can get:

 x   =  -0,36787 94411 71442 32159 55237 70161 4608
W(x) = -0,99999 99999 99999 98085 12808 36402 4276

Dieter

Edited: 12 Sept 2012, 4:38 p.m.


#34

Quote:
Any iterative approach trying to solve w*e^w = x simply cannot return results with full accuracy close to -1/e because of the reasons mentioned in my other posts.

Understood. So cannot the built functions, which return similar results as the ones above. If accuracy is not desired or needed near -1/e, then the built-in programs should only be slightly changed to stop when a maximum number of iterations is reached, as in the following program:

001 LBL B		; Wm
002 RCL X
003 +/-
004 LN
005 SKIP 008
006 LBL A ; Wp
007 RCL X
008 2
009 +
010 LN
011 ENTER^
012 LN
013 -
014 RCL Y
015 SIGN
016 x=0?
017 RTN
018 x<0?
019 STO+ X ; if x<0, double max iter
020 ABS
021 1
022 5 ; if x>0, max iter = 15
023 *
024 STO 00
025 x<> Y
026 STO Y
027 Rv
028 ex
029 /
030 R^
031 x2
032 +
033 1
034 R^
035 +
036 RCL L
037 x<> Z
038 x<> Y
039 /
040 DSE 00 ; if the maximum number of iterations is reached
041 x=? Y ; or two consecutive estimates are equal,
042 RTN ; return
043 BACK 017 ; otherwise continue
044 END

Quote:
That's why I would suggest an alternative solution for x close to -1/e. The series mentioned in my other post is such an idea. My 35s implementation usually returns 11 or 12 out of 12 correct digits and runs in roughly two seconds. I think I will try to get a different polynomial approximation for x close to -1/e that will return even better results with less effort.

The most apparent problem with the built-in functions is they will run forever when the arguments approach -1/e. This shouldn't be difficult to fix. Once this is solved, then full accuracy in single precision should be the next goal. Of course, it would be nice if a solution to the latter is provided first, so the latter can be skipped.

Regards,

Gerson.


#35

Just two short notes:

First of all, computing Lambert's W with 16-digit accuracy for a 16-digit argument as close to -1/e as -0,3678794411714423 is no problem at all. The 34s runs XROM routines in double precision, i.e. with 34-digit precision. But all we need are 26 digits, maybe one or two more. So using CNVG 1 (relative error < 1E-24) should be perfectly fine and safely stop the iteration process.

Second, the number of iterations can be vastly improved if a different initial guess is used for x close to -1/e. This can be based on the series expansion for W. I did it this way:

  if x > -2/7
w = ln(1+x)
if w>1 then w = w - ln(w)
else
w = sqrt(2*e*(x + 1/e)) - 1
With the Halley method I use the iteration usually converges within 3 to 5 successive loops. Not 10, 20 or even 30.

Please note that in 34-digit precision the last digit of 1/e is rounded up which means that any valid x (i.e. > -1/e) will return a guess greater than -1. Otherwise the term x + 1/e would have to be computed somewhat differently.

Dieter

#36

Quote:
Of course it isn't. Any iterative approach trying to solve w*e^w = x simply cannot return results with full accuracy close to -1/e because of the reasons mentioned in my other posts.

I've tried to solve x*ln(x) = a instead.

Since

    Wp(x*ln(x))=ln(x),  x>0
Wp,m(-x*ln(-x))=ln(-x), x<0
Wm(a) and Wp(a), for a<0, can be found by solving for x
a + x*ln(-x) = 0 
The roots are x1 and x2, x1>x2, and
Wm(a) = ln(-x1)

Wp(a) = ln(-x2)
For instance, when solving
-0,2 + x*ln(-x) = 0
on the HP-15C
001 f LBL A
002 CHS
003 g LN
004 *
005 RCL+ 0
006 g RTN

0.2 CHS STO 0
ENTER f SOLVE A
we get
-0,07865836032  (x1)

CHS LN --> -2,542641357 (Wm(-0,2))
and
.5 CHS ENTER f SOLVE A  -->

-0,7716909740 (x2)

CHS LN --> -0,2591711018 (Wp(-0,2))
Some results for Wm(x) results near -1/e are only slightly better however:
                                     Actual
x Wp(x) Wp(x)
-0,367879441 -0,9999680416 -0,9999694707
-0,36787944 -0,9999209372 -0,9999201985
-0,3678794 -0,9995268764 -0,9995269666
-0,367 -0,9323991839 -0,9323991848
-0,36 -0,8060843158 -0,8060843160
-0,32 -0,5604894831 -0,5604894831
-0,3 -0,4894022273 -0,4894022272
-0,2 -0,2591711018 -0,2591711018
-0,0001 -0,0001000100 -0,0001000100
-1e-10 -1e-10 -1e-10

actual
x Wm(x) Wm(x)

-0,36787944 -1,000081930 -1,000079806
-0,3678794 -1,000473028 -1,000473183
-0,367 -1,070791889 -1,070791887
-0,36 -1,222770133 -1,222770134
-0,32 -1,624849446 -1,624849446
-0,3 -1,781337023 -1,781337023
-0,2 -2,542641357 -2,542641358
-0,000000001 -23,89701958 -23,89701959
-1e-20 -49,96298428 -49,96298428
-2e-60 -142,4207441 -142,4207440
-9e-90 -210,3843700 -210,3843700


Gerson.


#37

I did a bit of calculus to determine the possible uncertainty of w in your alternative approach. If I get it right, there is nothing gained or lost. #-)

Quote:
Some results for Wm(x) results near -1/e are only slightly better however

The error should be roughly the same. If you used the 15C solver: that one uses 13 digits internally which may explain why the results may be slightly better here and there.

There are ways to get full precision results even very close to -1/e. For instance by using the series expansion suggested in an earlier post. Anyway, I tried a completely iterative solution for the 34s:

  • Please consider the following code experimental. ;-) I think it works fine, but there may be errors.
  • The program uses different initial guesses for x close to -1/e or not. This leads to fast convergence, typically within 5 iterations.
  • Originally the program was intended for use in DP mode where the CNVG 1 test (relative error < 1E-24) can be used to check if the result is reasonably close to the true solution. This is fine since in the worst case (x = 0,3678794411714423) only 25-26 digits of w can be trusted. As the Halley method, which was used here, converges very fast, this means that the actual error at this point should be much lower. However, the program does not quit, it does one more final iteration, just to be sure.
  • The CNVG 1 test does not make sense in SP mode (unless the algorithm considers the current approximation exact because the calculated error is zero). So especially for use in SP mode a loop counter was added to make sure the iteration quits after 7 loops.
  • The final steps calculate an error estimate for the potential error in W. It is returned in Y and states the error in ULPs of W. This way you can see how many digits may be trusted.
Finally, here's the code:
001 *LBL A     // W_p(x)
002 CF 00
003 SKIP 002
004 *LBL B // W_m(x)
005 SF 00
006 CF 01
007 STO 00
008 # 007 // max. number of iterations
009 STO 01
010 # 028
011 SDR 002
012 +/- // threshold = -0.28
013 RCL 00 // is used to determine
014 x>? Y // whether x is close to -1/e or not
015 GTO 01
016 RM? // initial guess for x close to -1/e
017 RM 0 // make sure the last digit of 1/e
018 # eE // rounds to ...609 (DP) resp. ...424 (SP)
019 1/x
020 RM[->]Y // restore original rounding mode
021 RCL+ 00
022 # eE
023 [times]
024 STO+ X
025 [sqrt]
026 FS?C 00 // use positive or negative sqrt for W_p resp. W_m
027 +/-
028 DEC X
029 GTO 03 // start iteration
030 *LBL 01 // initial guess for x not close to -1/e
031 FS?C 00 // Flag 0 set => jump to W_m guess
032 GTO 02
033 LN1+x // else calculate a guess for W_p
034 ENTER[^]
035 x>0?
036 LN
037 x<0?
038 CLx
039 - // initial guess for W_p and x not close to -1/e
040 GTO 03 // start iteration with this guess
041 *LBL 02
042 +/-
043 LN
044 ENTER[^]
045 +/-
046 LN
047 - // initial guess for W_m and x not close to -1/e
048 *LBL 03 // iteration starts here
049 FILL
050 RCL 00
051 x[<->] Y
052 e[^x]
053 /
054 -
055 x[<->] Y
056 INC X
057 /
058 RCL L
059 1/x
060 INC X
061 RCL[times] Y
062 # 1/2
063 [times]
064 +/-
065 INC X
066 /
067 -
068 FS?C 01 // "last iteration" flag set?
069 SKIP 004 // then exit
070 CNVG? 01 // relative error < 1E-24?
071 SF 01 // then do one final iteration
072 DSE 01 // but not more than 7
073 GTO 03
074 ENTER[^] // estimate potential error
075 e[^x]
076 RCL+ 00
077 # 1/2
078 x[<->] Y
079 /
080 INC X
081 ROUNDI // estimated uncertainty in ULPs
082 x[<->] Y // is returned in Y
083 END
For use in XROM (i.e. DP mode) the loop counter in R01 may be omitted (008-009, 072), as well as the final error estimate (074-082). This would save about a dozen steps.

For x > 0 the results should be exact or only very slightly off, and for x close to -1/e here is a table with some results. The last column lists the potential error in w that cannot be avoided in this kind of iterative approach. You can see that the actual error is well within these limits.

 x                     W_p(x)                                  true W(x)   off by    pot. err.
----------------------------------------------------------------------------------------------
-0,360 -0,8060843159708177 782855213616209921 ......20 1,0 E-34 5,8 E-34
-0,365 -0,8798200914159538 111724840007674054 ......53 1,0 E-34 1,1 E-33
-0,366 -0,9021733294184565 416888289070060388 ....0402 1,4 E-33 1,4 E-33
-0,367 -0,9323991847479284 837164661908793578 ....3572 6,0 E-34 2,0 E-33
-0,3678 -0,9793607149578284 774761844434886510 ...86482 2,8 E-33 6,6 E-33
-0,36787 -0,9928527298215372 443479364374256686 ...56796 1,1 E-32 1,9 E-32
-0,367879 -0,9984521037807272 593182949803064015 ...64023 8,0 E-34 8,8 E-32
-0,3678794 -0,9995269666075681 262606395538800519 ..799867 6,5 E-32 2,9 E-31
-0,36787944 -0,9999201984840833 972182231127087247 ..073406 1,4 E-30 1,7 E-30
-0,367879441 -0,9999694707005488 274010434842087911 ..082067 5,8 E-31 4,5 E-30
-0,3678794411 -0,9999802922445170 178816356178045764 ..8059703 1,4 E-30 6,9 E-30
-0,36787944117 -0,9999971997752715 886211339967722079 .968007103 2,9 E-29 4,5 E-29
-0,367879441171 -0,9999984492882199 170281330427836840 0427995223 1,6 E-29 8,8 E-29
-0,3678794411714 -0,9999995203293061 370716284460929731 4458749854 2,2 E-28 2,8 E-28
-0,36787944117144 -0,9999998876545465 749486709944930174 9943194917 1,7 E-28 1,2 E-27
-0,367879441171442 -0,9999999581864317 712633743446609641 3442169514 4,4 E-28 3,2 E-27
-0,3678794411714423 -0,9999999891646209 649647119617625498 9729425456 1,1 E-26 1,2 E-26
----------------------------------------------------------------------------------------------

Now try it and see how this performs compared to your solution. I do not think it is much more accurate, but at least it should be faster. ;-)

Dieter


#38

Hello Dieter,

Quote:
Now try it and see how this performs compared to your solution. I do not think it is much more accurate, but at least it should be faster. ;-)

It being faster and as accurate as shown in your examples is reason enough for me to replace mine with your version :-)
I'm now leaving to the newspaper stand to get my copy of a weekly magazine before it sells out, but I will surely try it later! The estimate of the potential error in stack register Y is an interesting feature at this point, but perhaps it should be omitted (or moved to stack register T) if this makes into WS 34S.

Best regards,

Gerson.

#39

This code will make it onto the 34S :-) It should be there in the next firmware build.

For xrom, I think the CNVG? 01 would be better suited as CNVG? 03. Is there any reason not to do this? In xrom, CNVG? 03 does either CNVG? 01 or CNVG? 02 depending on the caller's double precision mode setting (from user programs it does CNVG? 00 or CNVG? 02 instead.

I'll also keep the iteration limit, I've been caught out too many times not having one in place :-(


- Pauli


#40

If the code should make it to XROM, please remove the final error estimate (074 ff). This was intended as a kind of experimental feature to check accuracy close to -1/e (will not work correctly otherwise). ;-) On the other hand, the loop counter may remain as an additional safeguard.

I think CNVG 1 is the most suitable test here. The Halley method converges so well that at the point where the last two approximations differ by 1E-24 or less, the actual accuracy should already be as good as it gets. Close to -1/e the possible accuracy drops to 25 digits anyway (or even to 17 digits if x is the closest possible 34-digit-value near -1/e). Just to be sure, one additional (final) loop is executed.

There is a way to get 30 or even all 34 digits right. This would include a second method to compute W(x) for x close to -1/e, requiring 5 - 10 numeric constants - and, if possible, all 39 digits working precision, i.e. a regular internal C-coded function. It's the usual tradeoff between accuracy and numerical effort.

Dieter


#41

I'd already taken out the error estimate code. I'd have liked to have supported interval arithmetic so that error estimation came along with the values on the stack -- maybe on the next calculator.

I'll switch back to CNVG? 01.

I don't think this will go back to C. Double precision constants are possible and a lot cheaper space wise than constants in C (16 bytes vs 36 or 40).


- Pauli


#42

All 34 digits are hard to get with a user code XROM routine. But if 30 digits will do, there is a way. It would require eight (well... actually more like six) constants with 4 to 34 significant digits and two more to provide 60+ digits of 1/e. The rest is simple. Even a precise W+1(x) function would be possible.

BTW - the original post was about a 41C program for the W function. With 13-digit M-Code precision a full 10-digit result is quite easy to achieve. ;-)

Dieter

Edited: 16 Sept 2012, 7:38 p.m.


#43

The full 34 digits in double precision mode definitely aren't required.

The current code is 170 bytes long. Eight more double precision constants is 128 bytes plus the extra code required at two bytes per step. It doesn't sound like double precision is required for all the constants which would save some bytes. I'm not sure the extra space used is worth the gain -- single precision will be good enough regardless.

W(x) + 1 might be a worthwhile function by itself and this would justify the extra space.


- Pauli


#44

I've been testing a bit, and here's a first result.

The following program evaluates Wp and Wm as well as Wp+1 and Wm+1. The code should return 30+ valid digits for Wp and Wm and some less for the two W+1 functions. This program is intended to run in double precision only (!). For x close to -1/e a series expansion is used, where the coefficients are prestored in R05...R08. The lower coefficients are coded directly. As far as I understand, in XROM this can also be done for the higher coefficients so that the respective four registers are not required.

And again the usual caveat applies: this is experimental code. ;-)

Edit: corrected an errror in line 007. Of course it's flag 01, not 02.

001 *LBL A  // W_p
002 CF 00
003 CF 01
004 GTO 00
005 *LBL B // W_m
006 SF 00
007 CF 01
008 GTO 00
009 *LBL C // W_p + 1
010 CF 00
011 SF 01
012 GTO 00
013 *LBL D // W_m + 1
014 SF 00
015 SF 01
016 *LBL 00
017 CF 02
018 CF 03
019 STO 00
020 # 007 // max. 7 iterations
021 STO 01
022 # 028
023 SDR 002
024 +/-
025 RCL 00
026 x>? Y
027 GTO 01
028 # 086
029 SDR 009
030 # eE
031 1/x
032 SDL 033
033 IP
034 SDR 033
035 RCL+ 00
036 RCL+ 02
037 x[<=]? Y // check if x is *very* close to -1/e
038 SF 03 // if so, set flag 3
039 # eE
040 [times]
041 STO+ X
042 [sqrt]
043 FS?C 00
044 +/-
045 FILL
046 RCL[times] 08
047 RCL+ 07
048 [times]
049 RCL+ 06 // evaluate series expansion up to 8 terms
050 [times] // to get 30+ digits even very close to -1/e
051 RCL+ 05 // use coeff. 8...5 directly from r08...r05
052 [times]
053 # 043
054 # 054
055 SDL 001
056 / // coeff. 4 = -43/540
057 -
058 [times]
059 # 011
060 # 072
061 / // coeff. 3 = 11/72
062 +
063 [times]
064 # 003
065 1/x // coeff. 2 = -1/3
066 -
067 [times]
068 [times]
069 +
070 FS? 01 // add the final -1 only for W_p and W_m
071 FC? 03 // or if the iterative method is used
072 DEC X
073 FS?C 03 // for x very close to -1/e
074 GTO 04 // result has 30+ digits, so skip iteration and exit
075 GTO 03 // else jump to iteration
076 *LBL 01
077 FS?C 00
078 GTO 02
079 LN1+x
080 ENTER[^]
081 x>0?
082 LN
083 x<0?
084 CLx
085 -
086 GTO 03
087 *LBL 02
088 +/-
089 LN
090 ENTER[^]
091 +/-
092 LN
093 -
094 *LBL 03
095 FILL
096 RCL 00
097 x[<->] Y
098 e[^x]
099 /
100 -
101 x[<->] Y
102 INC X
103 /
104 RCL L
105 1/x
106 INC X
107 RCL[times] Y
108 # 1/2
109 [times]
110 +/-
111 INC X
112 /
113 -
114 FS?C 02
115 SKIP 004
116 CNVG? 01 // last change < 1E-24?
117 SF 02 // then raise flag for final iteration
118 DSE 01 // loop counter, exit after 7 loops or less
119 GTO 03
120 FS?C 01 // exit routine after iteration finished
121 INC X // add 1 if the result is W_p + 1 or W_m + 1
122 *LBL 04 // common exit point for all cases
123 END

Flags:
------
00 clear: primary branch, set: secondary branch
01 clear: W_p and W_m, set: W_p+1 amd W_m+1
02 clear: iteration limit not yet reached, set: final loop
03 clear: use iteration close to 1/e, set: result exact => exit

Registers:
----------
00 x
01 loop counter

R02, 05...08 store some required constants

Instructions:
-------------
 
Enter the program.
Switch to double precision
 
DBLON
 
and prestore the following constants:

Digits 34 to 67 of 1/e:
 
8,674458111310317678345078368016975 E-34 STO 02
 
More than 12 digits cannot be entered directly,
but it can be done this way:
 
8,67445811131 E-34 STO 02
0,31767834507 E-46 STO+ 02
8,368016975 E-58 STO+ 02
 
Coefficients for the series expansion:
 
769 ENTER 17280 / STO 05
-221 ENTER 8505 / STO 06
102180 ENTER 6535073 / STO 07
-1963 ENTER 204120 / STO 08
 
Please note the negative sign in R06 and R08.

According to my results (and the code ;-)) Wp and Wm should carry at least 30 valid digits. The error should not exceed 2 units in the 31st digit.

Please note: this is experimental code. Wp+1 and Wm+1 are quite accurate even close to -1/e, but here only 27 digits may be correct. More terms of the series expansion would provide better results.

Dieter


Edited: 17 Sept 2012, 6:53 p.m. after one or more responses were posted


#45

We can compile constants in which are then addressed by a special command in XROM. No registers are involved.

#46

Quote:
This code will make it onto the 34S :-)

I'm glad it will. Dieter has done a pro job. I'm looking forward to next firmware version, I'm still using 3.1 3089.

Quote:
I'll also keep the iteration limit, I've been caught out too many times not having one in place :-(

I have yet to find an example in which the iteration limit has been reached. But it's a good idea to keep it, just in case :-)

Gerson.

#47

A short note: The initial W estimate can be improved so that the number of iterations for Wm will be reduced and the program will run faster. It's just a tiny tweak that can be applied to the original code (dated 16 Sept) as well as the high-precision version (17 Sept):

   ...
# 007
STO 01
1/x
FC? 00
STO+ X
+/-
RCL 00
x>? Y
...
This will set the threshold to -1/7 for Wm and to -2/7 for Wp.

Dieter

#48

Gerson,

after taking a closer look at your table I wonder why I get the same or at least very similar results compared to those that you consider inaccurate (underlined digits). To me it seems that the reference values are wrong, not the ones you computed.

Example:

  x              your value                              true W
---------------------------------------------------------------------------------------------
-0,367 -0,9323991847479284837164661908793559 -0,9323991847479284837164661908793572
-0,367879441171 -0,9999984492882199170281330426992986 -0,9999984492882199170281330427995223
This means 32 resp. 27 correct digits. For 34-digit precision the error estimate is 32 resp. 28 digits, so your results are quite close.

Time to check your source for the exact W values. ;-)

Dieter


#49

Quote:
Time to check your source for the exact W values. ;-)

Definitely!

I think I won't throw my 75-step program away, even though it is not optimized yet :-)

Here are some checks again W|A, assuming it's a reliable source:

Wp(-0,367)

-0,93239918474792848371646619087935720846714799929561 (WolframAlpha)

-0,9323991847479284837164661908793551 (WP 34S)

-0,9323991847479284837164661908793559 (75-step program)

-0,9323991847479284837164661908793559 (44-step program)

Wp(-0.3678794411714423)

-0,99999998916462096496471197294254560370337054204797 (WolframAlpha)

-0,9999999891646209649647119772063050 (75-step program)

-0,9999999652840027343156721423374225 (44-step program)

Wm(-0,367)

-1,07079188676805187407925872521318830248331092315916 (WolframAlpha)

-1,070791886768051874079258725213044 (WP 34S)

-1,070791886768051874079258725213195 (75-step program)

-1,070791886768051874079258725213193 (44-step program)

Wm(-0.3678794411714423)

-1,00000001083537911330558114770448324981688163939604 (WolframAlpha)

-1,000000010835379113305581129710926 (75-step program)

-1,000000010877230017527477964899792 (44-step program)

Regards,

Gerson.

Edited: 13 Sept 2012, 10:23 a.m.

#50

I think it is possible to replace steps 29 through 34 by:

29 [<->] YZXY
30 INC X


Got to love the stack shuffle command for saving steps :-)


- Pauli


#51

Wonderful, the 34S keeps amazing me ! (just need to reverse the stack order as C.Ret has ordered the stack t,z,y,x)

Here is the program I've entered in my 34S:

                  t:          z:          y:                  x:      L:        R00
001 LBL "LWM x
002 STO 00 x
003 CHS -x x
004 LN LN(-x) -x
005 GTO 02
006 LBL "LWP x
007 STO 00 x
008 2 x 2
009 + x+2 2
010 LN LN(x+2) x+2
011 ENTER^ LN(x+2) LN(x+2)
012 LN LN(x+2) LN(LN(x+2)) LN(x+2)
013 - LN(x+2)-LN(LN(x+2))
014 LBL 02 y0
015 0 y0 0
016 x:y 0 y0
017 ENTER^ 0 y0 y0
018 LBL 01 y' y y"
019 RDN y' y
020 x=y?
021 STOP
022 RCL 00 y' y x x
023 RCL Y y' y x y
024 E^X y' y x exp(y) y
025 / y' y x/exp(y) exp(y)
026 RCL Y y' y x/exp(y) y
027 x^2 y' y x/exp(y) y² y
028 + y' y x/exp(y)+y² y²
029 [<>] YXZY y y' x/exp(y)+y² y
030 INC X y y' x/exp(y)+y² 1+y
031 / y y' (x/exp(y)+y²)/(1+y) 1+y
032 x:y y' y y"
033 CNVG? 03 y' y y"
034 RTN
035 GTO 01
036 END

I've tested all values listed by Gerson above for LWP(x) and one is not converging: -0.36787944

Edited: 3 Sept 2012, 7:44 p.m.


#52

Steps 15 - 17 look suitable for the [<->] command too :-)

Try turning double precision mode on. The convergence tolerance there is (relatively) less aggressive.

- Pauli


#53

Maybe I should soften the single precision convergence criteria for CNVG? 1e-15 is pretty exacting.


- Pauli


#54

Maybe, moving to double precision because the tolerance is different don't sound as the right solution to me.

Also the test x~? Y suggested above is AFAIK using ROUND so the result depends on the display format which is also not good for a function such as Wp(x)

And, yes, steps 16 & 17 can be replaced by [<->] YYXT !

Edited: 3 Sept 2012, 8:18 p.m.


#55

Moving to double precision would indicate if the tolerance is the problem (it seems likely though). The single precision tolerance probably should be more like 1e-13.

I agree not to use x~? Y conditional.

BTW: if this code goes into xrom to implement the built in function, it will always run in double precision.


- Pauli


#56

Yes, DBLON solves the problem for LWP(-0.36787944). However I've observed that with DBLON the build-in Wp function don't work for -0.367879441, it's stuck on "Waiting..." (FW rev 3246).

#57

The 34S almost certainly can be improved.

If someone wants to rework the keystroke code into a suitable form, we'd definitely consider including it.

The current version is included below. Ignore the reference to the C version -- we're not using that anymore.


- Pauli

/**************************************************************************/
/* The positive branch of the real W function.
* A rough guess is made and then Newton's method is used to attain
* convergence. The C version works similarily but uses Halley's
* method which converges a cubically instead of quadratically.
*
* This code is based on Jean-Marc Baillard's HP-41 version from:
* http://hp41programs.yolasite.com/lambertw.php
*
* Uses regiser I for temproary storage.
*/

XLBL"W0"
xIN MONADIC
SPEC?
JMP ret_NaN
x=0?
xOUT xOUT_NORMAL
Num eE
1/x
+/-
x[<->] Y
x<? Y
JMP ret_NaN
STO I
LN1+x
LamW0_loop:: FILL
+/-
e[^x]
RCL[times] I
-
x[<->] Y
INC X
/
-
CNVG? CVG_RELATIVE
xOUT xOUT_NORMAL
JMP LamW0_loop

/**************************************************************************/
/* The negative branch of the real W function.
* Uses registesr J and K for temporary storage.
*/

XLBL"W1"
xIN MONADIC
SPEC?
JMP ret_NaN
x>0?
JMP ret_NaN
x=0?
JMP ret_neginf
STO I
+/-
1/x
Num eE
x>? Y
JMP ret_NaN
RCL I
+/-
LN
JMP LamW0_loop


#58

Unless there's something hidden, IMHO it should read "uses register I for temporary storage" also in the comment about W1. No effect on J or K seen.


#59

Older comments than code, that's all.


- Pauli

#60

I hadn't noticed W was slow on the WP 34 for certain arguments until using it to check my results. C. Ret's code above is very compact and the conversion should be straightforward. Of course this should wait until the algorithm is fully tested, which hasn't been done yet.

Gerson.

#61

This thread started with a HP-41 program. Here's my current version of the Lambert W function on this hardware. Close to -1/e it uses a series expansion with slightly tweaked coefficents (Pauli: yes, as you can see here, the number of digits can be reduced substantially). The results should be reasonably accurate, i.e. with only a few ULP off. At least I hope so. ;-)

Here's the code:

Edited to correct two errors in line 006 and 097

001  LBL "W"
002 STO 00
003 -,357
004 X<>Y
005 X<=Y?
006 GTO 98
007 SF 00
008 FS? 01
009 GTO 00
010 LN1+X
011 STO 01
012 X>0?
013 LN
014 X>0?
015 ST- 01
016 GTO 01
017 LBL 00
018 CHS
019 LN
020 ENTER
021 CHS
022 LN
023 -
024 STO 01
025 LBL 01
026 RCL 01
027 RCL 00
028 RCL 01
029 E^X
030 /
031 -
032 RCL 01
033 1
034 +
035 /
036 ENTER
037 ENTER
038 LAST X
039 1/X
040 1
041 +
042 -2
043 /
044 *
045 1
046 +
047 /
048 ST- 01
049 RCL 01
050 FC? 00
051 GTO 99
052 X<>0?
053 /
054 ABS
055 1 E-8
056 X>Y?
057 CF 00
058 GTO 01
059 LBL 98
060 1
061 E^X
062 STO Z
063 1/X
064 +
065 2,855767840 E-11
066 -
067 *
068 ST+ X
069 SQRT
070 FS? 01
071 CHS
072 ENTER
073 ENTER
074 ENTER
075 -84
076 /
077 ,7
078 FS? 01
079 SIGN
080 *
081 ,0156
082 +
083 *
084 ,025962
085 -
086 *
087 ,0445
088 +
089 *
090 ,07963
091 -
092 *
093 11
094 ENTER
095 72
096 /
097 +
098 *
099 3
100 1/X
101 -
102 *
103 *
104 +
105 1
106 -
107 LBL 99
108 END
 
Flag 01 clear => Wp
Flag 01 set => Wm
I think the program runs quite fast, and further improvements of the accuracy will not be trivial. ;-)

Dieter

Edited: 23 Sept 2012, 6:26 p.m. after one or more responses were posted


#62

Very nice! This appears to surpass all previous attempts on this matter.

Here are three keystrokes sequences for the HP-35 (and other non-programmable RPN calculators). These would have been more useful 40 years ago. Anyway, here they are :-)

Wp, x >= e

1) ENTER^ ENTER^ ENTER^ LN
2) / LN
3) Repeat step 2 until the result converges to the desired accuracy

Wp, -1/e <= x < e (*)

1) CHS ENTER^ ENTER^ ENTER^
2) ex *
3) Repeat step 2 until the result converges to the desired accuracy
4) CHS

(*) this may take too many iterations near the limits of the convergence interval.

Wm, -1/e < x < 0

1) 1/x CHS ENTER^ ENTER^ ENTER^
2) * LN
3) Repeat step 2 until the result converges to the desired accuracy
4) CHS

Gerson.

#63

Quote:
Very nice! This appears to surpass all previous attempts on this matter.

Thank you - but there is still room for improvement. :-)

This weekend I tried a new approach for x close to -1/e. It's essentially an iterative solution, but this time it solves for W+1, not W itself and uses one or two tricks to avoid digit loss. The following program will run on a 34s in double precision mode. It should return results with 33 digit accuracy (+/- 1 ULP). Please do your own tests and see if you find a case where the error exceeds 1 unit in the 33th digit.

Here is the code:

001*LBL A
002 CF 00
003 SKIP 002
004*LBL B
005 SF 00
006 STO 00
007 CF 02
008 # 007
009 STO 01
010 # 035
011 FS? 00
012 # 025
013 SDR 002
014 +/-
015 RCL 00
016 x>? Y
017 GTO 01
018 # eE
019 1/x
020 SDL 033
021 IP
022 SDR 033
023 RCL+ 00
024 RCL+ 03 // cf. note below
025 # eE
026 [times]
027 STO 02
028 STO+ X
029 [sqrt]
030 FS?C 00
031 +/-
032 RCL L
033 # 003
034 /
035 -
036*LBL 00 // Newton iteration for W+1
037 FILL // and x close to -1/e
038 +/-
039 e[^x]-1
040 RCL[times] 00
041 # eE
042 [times]
043 RCL- Y
044 RCL+ 02
045 RCL/ Y
046 +
047 FS?C 02
048 SKIP 004
049 CNVG? 06 // test if absolute error < 1E-32
050 SF 02
051 DSE 01
052 GTO 00
053 DEC X
054 GTO 04
055*LBL 01
056 FS?C 00
057 GTO 02
058 LN1+x
059 x[<=]1?
060 GTO 03
061 ENTER[^]
062 LN
063 -
064 GTO 03
065*LBL 02
066 +/-
067 LN
068 ENTER[^]
069 +/-
070 LN
071 -
072*LBL 03 // Newton-Halley iteration for W
073 FILL // x not close to -1/e
074 RCL 00
075 x[<->] Y
076 e[^x]
077 /
078 -
079 x[<->] Y
080 INC X
081 /
082 RCL L
083 1/x
084 INC X
085 RCL[times] Y
086 # 1/2
087 [times]
088 +/-
089 INC X
090 /
091 -
092 FS?C 02
093 SKIP 004
094 CNVG? 01 // test if relative error < 1E-24
095 SF 02
096 DSE 01
097 GTO 03
098*LBL 04
099 END

The program requires one double-precision constant,
representing the final digits of 1/e, here stored in R03:
 
8,6744 58111 31031 76783 45078 36801 6975 E-34 STO 03
 
x [A] => Wp(x)
x [B] => Wm(x)

After some tests I really like this version. The iteration usually converges in 5 iterations or less to 33 valid digits (+/- 1 ULP).

Dieter


Edited: 23 Sept 2012, 2:21 p.m.


#64

Hello Dieter,

Regarding the HP-41 program, is there perchance a typo in one of the constants? For WP(-0.367) and WP(-0,367) I get -0.93250022 and -1.070690871, respectively, instead of -0.9323991848 and -1.070791887. Also, it appears line 006 should be GTO 98. Geir Isene has created a menu interface, like the ones in the Advantage Module Applications, which makes it even easier to use (please take a look at his listing below). By the way, the motivation of the original post was providing faster Lambert W routines than the ones he had in his webpage. Mine, for Wm, was particularly too slow. I think this goal has been reached, far beyond my expectations thanks to your contribution.

Thank you very much,

Gerson.

001*LBL "WL"
002 "V:2"
003 CF 01
004 "WP WM"
005 PROMPT
006 GTO "WL"
007*LBL B
008 SF 01
009*LBL A
010 STO 00
011 -,357
012 X<>Y
013 X<=Y?
014 GTO 98
015 SF 00
016 FS? 01
017 GTO 00
018 LN1+X
019 STO 01
020 X>0?
021 LN
022 X>0?
023 ST- 01
024 GTO 01
025*LBL 00
026 CHS
027 LN
028 ENTER
029 CHS
030 LN
031 -
032 STO 01
033*LBL 01
034 RCL 01
035 RCL 00
036 RCL 01
037 E^X
038 /
039 -
040 RCL 01
041 1
042 +
043 /
044 ENTER
045 ENTER
046 LAST X
047 1/X
048 1
049 +
050 -2
051 /
052 *
053 1
054 +
055 /
056 ST- 01
057 RCL 01
058 FC? 00
059 GTO 99
060 X<>0?
061 /
062 ABS
063 1 E-8
064 X>Y?
065 CF 00
066 GTO 01
067*LBL 98
068 1
069 E^X
070 STO Z
071 1/X
072 +
073 2,855767840 E-11
074 -
075 *
076 ST+ X
077 SQRT
078 FS? 01
079 CHS
080 ENTER
081 ENTER
082 ENTER
083 -84
084 /
085 ,7
086 FS? 01
087 SIGN
088 *
089 ,0156
090 +
091 *
092 ,025962
093 -
094 *
095 ,0445
096 +
097 *
098 ,07963
099 -
100 *
101 11
102 ENTER
103 72
104 /
105 -
106 *
107 3
108 1/X
109 -
110 *
111 *
112 +
113 1
114 -
115*LBL 99
116 END

#65

Gerson, thank you for your comment.

Yes, there are actually two errors I just discovered (and corrected) after having compared the listing with the program in my 41CV. Of course the jump in line 006 has to be a GTO 98, and the minus in line 097 has to be a plus (the coefficients have alternating signs). If you correct the respective lines in your version everything should be fine.

In the meantime I changed the constants a bit which slightly reduces the remaining error in the last digit. The current version returns exactly the values you posted. Actually the last digit of Wp(-0,367) should be rounded down to ...7 (it's -0,9323991847 479...). ;-)

Regarding a fast Lambert routine: of course the fastest method would be an M-code program. With 13 digits working precision and a targeted accuracy of, say, +/- 1 unit in the 11th digit, things are quite relaxed. However, I do not understand anything about M-code programming, so I will leave that to the experts. ;-)

Dieter


#66

It's fine now. Thanks!

Gerson.

#67

Quote:
the fastest method would be an M-code program.

The SandMath module has functions WL0 and WL1. A few days ago I added a maximum loop iterations safety to avoid the oscillations condition. It´s way faster, but not as accurate as your solutions in the vicinity of -1/e

The new version is ready, I´ll submit it to TOS in a few days - pls. email if interested.

Cheers,
ÁM


#68

Sure the Sandmath functions will run much faster on the 41 than a regular FOCAL program. But now imagine how fast an how accurate this could be with an optimized algorithm. ;-)

Close to -1/e two problems arise if W is determined by finding the root of w e^w = x. First, the usual methods (Newton, Halley etc.) will converge slower and slower as x approaches -1/e, increasing the number of iterations substantially. Then, the determined result is not quite accurate, since close to -1/e there are hundreds or thousands of n-digit values of W that satisfy the mentioned equation. With 13-digit precision, 10-digit results should be fine down to -0,367879, while at the limit (x = -0,3678794411) even the 8th digit may be off.

Both problems can be addressed by a simple solution: use two different methods for values of x that are close to -1/e and those that are not. Here is an algorithm that should work fine in 13-digit precision as it is available in M-code:

if x < -0,367845
a = x + 0,3678794411 // first 10 digits of 1/e
a = a + 7,14423216 E-11 // next digits of 1/e
p = sqrt(2*e*a) // use positive root for Wp, negative for Wm
w = (((-p*43/540 + 0,1527825)*p - 1/3)*p + 1)*p // 43/540 may be replaced by 0,07963
return w-1
else
set initial guess for Wp resp. Wm
Solve w e^w = x using Newton's method
quit if last two approximations are sufficiently close (first ten digits agree)
return w
The resulting error for x close to -1/e should be less than +/- 1E-11. The constants were chosen so that even W+1 is correct within 1 unit of the 10th significant digit, so this may be another option for an additional function. If this is not required, the error in W can even be reduced or the interval for the (fast) polynomial may become wider. Since the iterative solver does not have to handle these critical cases it should safely converge within, say, 5 or 6 iterations.

Dieter

Edited: 24 Sept 2012, 6:24 p.m.


#69

Intriguing to say the least... I´ll give it a go during the week-end - need to make some room in the ROM first, it´s currently packed and bursting through its seams.

To be continued...


#70

The suggested method should work fine in a 13-digit environment when the result is expected to have 10 valid digits. There is just one downside - moderately close to -1/e (say, 0,3 ... 0,367) the number of required iterations may be as large as 8 or 10 while usually not more than 5 or maybe 6 will do. This can be improved by a simple trick:

While the direct evaluation of w by means of a polynomial is sufficienctly exact (max. error approx. 1 unit in the 11th digit) for x very close to -1/e, this result may also serve as a very good estimate for other x. If for all x up to -0,23 the initial estimate is determined this way, the number of iterations for x<0 should not exceed five.

For the iteration itself a simple Newton style solver is fine. Newton-Halley converges faster, but the result is obtained in not more that just one or maybe two iterations less, while on the other hand every single loop takes a bit longer. So all in all nothing is gained.

Here's the complete algorithm for an M-code program, using 13 digits internal precision and returning a 10-digit result:

if x > -0.23 then            // set guess for x not close to -1/e
If Wm then
a = ln(-x)
w = a - ln(-a)
else
w = ln(1+x)
if w > 1 then w = w - ln(w)
end if
else // set guess for x close to -1/e
a = x + 0.3678794411 // first ten digits of 1/e
a = a + 7.14423216 E-11 // next digits of 1/e
p = sqrt(5.436563656918 * a) // 5.436... is 2*e
if Wm then p = -p
w = (((-p * 43/540 + 0.1527825) * p - 1/3) * p + 1) * p - 1
// at this point w is exact (max. error ~ 1D11) if w is within 1 +/- 0,014
// In all other cases w is a very good estimate
end if
 
if abs(w + 1) > 0.014 then // do iteration only if w is not yet sufficiently exact
repeat
dw = (w - x / exp(w)) / (1 + w) // use x/exp(w), not x*exp(-w)
w = w - dw
Until (abs(dw) * 1E11 <= abs(w)) // test if relative error < 1E-11
end if // and avoid divide-by-zero-error at w = 0
 
return w
Try it and see how it performs. The results should be exact to all 10 returned digits with a max. error near +/- 0,6 ULP, just as many of the 41C's internal transcendental functions.

Dieter

#71

FWIW, here is the updated Geir Isene's page on the Lambert W function for the HP-41 calculator:

http://isene.me/hp-41/wl/

Gerson.


Possibly Related Threads...
Thread Author Replies Views Last Post
  HP-41(CL): The easiest way to transfer FOCAL programs from a Linux PC to the HP-41 Geir Isene 13 1,850 12-05-2013, 02:40 AM
Last Post: Hans Brueggemann
  Lambert W functions on WP 34s Hans Milton 11 919 11-25-2011, 10:49 AM
Last Post: Ángel Martin
  Lambert W (HP-12C+) Gerson W. Barbosa 10 830 11-17-2009, 02:29 PM
Last Post: Gerson W. Barbosa
  Lambert's W on the HP-33s Gerson W. Barbosa 17 1,608 02-14-2008, 06:50 PM
Last Post: Gerson W. Barbosa
  Anyone want hp 41 to 41 infra-red data transfer? don wallace 38 2,679 09-21-2005, 04:17 PM
Last Post: don wallace

Forum Jump: