# HP Forums

Full Version: HP41C: Factorial (kind of) in MCODE
You're currently viewing a stripped down version of our content. View the full version with proper formatting.

This is a routine in MCODE I developed to overcome the situation that an HP41C can only calculate factorials of up to 69 as this is <10^100.

It may serve as a neat example how to handle MCODE in conjunction with the HP41CL in TURBO mode or with emulators on fast processors. The calculation of big numbers in an original HP41 may last a little too long. As far as I know, this "problem" has not been considered yet on a HP41 system. If so, I would like to get informed as I would be curious to learn about it.

With a little trick the numeric ability of the factorial problem can be expanded enormously by the use of logarithms. We can overcome the limitations of natural numeric expression as the antilogarithm of a sum of logarithms from 1 to n corresponds to the factorial of n.

The MCODE routine named LOGSUM takes a number (integer) from stack register X and calculates its natural logarithm starting with the number itself and looping down to 1. The result is then converted to a base-10 logarithm. I experimented with a sum of the base-10 logarithm but found it less reliable than the natural logarithm. Interestingly, the results of the latter were more precise in the HP41.

The main challenge of the MCODE program was to store intermediate results. Therefore, I used the status registers P for the storage of the intermediate log sums, and Q for the loop counter. The use of the CPU registers would increase the performance, however, they are used during the execution of the internal functions for the calculation of the natural logarithm (LN10, entry point 1B45) and the addition (AD2-10, entry point 1807) so intermediate results would be disturbed. In addition, it is more practical than the use of CPU 56-bit registers B, N and M as they must be accessed over C and this would mean much more code.

Eventually, the logarithm can be expressed as a natural number (better said: its approximation) by the use of the alphanumeric capabilities of the calculator. The result is shown as a.aaaaEbbbb in the alpha registers. For this conversion, a FOCAL program is used.

Listing of MCODE program "LOGSUM"

```08D "M"
015 "U"
013 "S"
007 "G"
00F "O"
00C "L"	        'Function name: LOGSUM
2A0 SETDEC	'set HP-41 to decimal mode
04E C=0 ALL
00E A=0 ALL	'Emptying of CPU arithmetic registers
228 WRIT 8(P)	'and Register P
0F8 READ 3(X)	'Store on stack in C and save it into
268 WRIT 9(Q)	'Register Q to serve as loop counter
115 ?NC XQ	'Start of loop
06C ->1B45	'Calculate natural log
10E A=C ALL	'Copy log to A
238 READ 8(P)	'Register P will contain the sums of the logs
01D ?NC XQ
060 ->1807	'Add Register P and most recent log
228 WRIT 8(P)	'Store the result back into P
10E A=C ALL	'and transfer it to A
04E C=0 ALL
130 LDI S&X	'The mantissa of C will become 1 by placing it
001 CON: 1	'first into S&X (exponent part) and then rotating
23C RCR 2	'it right twice
2BE C=-C-1 MS	'Works like a CHS operation on the mantissa
01D ?C XQ	'and the negative value is added resulting
061 ->1807	'in a subtraction of the loop counter by "1"
268 WRIT 9(Q)	'Store the loop counter back into P
2FA ?C#0 M	'and check whether the mantissa of C=0
377 JC -12	'If not 0, carrier set and jump back to loop start
10E A=C ALL	'Store the sum into A
04E C=0 ALL	'Clear C
35C PT= 12	'Pointer at start of mantissa
090 LD@PT- 2	'Load mantissa with 2.302585093 which will be
0D0 LD@PT- 3	'the divisor of the sum of natural logs to
010 LD@PT- 0	'get the 10-based logarithm (ln(10)=2.3025…)
090 LD@PT- 2
150 LD@PT- 5
210 LD@PT- 8
150 LD@PT- 5
010 LD@PT- 0
250 LD@PT- 9
0D0 LD@PT- 3
070 N=C ALL
1BD ?NC XQ
040 ->106F	'Perform "DIVIDE" function and put the result
3E0 RTN         'of logsum(n) into Register X
```

This little FOCAL program renders a legible result in decimal representation, which however cannot be used for further calculations:

Listing of FOCAL program "XFACT"

```01*LBL "XFACT"
02 LOGSUM	 'Calculate the logarithmic sum from 1 to n
03 INT	         'Take the integer. It is the exponential part.
04 LASTX
05 FRC	         'Take the fraction, the antilogarithm of which
06 10^X	         'is the mantissa.
07 FIX 4
08 CLA
09 ARCL X
10 "|-E"	 'This means APPEND "E".
11 FIX 0
12 ARCL Y
13 AVIEW	 'Output as a.aaaaEbbbbb in the alpha registers
14 FIX 4
15 END
```

I entered a set of numbers and calculated the factorial using XFACT at normal speed and 50x turbo on a 41CL. XFACT was compiled and transferred to HEPRAM.

```XFACT	1x Speed     50x Speed	  Result
(n)	 (sec)	      (sec)	  (output)
10	   4.01	        1.31	   3.6288E6
20	   6.23	        1.31	   2.4329E18
30	   8.19	        1.33	   2.6525E32
50	  12.35	        1.42	   3.0414E64
100	  22.55	        1.59	   9.3326E157
200	  44.85	        1.98	   7.8866E374
500	 109.35	        3.35	   1.2201E1134
1000	 216.85	        5.58	   4.0238E2567
5000	1104.16	       23.36	   4.2289E16325
10000	   n.d.	       45.32	   2.8462E35659
100000	   n.d.	      446.79	   2.7855E456573
```

A linear regression resulted in RR of >0.9999 and the 1x-speed slope was about 48-fold larger than the 50x-speed slope. By elimination of the overhead (y-axis intersection), supposedly the key read out and output onto display of about 1 second, I got a 49.5-fold difference of the slope which matches pretty exactly 50x TURBO.

Disclaimer: This code may be used at your own responsibility. I will not be responsible for any kind of inconveniences caused by its use.

Way to go Frido, you should post it as an article as well.

I noticed you use the "hard-coded" Ln(2) in the final division step. One alternative could've been calling [LN10] with "2" in C, shorter code but of course a tad slower.

```04E C=0 ALL      'Clear C
35C PT= 12       'Pointer at start of mantissa
090 LD@PT- 2
115 ?NC XQ	'calculate natural log
06C ->1B45       [LN10]
```

Since the result of [LN10] is also available as 13-digit form, this method would allow a 13-digit division next - done as multiplication by its inverse:

```239 ?NC XQ
060 ->188E       [ON/X13]
13D ?NC XQ
060 ->184F       [MP1_10]
3E0 RTN          'could use a GO in previous step instead)
```

This is an interesting way to overcome inherent limitations of the numeric range in the 41.

There are a few other great examples in JM-Baillard's pages, you and others may be interested to check them out:

Cheers,

ÁM

Edited: 25 May 2012, 5:29 a.m.

Dear Ángel,

First, I was somewhat disappointed that nobody had a reply on my contribution. Now, the MCODE Guru himself made a comment! :-)

Sure, the hard code of ln(10) - this was not ln(2)! - is not that elegant. The way you proposed is more straightforward as it uses fully the abilities of the HP41. The code would be changed as such:

```377 JC -12	'End of loop, see my posting
04E C=0 ALL	'Clear C
35C R=12	'Set pointer to nybble 12
050 LD@R 1	'fill it with 1
39C R=0	        'Set pointer to nybble 0
050 LD@R 1	'fill it again with 1 so we have 10 in C
115 ?NC XQ
06C ->1B45	'Perform ln(10)
070 N=C	        'Put ln(10) in CPU register N
0AE A<>C ALL
1BD ?NC XQ	'Divide C by N
040 ->106F
3E0 RTN
```

The solution you offered by calculating the inverse (1/x) of ln(10) and multiplying would of course work too.

However, an interesting, but also annoying point about the entry points of HP41 MCODE routines is that one cannot simply figure out where the operands have to be put in. For example, the 1/x function
(->188E, [ON/X13]) wants its argument in CPU register B, for the multiplication (->184F, [MP1_10]) these are A and C, and for divide (->106F, [DIVIDE] or [/]) the dividend is in A and the divisor is in N. Then you have to consider where the result will be delivered. Sometimes directly into the stack (this is the case with [DIVIDE]), but also into C only from where you have to do an additional step to write it into the stack.

This is not very straightforward, and I wish there would be documentation about this. Also, the preparation of the registers before feeding a function costs a lot of code, so that one has to consider about using an HP41 routine or if it would result in less code designing your own routine.

I will put this contribution with a few extensions into the articles section, as you suggested.

Believe me, I'm quite far from being a MCODE guru - not an appropriate term, I can hardly read 50% of the OS code without stumbling!

W.r.t. the input/output conventions for the MATH routines, I think reading the 13-digit writeup I did some time ago will help - available at TOS.

The first distinction is whether you're using 10-digit or 13-digit forms, then it all falls logically.

Sorry I got it wrong, of course it's not Ln(2). In fact there's one entry point in the OS that returns Ln(10) in 13-digit form to register B - so you could just call it:

```2B5 ?NC XQ
239 ?NC XQ
060 ->188E     [ON/X13]
13D ?NC GO
061 ->184F     [MP1-10]
```

Here using [ON/X13] is just a work-around to keep the 13-digit form of Ln(10). There are a couple of examples about that in the mini-paper.

Keep up the good work!

ÁM

Edited: 25 May 2012, 10:02 a.m.

Ángel,

Interestingly, the routine

```2B5 ?NC XQ
```
gives ln(10)*10^-6 as a result in B.

If
```239 ?NC XQ
060 ->188E     [ON/X13]
```
is executed directly thereafter the result is 1/ln(10).

1/ln(10) multiplied with a natural logarithm results in the corresponding
decadic logarithm (log). It appears that HP designed the [LNC10*] function for
this purpose. Thus, log would be calculated primarily by the code for the
natural logarithm. This confirms my previous observation that a sum of log is
less precise than a sum of ln. Obviously, because the error of the conversion

Thank you for the link. I will dive into your document and see which new
insights it will give to me.

Saludos

Frido

Quote:
Interestingly, the routine

```2B5 ?NC XQ
```
gives ln(10)*10^-6 as a result in B.

What a fool I am, this was a 13-digit precision ln(10).

Now, I have to complete reading your paper first, Ángel, before I produce more stupidities.

Quote:
... this was a 13-digit precision ln(10).

exactly! it´s a little confusing at the beginning but once you get the knack of it is actually easier than the 10-digit routines.

In fact I´d go as far as saying that the math routines were designed to be 13-digit from the scratch, and the 10-digit truncation came later as the way to interact with the input/output values.

Have fun!

Ángel,

The code has been transferred to the articles section. It has been improved using 13-digit precision.

Thank you for you very valuable contribution!

Frido