HP-41CL Calculator Benchmark



#2

I did some benchmarks for the HP-41CL as described in the Calculator Benchmark article. You'll find the results in PDF format here. As expected the MCODE benchmark shows a linear speed-up because there is no display output (the 41CL has to switch back to original speed for some I/O operations). With keystroke programming in TURBO50 mode the 41CL is as fast as a Commodore 64, programmed in BASIC. With MCODE in TURBO50 mode, the 8-Queen problem is solved in 1/4 sec!


#3

I have been curious about the difference between the speed modes ever since discovering the 41CL some time ago. Thank you for your effort and the nice contribution.


#4

I have to thank you for updating the article.


#5

Even though "TurboX" means "normal speed", TurboX could be misleading to the unaware!

My proposal is, instead of:

17:58 HP-41CL Keystroke / RPN / TurboX Mode


to change to:

17:58 HP-41CL Keystroke / RPN / Turbo off


#6

I think you are right, but in the list "Turbo" usually stands for a speed-up by hardware modification,
so I changed it to "TurboX Mode x1.0" to make it clearer what TurboX means.

#7

Hi Xerxes,

Here's a couple of additions to your nice benchmark doc, in, I hope, ready-to-paste format:

-  4.36      ND1 (v1.4)    RPL (UserRPL from HP-50g)
-
-  2.78      ND1 (v1.4)    RPL+
-
-  0.00338   ND1 (v1.3.9)  JavaScript
-

==========================================================================================

RPL+:
-------

<< 8 =r
   0 =:s =:x =y
   [] =a
   DO
     r =a[++x]
     DO
       ++s
       x =y
       WHILE y 1 > REPEAT
         a[x] a[--y] - =:t
         IF 0 == t ABS x y - == OR THEN
           0 =y
           WHILE a[x] -- =:a[x] 0 ==
           REPEAT
             --x
           END
         END
       END
     UNTIL y 1 == END
   UNTIL x r == END
   s
>>

==========================================================================================

JavaScript:
[taken almost verbatim from C code]
-------------------------------------

function() { /*as is*/
  var r=8, s=0, x=0, a = [];
  do{
    a[++x]=r;
    do{
      ++s;
      var t, y=x;
      while(y>1)
        if (!(t=a[x]-a[--y]) || x-y==Math.abs(t)){
          y=0;
          while(!--a[x])
            --x;
        }
    } while(y!=1);
  } while(x!=r);
  return s;
}
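
For anyone who wants to try the loop outside ND1, here is a minimal sketch of the same algorithm as a named function with a small timing wrapper. The names `queens`, `isSolution` and `timeQueens` are made up for this example, and returning the board alongside the step count is an addition so the result can be checked. The control flow is meant to match the code above; it assumes a solution exists for the given size (r = 1 or r >= 4), otherwise the loop does not terminate.

```javascript
// Same search as the posted loop, wrapped in a named function.
// Hypothetical helper names; only the algorithm comes from the thread.
function queens(r) {
  var s = 0, x = 0, a = [];
  do {
    a[++x] = r;                      // place a queen in row x, last column
    do {
      ++s;                           // count one placement test
      var t, y = x;
      while (y > 1) {
        t = a[x] - a[--y];           // column distance to the queen in row y
        if (t === 0 || x - y === Math.abs(t)) {
          y = 0;                     // conflict: abandon this placement
          while (--a[x] === 0) --x;  // next column, backtracking when a row is exhausted
        }
      }
    } while (y !== 1);               // y === 1 means no earlier row conflicts
  } while (x !== r);
  return { steps: s, rows: a.slice(1) };
}

// True if no two queens share a column or a diagonal.
function isSolution(rows) {
  for (var i = 0; i < rows.length; i++) {
    for (var j = i + 1; j < rows.length; j++) {
      var d = Math.abs(rows[i] - rows[j]);
      if (d === 0 || d === j - i) return false;
    }
  }
  return true;
}

// Average wall-clock time per run in milliseconds (Date.now() resolution).
function timeQueens(r, runs) {
  var start = Date.now(), result;
  for (var i = 0; i < runs; i++) result = queens(r);
  return { result: result, msPerRun: (Date.now() - start) / runs };
}
```

Calling `timeQueens(8, 1000)` averages over many runs, which matters here because a single run is far shorter than the timer resolution on current hardware.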


Edited: 21 June 2011, 7:54 a.m.


#8

Hi Oliver,

thank you for this interesting comparison, but if I'm right the ND1 is an app, and its speed depends on the device it runs on.
Please consider that the benchmark is specifically for physical calculators and calculator-like pocket computers of the past.
I think comparing software or emulated calculators needs its own benchmark list with tests on different hardware.
Thank you for your understanding.


#9

Hi Xerxes,

Yes, ND1 is an app. I understand what you're saying and did notice that almost all results were for HW calcs. I saw a C64 result in there, which encouraged me to suggest this addition anyway.

But that's ok, I understand.

(I'll reapply after I figure out how to get a JavaScript VM running on an HP-30b. Ok, that's a joke. I think.)

Cheers.


#10

:)

The C64 stands in for the missing Panasonic HHC with the Microsoft Basic ROM, because I suspect an equivalent speed. AFAIK there was also Snap Basic and Snap Forth available for the HHC.

#11

Very nice contribution, thank you both!

#12

Since we're revisiting this benchmark: the WP 34S runs it in 2.3 seconds in real mode and 2.1 seconds in integer mode.


The program is the same either way:

001: LBL B
002: CLREG
003: 8
004: STO 11
005: RCL 11
006: x=? 00
007: SKIP 22
008: INC 00
009: STO ->00
010: INC 10
011: RCL 00
012: STO 09
013: DEC 09
014: RCL 09
015: x=0?
016: BACK 11
017: RCL ->00
018: RCL- ->09
019: x=0?
020: SKIP 05
021: ABS
022: RCL 00
023: RCL- 09
024: x<>? Y
025: BACK 12
026: DSZ ->00
027: BACK 17
028: DSZ 00
029: BACK 03
030: RCL 10
031: RTN


- Pauli


#13

The fastest keystroke programmable! Thank you for testing.

#14

10x faster than a HP-12C ARM in RPN and 40x (!) faster than a 50G in UserRPL? Wow!

I guess RPL's poor showing comes from this code being more about interpreting control structures than about computing.


#15

Quote:
10x faster than a HP-12C ARM in RPN

I was thinking that using SKIP and BACK would be responsible for the speed advantage of WP 34S over other designs but the 12C ARM is using the same hardware and direct addressing, no labels. Pauli must have done something right, I guess.

EDIT: Thinking twice, isn't the 12C ARM based on an emulation layer that mimics the old Voyager processor and runs the original firmware almost untouched? This would explain why it's slower than a native implementation.

Edited: 20 June 2011, 4:07 p.m.


#16

Quote:
I was thinking that using SKIP and BACK would be responsible for the speed advantage of WP 34S over other designs but the 12C ARM is using the same hardware and direct addressing, no labels.

I haven't tested but the long backward jumps might be faster using a GTO/LBL pair. I suspect that it is the distance of search that is important since both BACK/SKIP and GTO load every instruction from program memory. The LBL instruction executes very rapidly since it doesn't even call a worker routine.

If I could think of a better way to handle errors we'd get a fairly nice speed up. The current method saves the stack and volatile state before executing every instruction and restores it if an error occurred. This is quite expensive time wise but it allows a complete restoration with a minimum of code. The memory copy routine is optimised for space not speed which will also hurt a bit here.

Likewise, I've got lots of checks for illegal op-codes in the instruction dispatch & execution paths. Take these out and we'd get a small speed up. However, I'm not going to since the chance of an error causing havoc would increase too much.
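
The save-and-restore approach and the op-code check described above can be illustrated with a toy stack machine. This is only a sketch in JavaScript, not the WP 34S firmware: the state layout, the `OPS` table and the function names are all invented for the example. It snapshots the state before every instruction, which is the per-instruction cost mentioned above, and rolls back if the instruction fails part-way through.

```javascript
// Toy interpreter illustrating "snapshot before each instruction, restore on error".
// Everything here is hypothetical; it is not taken from the WP 34S sources.
const OPS = new Map([
  ["+", (stack) => { const y = stack.pop(), x = stack.pop(); stack.push(x + y); }],
  ["/", (stack) => {
    const divisor = stack.pop(), dividend = stack.pop(); // operands are already consumed...
    if (divisor === 0) throw new Error("divide by zero"); // ...when the error is detected
    stack.push(dividend / divisor);
  }],
]);

function execute(instr, state) {
  if (typeof instr === "number") { state.stack.push(instr); return; }
  const op = OPS.get(instr);
  if (op === undefined) throw new Error("illegal op-code: " + instr); // dispatch check
  op(state.stack);
}

function runWithRollback(program, state) {
  for (const instr of program) {
    const snapshot = state.stack.slice();   // copy made for every instruction
    try {
      execute(instr, state);
    } catch (err) {
      state.stack = snapshot;               // complete restoration with little code
      return { ok: false, error: err.message, failedAt: instr };
    }
  }
  return { ok: true };
}
```

With a starting stack of `[5]`, running `[1, 0, "/"]` fails on the division and leaves the stack as `[5, 1, 0]`, exactly as it was before the failing instruction, even though the division had already popped both operands.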


The 12c is emulating the old (NUT?) processor.


Quote:
Pauli must have done something right, I guess.

I hope I've done more than one thing right in the firmware :-) Minimising the instruction decode/execute overhead was deemed desirable from the start.


- Pauli

#17

I've collected a few benchmarks by now, and compared UserRPL speed to JavaScript. With this benchmark, the speed difference is 800x, when it normally is ~20x.

The ratio from ND1 to 50g is ~20x for UserRPL, and ~30x for RPL+ vs. UserRPL, in line with usual results.

I conclude that this benchmark, so far, is an outlier / worst case for UserRPL.

An internal build runs the RPL+ code in 0.008 seconds (that is, a whopping 300x faster than current ND1, and ~10,000x faster than HP-50g), employing code-morphing to JavaScript. I haven't fully implemented this yet, but this nice result provides some motivation to push ahead with this work.
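
To give a rough idea of what translating stack code to JavaScript can look like, here is a toy sketch. It is not how ND1 does it; `compileRpn` and its token rules are invented for this example and cover only numbers, named parameters and the four basic operators. The point is that the RPN tokens are walked once at translation time to build a JavaScript expression, so no token interpretation is left at run time.

```javascript
// Toy RPN-to-JavaScript translator (hypothetical, for illustration only).
// tokens: array of strings in RPN order; params: names usable as variables.
function compileRpn(tokens, params) {
  params = params || [];
  const binary = new Set(["+", "-", "*", "/"]);
  const exprs = [];                          // translation-time stack of JS source strings
  for (const tok of tokens) {
    if (binary.has(tok)) {
      if (exprs.length < 2) throw new Error("stack underflow at " + tok);
      const right = exprs.pop();
      const left = exprs.pop();
      exprs.push("(" + left + " " + tok + " " + right + ")");
    } else if (/^-?\d+(\.\d+)?$/.test(tok)) {
      exprs.push(tok);                       // numeric literal
    } else if (params.indexOf(tok) !== -1) {
      exprs.push(tok);                       // declared parameter
    } else {
      throw new Error("unknown token: " + tok);
    }
  }
  if (exprs.length !== 1) throw new Error("program must leave exactly one value");
  // Build a native function once; later calls run plain JavaScript.
  return new Function(...params, "return " + exprs[0] + ";");
}
```

For example, `compileRpn(["x", "y", "-"], ["x", "y"])` yields a function equivalent to `(x, y) => x - y`. Control structures such as DO/UNTIL would need real code generation, which this sketch leaves out.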

So, thank you for this worst-case-for-RPL benchmark... ;-)

#18

I must admit yet again. 2.1 seconds. Wow.


#19

It could be faster if we had the code space :-)

- Pauli

