35s - how fast?



#12

Sorry for abusing this noble forum once again for a 35s question: Has this machine been benchmarked already in terms of speed?

Thank you :-)


#13

Read through Gene's review, in the last pages you'll find the answer... ;-)

Greetings,
Massimo


Edited: 14 July 2007, 5:30 a.m.


#14

Stupid me, thanks a lot, Massimo!

Edit: My 32SII does the looping test in 15 seconds. Twice as fast as the 35s? I must have overlooked something.


Edited: 14 July 2007, 5:43 a.m.


#15

The 33s and 35s use a GeneralPlus (formerly SunPlus) microcontroller with a 6502 core, which can run at up to 4 MHz, but is probably running slower in the calculator.

The 32SII used an HP Saturn core at about 650 kHz, if memory serves. The Saturn core was designed to do BCD arithmetic very efficiently; fixed point BCD addition or subtraction takes only a little more than one clock per digit, as does shifting. And of course floating point is performed in software by use of a lot of fixed point adds and shifts.

The 6502 takes many more cycles to do the same thing. It takes at least 3 cycles to do a binary add of a byte in memory to the accumulator, so to add two 15-digit floating point mantissas together (after they've been aligned) will require a code sequence something like the following (which is completely untested), assuming that the operands and result are stored in zero page in packed form:

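ADDM:   LDX #7          ; start at index 7, the least significant byte
        CLC             ; no carry into the first add
L1:     LDA OP1,X       ; fetch one byte (two packed digits) of operand 1
        ADC OP2,X       ; add operand 2 byte plus carry (decimal mode assumed set)
        STA OP1,X       ; store the result back into operand 1
        DEX             ; step to the next more significant byte
        BPL L1          ; loop until X goes negative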

That takes about 139 cycles on the 6502, while on the Saturn the equivalent takes about 19 cycles (with operands in the 64-bit processor registers).
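The 139-cycle figure can be checked by tallying the standard NMOS 6502 cycle counts for the instructions in the loop above. A rough tally in Python, assuming zero-page operands, no page crossing on the branch, and 8 bytes per mantissa (the function name is made up for illustration):

```python
# Standard NMOS 6502 cycle counts for the addressing modes used in the loop.
CYCLES = {
    "LDX #imm": 2,
    "CLC": 2,
    "LDA zp,X": 4,
    "ADC zp,X": 4,
    "STA zp,X": 4,
    "DEX": 2,
    "BPL taken": 3,      # branch taken, no page crossing
    "BPL not taken": 2,
}

def addm_cycles(n_bytes=8):
    """Total cycles for the ADDM loop over n_bytes packed bytes."""
    setup = CYCLES["LDX #imm"] + CYCLES["CLC"]
    body = (CYCLES["LDA zp,X"] + CYCLES["ADC zp,X"]
            + CYCLES["STA zp,X"] + CYCLES["DEX"])
    # The branch is taken on every pass except the last one.
    branches = (n_bytes - 1) * CYCLES["BPL taken"] + CYCLES["BPL not taken"]
    return setup + n_bytes * body + branches

print(addm_cycles())  # 139
```

That works out to roughly 17 cycles per byte, or around 9 cycles per decimal digit, against a little more than one per digit on the Saturn.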

Also, the 33s and 35s firmware is written mostly (or perhaps entirely) in C. The 6502 isn't a very good target architecture for C, so that doesn't result in efficient code. If the arithmetic routines are written in C, they may be much worse than hand-coded routines. In particular, the compiler is unlikely to infer the use of the 6502's decimal mode.
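To illustrate what is lost without decimal mode, here is a sketch of a packed BCD add done with binary operations only: every byte is split into nibbles, added, and corrected by hand. Python is used as a stand-in for what compiled C might end up doing; this is not the actual 33s/35s firmware, and the function name and the most-significant-byte-first order (chosen to match the X = 7 down to 0 loop above) are assumptions.

```python
def bcd_add(op1, op2):
    """Add two equal-length packed BCD byte strings (most significant byte first).

    Each byte holds two decimal digits. Returns (sum_bytes, carry_out).
    """
    assert len(op1) == len(op2)
    result = bytearray(len(op1))
    carry = 0
    # Walk from the least significant byte (last) to the most significant.
    for i in range(len(op1) - 1, -1, -1):
        lo = (op1[i] & 0x0F) + (op2[i] & 0x0F) + carry
        carry = 0
        if lo > 9:            # decimal adjust the low digit
            lo -= 10
            carry = 1
        hi = (op1[i] >> 4) + (op2[i] >> 4) + carry
        carry = 0
        if hi > 9:            # decimal adjust the high digit
            hi -= 10
            carry = 1
        result[i] = (hi << 4) | lo
    return bytes(result), carry

# 0999 + 0001 = 1000
print(bcd_add(bytes([0x09, 0x99]), bytes([0x00, 0x01]))[0].hex())  # 1000
```

In decimal mode a single ADC does the work of everything inside the loop body, which is why failing to use it is so costly.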


#16

Thanks for giving some insights!

The museum benchmarks give more reasonable figures, so hopefully most meaningful applications won't run slower on the 35s than on the pioneer.

BTW, I have in mind implementing the error function which I often use and already implemented on the TI-59, PSION LZ and Sharp 1500 (the fastest!). At least, the 35s shouldn't evaluate it slower than the TI ;-)

#17

There was an old saying which was more or less...

"Software becomes slower more rapidly than hardware becomes faster"

In this case, it is not only software, but also architecture (specialized vs. general purpose).

However, I like (mostly) the 35s, slow as it may be.

#18

hi eric,

the choice of the 6502 architecture i find perplexing. i understand that hp might not have had the luxury to change from the 33s, but that doesn't explain the 33s. the compiler point is important and i think that there are better choices than the 6502.

presumably there is not an ARM slow, cheap or low power enough to fit the bill. good compilers were built for the old 8086 architecture which might make an alternative. dust down your old copies of turbo C, and return (const char FAR*)FuManchu; well perhaps not.

but seriously, unless they're doing something really funky, i would expect the biggest limitation is a 64k address space. with 32k RAM, then you've only got 32K rom. this rules out adding a lot of extra function. which is why stuff might be missing already.

so better than segments might be our old friend the 68000, eg 16MHz dragonball like the palm had. flat architecture with mature compilers.

one thought; is there an architecture licensing cost? for example, is the 6502 now effectively free, when all others would require at least some license?

also, i had an idea about your ADD code.

Gene's article mentions 15 decimal internal precision and an 8 byte mantissa (without signs). if i had to write the code in C for the 6502 or something like that, i'd implement a base 100 decimal system with 2 digits per byte stored in binary. since without leveraging the decimal instructions of the 6502, arithmetic will be a shift and mask extravaganza or else write the math code by hand.

so with a base 100, you need a spare "padding" nybble that means 8 bytes gives you only 15 digits. which is what you've got, so maybe this is what they actually do.
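A minimal sketch of the base-100 idea, in Python for readability: each byte holds an ordinary binary value from 0 to 99, so an add needs only a compare and a subtract per byte, with no nibble masking. The function name and the most-significant-byte-first layout are assumptions for illustration, not anything known about the 35s.

```python
def add_base100(a, b):
    """Add two equal-length base-100 mantissas, most significant byte first.

    Each element is an ordinary binary value 0..99 (two decimal digits).
    Returns (digits, carry_out).
    """
    assert len(a) == len(b)
    out = [0] * len(a)
    carry = 0
    for i in range(len(a) - 1, -1, -1):
        s = a[i] + b[i] + carry      # at most 99 + 99 + 1 = 199, fits in a byte
        carry = 1 if s >= 100 else 0
        out[i] = s - 100 if carry else s
    return out, carry

# 12 34 56 + 00 65 44 = 13 00 00
print(add_base100([12, 34, 56], [0, 65, 44]))  # ([13, 0, 0], 0)
```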

i'm using the same idea in hplua but with base 10,000, each "digit" of 0 to 9999 stored in 16 bits. the idea is that i get to use 16x16->32 multiply and replace divide constants with inverted mul constants BUT the hit is that i waste 3 nybbles in this base. this isn't so bad because my floats are 16 bytes (compared to 35s 12 bytes). i also take a hit converting each 16 bit binary to and from decimal for IO, but this is not significant overall.
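As a rough illustration of the "inverted mul constant" trick with base 10,000 limbs: dividing by 10000 can be replaced by a multiply and a shift. The sketch below is in Python and is not taken from hplua; the names are hypothetical, and the constant ceil(2^45 / 10000) is simply one that happens to be exact for every 32-bit input. A real target would pick a constant and shift that fit its multiplier width.

```python
BASE = 10_000
MAGIC = 0xD1B71759   # ceil(2**45 / 10000)
SHIFT = 45

def div10000(x):
    """Quotient x // 10000 for 0 <= x < 2**32, using a multiply and a shift."""
    return (x * MAGIC) >> SHIFT

def mul_limb(a, b, carry_in=0):
    """Multiply two base-10000 limbs; return (high_limb, low_limb)."""
    p = a * b + carry_in          # at most 9999*9999 + 9999, well below 2**32
    q = div10000(p)
    return q, p - q * BASE

print(mul_limb(9999, 9999))  # (9998, 1)
```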


#19

Quote:
but seriously, unless they're doing something really funky, i would expect the biggest limitation is a 64k address space. with 32k RAM, then you've only got 32K rom. this rules out adding a lot of extra function. which is why stuff might be missing already.

If I'm remembering correctly, the address space is way over 64k. There was quite a bit of mask rom in the CPU.


Quote:
Gene's article mentions 15 decimal internal precision and an 8 byte mantissa (without signs). if i had to write the code in C for the 6502 or something like that, i'd implement a base 100 decimal system with 2 digits per byte stored in binary. since without leveraging the decimal instructions of the 6502, arithmetic will be a shift and mask extravaganza or else write the math code by hand.

There are more space efficient packing mechanisms for decimals: http://www2.hursley.ibm.com/decimal/dbover.html. All part of the IEEE-854 compliant decNumber library http://www2.hursley.ibm.com/decimal. The actual computations are carried out in base 10^n (n defaulting to 3, again from memory), so things are still relatively fast.

Using 12 bytes for reals that would easily fit into 8 is bordering on criminal. Using 37 bytes for each register is just wanton wastefulness :-)


- Pauli


#20

the idea of using 10 bits for 3 digits is denser than Packed BCD, as you've pointed out. but i don't think things would still be relatively fast. i would expect this to be slower than a PBCD implementation because it's more complicated. for example, one of the mantissa digits is stored in the exponent.

nevertheless, it does get close to binary efficiency, ie 16 decimal digits for 8 bytes.
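The 16-digit figure can be checked from the 64-bit decimal layout, assuming the usual decimal64 field widths (1 sign bit, a 5-bit combination field, 8 exponent continuation bits, 50 coefficient continuation bits). A small Python sketch of the bit budget, with a made-up function name:

```python
import math

SIGN, COMBINATION, EXP_CONT, COEFF_CONT = 1, 5, 8, 50

def decimal64_digits():
    """Digits of precision implied by the assumed decimal64 field widths."""
    assert SIGN + COMBINATION + EXP_CONT + COEFF_CONT == 64
    declets = COEFF_CONT // 10       # 10-bit groups holding 3 digits each
    return 3 * declets + 1           # plus the leading digit in the combination field

print(decimal64_digits())                 # 16
print(16 * 4)                             # 64 bits for 16 packed BCD digits alone
print(math.ceil(16 * math.log2(10)))      # 54 bits for 16 digits in pure binary
```

So packed BCD would spend all 64 bits on the digits alone, leaving nothing for sign or exponent, while the 10-bits-per-3-digits packing leaves room for both.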


#21

Quote:
the idea of using 10 bits for 3 digits is denser than Packed BCD, as you've pointed out. but i don't think things would still be relatively fast. i would expect this to be slower than a PBCD implementation because it's more complicated. for example, one of the mantissa digits is stored in the exponent.

Yes, of course, it must be slower than a pure PBCD implementation. In use, however, the performance loss doesn't seem to be such an issue. You unpack the numbers once, perform all your operations in what amounts to a PBCD format and repack at the end.


Quote:
nevertheless, it does get close to binary efficiency, ie 16 decimal digits for 8 bytes.

Yes, this was the bit that most surprised me. The packing is *very* efficient and not too far from a pure binary equivalent. Plus it includes all the IEEE niceties like denormalised numbers, NaNs, infinities and proper rounding.


Pauli

#22

Allow me to further the abuse-

I see there is no expandable memory capability on the 35S. So, can it be assumed still, as was the case on the 33S, that if one has too many programs, they might not all fit?

