New HP-12C Review



#57

Thanks entirely to Charlie Oxford's efforts I now have a new HP-12C to
play with.

If you like the original 12C you will love this machine. It's
functionally identical to the original (we know this because it's
emulating the original) but everything runs 60 (sixty) times faster. A
long program (my 70 decimal digits of pi) ran in 90 seconds vs 90
minutes on the original 12C; a long amortization took 1.5 seconds vs 90
seconds; and 69 factorial gives you an answer almost before you hit the
key. It's ludicrously fast! I hope that this blazing speed does not
scare the set-in-their-ways Wall Street crowd away from believing the
calculation results.

Build quality, key clicks and display readability are excellent. Most
importantly, no missed keystrokes. The keys themselves use the new,
lower density plastic compared to the original, but this is a very
minor nit to pick.

The only things that are functionally different are the self tests. The manual has not been updated in this section, so it's wrong in most cases:

- [ON] + [/] runs the sequential key press test but shows 1, 2 or 3
segments at a time and does not give "error 9" if you press the keys
in the wrong order. When the test ends it doesn't show the "12" on
the display; it just returns you to the X register.

- [ON] + [x] runs a self test (I think) for one second and just returns
to the X register.

- [ON] + [-] will clear the calculator and show "Pr Error".

- [ON] + [+] will run a continuous self test (I think), ending when you
press and hold any key.

Now for the new stuff that I found from playing around:

- [ON] + [g] shows the curious display below but I haven't found anything
that you can do with this yet:

- [ON] + [g] + [ENTER] starts a testing menu:

1.L - LCD test -- this turns on all the segments. If you then press
[Rv] it will turn off half of them, press [Rv] again and you'll toggle
the segments on to off and vice versa

2.C - Copyright. First you'll see this:

the next key press will give you this:

then next you'll see this:

3.H - Extended LCD and key test. All segments will turn on at first.
Hit any key and it will turn off 1 to 3 segments that sort of map to
the position of the keys -- rows and columns.

There are probably other easter eggs in here too, but I have yet to find them.

So, how do you find one of these in the store? You can see the bottom
edge of the calculator through the packaging, here's what it looks
like:

Here's a picture of the back. See how big that battery cover is?

And here's what it looks like under the door. I need to get one of
those special SDK cables from HP so I can start playing around with
some alternative firmware.

(I'm tempted to pull off the feet and peek at the circuit board, but
I'm going to wait for a bit longer.)

I think they did a great job on this. Proven technology and
ergonomics updated with modern speed and firmware
replacement/updatability. I can't wait for the modern 15C (and, dare I
suggest, 16C). Way to go HP! Kudos to Eric, Cyrille, Sam, Gene and whoever else
had a hand in this.


Edited: 28 Apr 2009, 1:47 a.m. after one or more responses were posted


#58

I think I'm going to have to keep an eye open for this one.


- Pauli

#59

Thanks for the pics of the battery door. It confirms what I've been looking for.

#60

Thanks. This will be the 3rd post-48GX HP I purchase (35s and 50g being the other two).

Has anyone started a wiki or blog on reprogramming it?


#61

I see another stick-on serial number. sigh, is it too much to ask for manufacturer-engraved SN? ;-)

#62

I will create an entry on my wiki for it (the same place as the 20b repurposing project). Feel free to contribute any and all information to it.

Katie, can I abscond with your findings to post them, or would you be willing to post them on the wiki? There's some great material there that I don't want to lose...

URL: http://www.wiki4hp.com

thanks,
bruce


#63

Bruce,

Quote:
can I abscond with your findings

Abscond away! I'll keep working at it to see if I can come up with anything else.

Thanks for the wiki and all your work on the 20b,

-Katie

#64

Katie,

any pictures of the calculator from the front? It appears the classic 12C gold trim is back.


#65

I just added one to the top of the review.

#66

It's a pity that the serial number label is so shoddy looking -- but, I'm picking the tiniest of nits here. I'm really pleased with what you've described! I think I'll have to get one. Believe it or not, I don't own a 12c in any incarnation yet!

#67

Brilliant review, thanks Katie. I have one with serial number CNA 849... and the LCD screen has the most "yellowy" tinge I have seen in a 12C. Still useable though. The "ON" key action is more direct than on the old 12C - on the new one the screen "lights up" even when I press and hold the "ON" down. The old one only turns on when the "ON" key is released. This makes the self test procedure here a little different - more like [/] + [ON] rather than the old [ON]+[/] :-) I managed all your tests - great fun - oh except for the last one I couldn't escape from the menus ;-)

Cheers,
Tony


#68

Tony,

On that last test you need to press all the keys down at least once to blank the display. Once the whole display is blanked you can exit the test. A hardware reset (under the battery door) is the only other way to quit that I found.

Also, when a program starts with [f][CLEAR sigma] the display is not blanked when running like it is on the original 12C.

-Katie


#69

Thanks Katie - I did at least test that running after clearsigma still showed "running". But I never thought to test the tests ;-)

#70

Are there any changes in the new 12C packaging?

How can one distinguish the new 12C from the old one?


#71

Go back to Katie's original post. She describes how to identify the new version by looking through the packaging.

#72

Reading through the Hewlett-Packard Digest, Volume Eight, 1981, there is an article entitled "Quality By Design," which expounds on the key reliability of the Voyager keyboard design. Metal key operators and a gold-plated circuit board ensure low-resistance contact points for precise actuation and durability, resistant to wear and corrosion. Indeed, the fact that many original Voyagers are still in use today attests to this quality. I have a 1982 HP-15C and 1987 HP-12C with keyboards that still work perfectly.

So, I have to wonder how much of this sort of quality has carried over to the newest incarnation of the HP-12C. The cheesy serial number sticker does not concern me as much as any compromises in quality that may have been made under the skin. I suppose it is unreasonable to expect the modern pricepoint calculators to have 25+ year lifespans, but I still hope that HP has set the bar high on this product with regards to quality.


#73

Quote:
I suppose it is unreasonable to expect the modern pricepoint calculators to have 25+ year lifespans, but I still hope that HP has set the bar high on this product with regards to quality.

I would assume the quality would be on a par with the existing 12c or 12c platinum, if anyone has any experience with these.

#74

Martin, I would say the quality of the new unit is the same as that of a 12c I bought 8 years ago, and a 12cp 25th anniversary edition I bought about 3 years ago. They are all made in China, but I have never had any problems with any of those units. The first platinums had a problem with keystroke programs that were more than about 250 lines, as I recall, and I had one of those. But the new unit is blazingly fast. Like Katie said, a program I had that took maybe 2 minutes to run on the old 12c takes less than 2 seconds on this one. That ARM processor does make a huge difference!

#75

Quote:
So, I have to wonder how much of this sort of quality has carried over to the newest incarnation of the HP-12C.

Michael, IMHO the current 12C keyboard is the best of any of the current models. In fact, I really don't understand why HP don't use this key mechanism with the 20B, for example.

#76

I looked very hard and couldn't find an = key. Is this a pure RPN machine? If so, yahoo!!


#77

Yep, pure RPN, just like the original 12c.


#78

Program of

01 +
02 GTO 01

with the stack filled with 0 in X and 1 in Y, Z and T

counts to well over 45,000 in 60 seconds.


#79

Gene,

You have a hyper-speed 12C! I only get to about 30,000 in 60 seconds. The factor of 1.5 difference agrees with your statement several months ago that the new 12C runs 90 times faster; I (and Don too) found that it's "only" 60 times faster.

How do I get to hyper-speed? :)

-Katie


#80

Quote:
Gene,

You have a hyper-speed 12C! I only get to about 30,000 in 60 seconds. The factor of 1.5 difference agrees with your statement several months ago that the new 12C runs 90 times faster, I (and Don too) found that it's "only" 60 times faster.

How do I get to hyper-speed? :)
-Katie


Don't get too excited about that speed, it comes at the expense of battery life efficiency.

See here for why:

http://www.alternatezone.com/eevblog/?p=32

I bet they are running this sucker at 30MHz just like the 20B. And if they are, continuous processing would drain the batteries in way under 30 hours. Anyone want to put their unit into a continuous loop and see how long it actually lasts?

Dave.


#81

that's sad news. or maybe it's not sad news. i don't speak ausie ;-) but you seem to be saying that if someone(s) write new operating systems for the 12c and 20b; they can drastically lower the power consumption in both by changing the clock speed. if i get this correctly; the worst that the savings can be is 10X, but since the calc will sit idle most of the time while we write and think, and if we choose a standby speed of less than your 3 meg; those batteries can last a very long time indeed. did hp do this with either of the new units as shipped?


#82

Quote:
that's sad news. or maybe it's not sad news. i don't speak ausie ;-) but you seem to be saying that if someone(s) write new operating systems for the 12c and 20b; they can drastically lower the power consumption in both by changing the clock speed.

You can reduce the losses in the battery resistance by running at a slower speed, yes. This will give greater battery life at the expense of calculation speed.

A big speed reduction will almost certainly have no visible speed impact on normal calculations. It's only looping program calculations where the speed matters, but even then 30MHz seems crazy. My uWatch runs at 250KHz and does C floating point calculations all but instantly. In fact it's practically instant running at 32KHz.

Quote:
if i get this correctly; the worst that the savings can be is 10X, but since the calc will sit idle most of the time while we write and think, and if we choose a standby speed of less than your 3 meg; those batteries can last a very long time indeed. did hp do this with either of the new units as shipped?

The 20B runs at 30MHz only when doing calculations, then sits idle drawing almost nothing at a slow speed. So it's doing it properly except for the fact that they chose the top speed of 30MHz, and it peaks like this for *every* calculation regardless if it needs it or not! This is very poor low power calc design IMHO.

I can hardly imagine a program running on such a calc that would warrant a 30MHz clock rate on a 32bit ARM processor.

Every time you do a calc you are gulping a quick 15mA from those poor little CR2032 batteries with their high output resistance, it makes me want to cry!

I don't know about the 12C, I'm just assuming it's the same as the 20B.

Dave.

Edited: 29 Apr 2009, 2:03 a.m.


#83

Quote:
My uWatch runs at 250KHz and does C floating point calculations all but instantly. In fact it's practically instant running at 32KHz.

uWatch is running native math code, and the ARM-based 12C is not. Cyrille's put in a lot of optimizations [*], but at 250 kHz it would most likely be slower than the original 12C.

I agree, though, that 30 MHz is absurd and wastes battery life.

Eric


[*] I proposed some optimizations to the BCD math, and I know Cyrille experimented with them, but I'm not sure whether he put them in the production code. Aside from that, I know he designed his emulation code from the ground up to be very efficient.


#84

I just measured the current draw on the new 12C with good equipment. You need two power supplies for this as the batteries are in parallel with a common positive contact. Here are my findings:

power off: 4uA

power on, static display, no keystrokes: 45uA

continuous keystrokes (number entry): 1mA

long amortization function: 15mA

tight program loop: 15mA

continuous self-test ([ON]+[+]): 4mA

test modes ([ON]+[g] and [ON]+[g]+[ENTER]) , static display : 1.8mA

So it beats up on those CR2032's but only to give you the fast speed. I think this is justified however since you really do want the amortization results as fast as possible. Other functions run so fast that stress to the batteries is minimal. For most practical user programs on the 12C the same is true.

Given the higher current draw in the new modes, my guess is that the boot loader on the Atmel chip is running. You probably need to be in this mode to talk to the serial port. Perhaps that's the purpose of [ON]+[g], just to put you in that mode and show the status of the CPU.

-Katie
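
As a rough illustration of what these readings could mean for battery life, the sketch below averages the measured currents over one day of use. The current figures and the two-cells-in-parallel arrangement come from the measurements above; the usage pattern and any capacity figure passed in (for example an assumed 225 mAh per CR2032, 450 mAh for the pair) are assumptions for illustration only, and the calculation ignores internal-resistance losses and self-discharge.

```c
#define SECONDS_PER_DAY 86400.0

/* Currents in mA, taken from the measurements in this post. */
#define I_OFF_MA        0.004
#define I_IDLE_MA       0.045
#define I_KEYING_MA     1.0
#define I_COMPUTING_MA  15.0

/* Seconds per day spent in each powered-on state (hypothetical usage). */
struct daily_usage {
    double idle_s;       /* display on, no keystrokes    */
    double keying_s;     /* continuous number entry      */
    double computing_s;  /* amortization or program loop */
};

/* Time-weighted average current over one day, in mA.
   Whatever time is not listed in the usage is counted as "power off". */
double average_current_ma(const struct daily_usage *u)
{
    double off_s = SECONDS_PER_DAY - u->idle_s - u->keying_s - u->computing_s;
    double ma_seconds = off_s * I_OFF_MA
                      + u->idle_s * I_IDLE_MA
                      + u->keying_s * I_KEYING_MA
                      + u->computing_s * I_COMPUTING_MA;
    return ma_seconds / SECONDS_PER_DAY;
}

/* Idealized battery life in hours: capacity divided by average current.
   capacity_mah is an assumption supplied by the caller. */
double battery_life_hours(double capacity_mah, double average_ma)
{
    return capacity_mah / average_ma;
}
```

With the calculator computing around the clock (86400 s at 15 mA) and an assumed 450 mAh in total, `battery_life_hours` returns 30 hours, which is an upper bound because the losses in the cells are not modeled.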

Edited: 29 Apr 2009, 12:56 p.m. after one or more responses were posted


#85

Quote:
I just measured the current draw on the new 12C with good equipment. You need two power supplies for this as the batteries are in parallel with a common positive contact. Here are my findings:

power off: 4uA

power on, static display, no keystrokes: 45uA

continuous keystrokes: 1mA

long amortization function: 15mA

tight program loop: 15mA

continuous self-test ([ON]+[+]): 4mA

test modes ([ON]+[g] and [ON]+[g]+[ENTER]) , static display : 1.8mA

So it beats up on those CR2032's but only to give you the fast speed. I think this is justified however since you really do want the amortization results as fast as possible. Other functions run so fast that stress to the batteries is minimal. For most practical user programs on the 12C the same is true.

Given the higher current draw in the new modes, my guess is that the boot loader on the Atmel chip is running. You probably need to be in this mode to talk to the serial port. Perhaps that's the purpose of [ON]+[g], just to put you in that mode and show the status of the CPU.

-Katie


Thanks for the measurements. But are you SURE it doesn't actually take 15mA spikes during normal calculations?

What equipment? What method? "Good equipment" doesn't mean anything if your method has limitations (e.g. you are only reading the average with a multimeter). Sorry to be pedantic, but it's easy to get false measurements on pulse current readings like this.

Obviously the processor is working at 30MHz during program execution as expected, so your 15mA figures are spot on. My bet is it also peaks at 15mA doing a simple addition.

Quote:
Other functions run so fast that stress to the batteries is minimal.

Sorry, you can't beat Ohm's law. The battery losses remain the same as I pointed out in my video, regardless of how "quick" the pulse is.

Dave.
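
One way to put numbers on the Ohm's law argument is sketched below. It assumes that supply current scales in proportion to clock frequency (a simplification) and treats the battery as an ideal source in series with a fixed internal resistance. The function name and the example values of 0.5 nA per Hz and 20 ohms are hypothetical and chosen only for illustration.

```c
/* Energy (joules) dissipated in the battery's internal resistance while a
   fixed number of CPU cycles is executed at clock frequency f_hz.
   Assumption: supply current is proportional to clock frequency. */
double battery_loss_joules(double cycles, double f_hz,
                           double amps_per_hz, double r_internal_ohm)
{
    double current_a = amps_per_hz * f_hz;  /* I = k * f */
    double seconds   = cycles / f_hz;       /* t = N / f */
    return current_a * current_a * r_internal_ohm * seconds;  /* I^2 * R * t */
}
```

Under these assumptions the loss for a fixed amount of work is I^2 * R * t = k^2 * f * R * N, so it grows with the clock frequency: the same 30 million cycles dissipate ten times as much in the cell at 30 MHz as at 3 MHz, even though the pulse is ten times shorter.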

Edited: 29 Apr 2009, 5:27 a.m.


#86

I was using a Fluke 867b, measuring the current draw from the common "+" supply line from the batteries to the calc (burden voltage drop is minimal on the 10 amp range). Yes, it does draw 15ma peak on every function. I just rechecked this using a HP 34401A and got the same readings. They also seem to agree with what Cyrille posted.

#87

hello,

Quote:
power off: 4uA
power on, static display, no keystrokes: 45uA
continuous keystrokes: 1mA
long amortization function: 15mA
tight program loop: 15mA
continuous self-test ([ON]+[+]): 4mA
test modes ([ON]+[g] and [ON]+[g]+[ENTER]) , static display : 1.8mA

the basic figures for the ARM are:
power off: 4µA
running at 2MHz (internal oscillator): ~1.5mA
running at 30MHz: 15mA
LCD on: 45µA (12C, no charge pump), 150µA (20b, charge pump)

cyrille

#88

Quote:
uWatch is running native math code, and the ARM-based 12C is not. Cyrille's put in a lot of optimizations [*], but at 250 kHz it would most likely be slower than the original 12C.

Yes, the 12C is a different beast because it's running an emulator which has much more overhead. The 20B on the other hand...

If you assume the new 12C works at 30MHz, and it's been measured as 60 times faster, then it's obvious it only needs to run at 500KHz to emulate the original 12C speed (sounds about right to me). Speed improvement is nice, so a nice round 5 or 10 times improvement would have been sufficient for marketing, giving only a few MHz operation which would be very sensible. Or better yet, if possible, make it smart - so for normal calcs keep it running at a low 500KHz, and only switch to high clock speed when it's running a program or something.

Or even better still, give the user a speed option, it's only a few lines of code. By default, make it slow for maximum battery life, and those who need super speed can select it if needed.

Dave.
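
The arithmetic and the speed-option idea can be sketched as follows. This is only an illustration of the proposal, not actual firmware: the setting names and both functions are hypothetical, the 30 MHz and 60x figures are the ones quoted in this thread, and emulation speed is assumed to scale linearly with clock frequency.

```c
#include <stdbool.h>

/* Figures quoted in the thread: 30 MHz clock, measured 60x the original. */
#define FULL_CLOCK_HZ     30000000.0
#define MEASURED_SPEEDUP  60.0

/* Hypothetical user-selectable speed settings. */
enum speed_setting { SPEED_SLOW, SPEED_AUTO, SPEED_FAST };

/* Clock (Hz) needed for a given speedup over the original 12C, assuming
   emulation speed scales linearly with clock frequency. */
double clock_for_speedup_hz(double target_speedup)
{
    return FULL_CLOCK_HZ / MEASURED_SPEEDUP * target_speedup;
}

/* Pick a clock: slow by default, full speed only on request or, in AUTO,
   only while a keystroke program is running. */
double select_clock_hz(enum speed_setting setting, bool program_running)
{
    double slow_hz = clock_for_speedup_hz(10.0);  /* "a few MHz" */

    switch (setting) {
    case SPEED_FAST:
        return FULL_CLOCK_HZ;
    case SPEED_AUTO:
        return program_running ? FULL_CLOCK_HZ : slow_hz;
    case SPEED_SLOW:
    default:
        return slow_hz;
    }
}
```

Here `clock_for_speedup_hz(1.0)` gives the 500 kHz needed to match the original, and a 10x target gives 5 MHz.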


#89

I like the user-specified speed setting with a default of a few MHz. Most users would never read the manual to know how to change it but would experience a 10x speedup over the original 12C and very long battery life. Geeks would push it to the limit but have plenty of spare batteries around.

Still, at 30MHz when functions are run -- even with the bad battery losses -- given typical usage patterns I think that most users will experience several years of usage on one set of batteries.


#90

Quote:
Still, at 30MHz when functions are run -- even with the bad battery losses -- given typical usage patterns I think that most users will experience several years of usage on one set of batteries.

The 20B is rated at "an average of 9 months" battery life, so the 12C should be an identical spec.

Dave.

Edited: 29 Apr 2009, 6:02 p.m.


#91

hello

Quote:
The 20B is rated at "an average of 9 months" battery life, so the 12C should be an identical spec.

actually, no.
the 20b in idle mode (screen ON, not 'working') uses 150µa versus 45 or so for the 12C. so there will be a difference in battery life.

it also takes more keys to do something on average with the 20b, so there is more overhead there...

cyrille

#92

Hello,

Quote:
[*] I proposed some optimizations to the BCD math, and I know Cyrille experimented with them, but I'm not sure whether he put them in the production code. Aside from that, I know he designed his emulation code from the ground up to be very efficient.

Yep, the emulator is quite efficient.
As for BCD calculations, I further optimized the code that you proposed in 32 bit ARM assembly using the full power of 3 operations per instruction offered by the ARM (shift + operation + carry detection) allowing me to do a 64 bit BCD add in 18 instructions.

cyrille


#93

Of course, for Nut emulation you don't NEED 64-bit BCD operations. Maybe you've incorporated the routines into firmware for other calculators that do need 64-bit BCD operations?


#94

My hat is off to Cyrille!

#95

hello

Quote:
Of course, for Nut emulation you don't NEED 64-bit BCD operations. Maybe you've incorporated the routines into firmware for other calculators that do need 64-bit BCD operations?

But my code works for 64 bits :-) thanks to assembly coding and the power of ARM assembly, adding the last nibble only required one extra instruction...

// used to do an addition on 2 DCB number with a result in DCB
// for example, dcbAdjust(a, b) when a and b are dcb representation
// of number will return the dcb represention of the number r=a+b...
// note that the addition is done prior to the call...
// int64 dcbAddAdjust(int64 r, int64 b); // r=a+b

// a += 0x0666666666666666ULL; // preadjust as if carries occur
// u64 s = a + b; // compute the sum
// b = a ^ b ^ s; // find the carries
// b = ~b & 0x1111111111111111ULL; // compute mask for non-carries
// return s - ((b >> 4)*6); // subtract out 6 * non-carries
dcbAddAdjust:
ldr r12, cte66666666 // load 66666666 in r12 to perform the a+6666666666666666
/*
10hex = 16decimal, so the difference between the BCD representation of 10 and the hex value of 10 is 6.
Adding 6 to each digit of one of the 2 numbers corresponds to assuming that the addition will generate a carry
for each digit and preemptively adds the 6 to each digit.

This needs to be done so that we can easily detect which digit addition really had a carry.
Then the algorithm will remove the extra 6 from the digits that did not carry.

The carry for each digit does appear as an extra '1' added to the last bit of each digit.
This means that the parity of each digit is equal to parity of digit in a exclusive or parity
of digit in b exclusive or carry from previous digit.

This means that calculating a xor b xor (a+b) and only looking at the first bit of each digit
will tell us for each digit if there is a carry or not. Note that the fact that we added 6 to each digit of
a does not affect the calculation of parity for a as 6 is an even number
*/
adds r0, r0, r12 // a+=6666666666666666
adc r1, r1, r12
ldr r12, cte88888888 // preload 888888888888 preloading it earlier than needed will reduce 1 wait state later on...
/* step 2 of the algorithm. we now have pre-adjusted a, we calculate the sum of a+b.
however, here we pay attention to keeping the carry out of that calculation in the CPU
carry flag (this is why we use the adcS instruction as in add r1 and r3 and the current carry AND
keep the carry out in the flag register) so that we can later remove the 6 from the last digit of the
result if that carry is clear, or in our case, clear the bit in the bitfield representing the
digits where we need to remove 6 for digit 16 if the carry is set.*/
adds r4, r0, r2 // s= a+b
adcs r5, r1, r3 // keep carry!!!

eor r2, r0, r2 // b=a^b. This calculates the combined parity of each digits of a and b.
eor r3, r1, r3 // note that only the first bit of each digit has any interest. the others will be removed later
eor r2, r2, r4 // b=a^b^s. 'removing' the combined parity of each digit of a and b from the parity of the
eor r3, r3, r5 // sum of a and b gives the carry bit for each digit.
/* the next 3 instructions perform 3 64 bit operations at once (a normal non ARM CPU would need 6 instructions to do so!), so please follow!
we now have a set of 64 bits, 16 of them are of interest to us (the least significant bit of each digit) as it indicates the carry.
so, we need to
1: clear all the other bits (the b&1111111111111111 in the C code)
2: if the carry bit for digit 'n' is NOT set (ie, no carry), it means that we need to remove 6 from digit n-1. So we
need to invert each carry bit
3: we need to shift our bitfield to get the bit indicating the carry caused by digit n located in the same
region of the registers as digit n (for the moment, the bit for digit n is the least significant bit of digit n+1)

Thanks to the ARM cpu instruction set, we can do this in only 3 instructions.
- as far as 1 and 2 are concerned we can use the bic instruction (Bit Clear) which performs a "a and ~b" operation so we can combine them
- a shift operation normally takes 3 operations, but if we decide to shift the bitfield by only 1 bit (which would place the carry for digit n
in the bit 3 of digit n), then we can use the shift ability of the ARM to perform in 1 operation the 88888888 and (~b shift 1) for the lower
32 bits of b, then use 1 instruction (the sub) to handle the carry for digit 7 (which is now held as the least significant bit of the
register holding the upper 32 bits of b) and finish the work by handling the upper 32 bits of b.
because the shift is done 'on the fly', we now need to and b not by 11111..., but by 1111... shifted by 1 or 8888......

Note that if the ARM had 64 bit registers, we could do the whole thing in only 1 instruction */
bic r2, r12, r2, lsr #1 // (~b>>1)&0x8888888888888888
sub r2, r2, r3, asl #31
bic r3, r12, r3, lsr #1

/* Handles carry that we lost from the register due to 64 bit limitation during the addition of a+b
but that was kept in the flag register of the ARM CPU and never modified since!
if the carry is SET (ie, there is a carry on the last digit), then we do NOT need to remove 6 from the last
digit, and the bit corresponding to the need to remove that 6 is cleared from the register. */
subcs r3, r3, #0x80000000

/* the last step of the algorithm is to remove 6 (or remove 2 and remove 4 as 2+4=6) from each digit where there was no carry.
for each digit, we have a bit (bit 3 to be precise) in variable b that is set if 6 needs to be removed from this digit.
so, we need to remove b>>1 (correspond to carry bit*4) and b>>2 (correspond to carry bit *2) from the sum to get our result.
note that because we know that only 1 bit in each group of 4 bit is potentially set, there is no need to handle bit movement
from higher 32 bits to lower 32 bits of b. If the ARM had 64 bit registers, this would be a non issue. */
subs r4, r4, r2, lsr #2 // s - (b>>2)*3 = s - (b>>2) - (b>>1)
sbc r5, r5, r3, lsr #2
subs r0, r4, r2, lsr #1
sbc r1, r5, r3, lsr #1
END of Function

cte66666666:
DC32 0x66666666
cte88888888:
DC32 0x88888888
cte11111111:
DC32 0x11111111

cyrille
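
For readers who do not read ARM assembly, the following is a C sketch of the same sequence of steps for 16 packed BCD digits. It is not Cyrille's code: the function name `bcd16_add` and the `carry_out` parameter are made up for this illustration, and the carry out of the top digit is recovered with an unsigned wraparound comparison in place of the CPU carry flag. Both operands are assumed to hold only valid BCD digits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Adds two 16-digit packed BCD numbers (hypothetical helper).
   Returns the 16-digit BCD sum; *carry_out receives the carry out of the
   most significant digit. Operands must contain only digits 0..9. */
uint64_t bcd16_add(uint64_t a, uint64_t b, bool *carry_out)
{
    /* Pre-adjust: add 6 to every digit as if each digit will carry.
       This cannot overflow because every digit of a is at most 9. */
    uint64_t t = a + 0x6666666666666666ULL;

    /* Binary sum; it may wrap around 2^64 when the top digit carries. */
    uint64_t s = t + b;
    bool top_carry = s < t;

    /* Bit 0 of each nibble of t^b^s is the carry INTO that digit. */
    uint64_t carry_in = t ^ b ^ s;

    /* The carry OUT of digit k is the carry into digit k+1; the carry out
       of digit 15 is the wraparound detected above. */
    uint64_t carried = (carry_in >> 4) | ((uint64_t)top_carry << 60);

    /* One bit per digit that did NOT carry; those digits keep an extra 6. */
    uint64_t no_carry = ~carried & 0x1111111111111111ULL;

    *carry_out = top_carry;
    return s - no_carry * 6;  /* remove the 6 from non-carrying digits */
}
```

For example, adding 0x1234 and 0x8766 yields 0x10000 with no carry out, and adding 1 to 0x9999999999999999 yields 0 with the carry out set.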


#96

Cyrille: with some guessing (I apologize), I tried to put the code
you shared in a format compatible with the MoHPC Forum, which at
times is rather unfriendly with respect to long posts formatting.
Please correct any mistake.
Andrés


But my code works for 64 bits :-) thanks to assembly coding and
the power of ARM assembly, adding the last nibble only required
one extra instruction...

// Used to do an addition on two DCB numbers with a result in DCB
// for example, dcbAdjust(a, b), when a and b are dcb representation
// of numbers, will return the dcb representation of the number r=a+b...
// Note that the addition is done prior to the call...
//
// int64 dcbAddAdjust(int64 r, int64 b); // r=a+b
// a += 0x0666666666666666ULL; // preadjust as if carries occur
// u64 s = a + b; // compute the sum
// b = a ^ b ^ s; // find the carries
// b = ~b & 0x1111111111111111ULL; // compute mask for non-carries
// return s - ((b >> 4)*6); // subtract out 6 * non-carries

dcbAddAdjust:

ldr r12, cte66666666 // load 66666666 in r12 to perform the a+6666666666666666

/* 10hex = 16decimal, so the difference between the BCD
representation of 10 and the hex value of 10 is 6.
Adding 6 to each digit of one of the 2 numbers correspond to
assuming that the addition will generate a carry for each digit
and preemptively adds the 6 to each digit.
This need to be done so that we can easily detect which digit
addition really had a carry. Then the algorithm will remove the
extra "6" from these digits.
The carry for each digit does appear as an extra '1' added to the
last bit of each digit. This means that the parity of each digit
is equal to parity of digit in an exclusive-or parity of digit in
b exclusive-or carry from previous digit.
This means that calculating a xor b xor (a+b) and only looking at
the first bit of each digit will tell us for each digit if there
is a carry or not. Note that the fact that we added 6 to each
digit of a does not affect the calculation of parity for a as 6 is
an even number */

adds r0, r0, r12 // a+=6666666666666666
adc r1, r1, r12
ldr r12, cte88888888 // preload 888888888888; preloading it earlier than
// needed will reduce 1 wait state later on...

/* Step 2 of the algorithm. We now have pre adjusted a, we
calculate the sum of a+b. however, here we pay attention to
keeping the carry out of that calculation in the CPU carry flag
(this is why we use the adcs instruction as in add r1 and r3 and
the current carry, AND keep the carry out in the flag register),
so that we can later remove the 6 from the last digit of the
result if that carry is clear or, in our case, clear the bit in
the bitfield representing the digits where we need to remove 6 for
digit 16 if the carry is set.*/

adds r4, r0, r2 // s= a+b
adcs r5, r1, r3 // keep carry!!!
eor r2, r0, r2 // b=a^b.

/* This calculates the combined parity of each digits of a and b. */

eor r3, r1, r3 // Note that only the first bit of each digit has any interest.
// The others will be removed later.
eor r2, r2, r4 // b=a^b^s. 'Removing' the combined parity of each
eor r3, r3, r5 // digit of a and b from the parity of the sum of a and b
// gives the carry bit for each digit.

/* The next 3 instructions perform 3 64-bit operations at once (a
normal non-ARM CPU would need 6 instructions to do so!), so please follow!
We now have a set of 64 bits, 16 of them are of interest to us
(the least significant bit of each digit) as it indicates the carry. So, we need to:

1. Clear all the other bits (the b&1111111111111111 in the C code)
2. If the carry bit for digit 'n' is NOT set (i.e., no carry),
it means that we need to remove 6 from digit n-1. So we
need to invert each carry bit
3. We need to shift our bitfield to get the bit indicating the carry
caused by digit n located in the same region of the
registers as digit n (for the moment, the bit for digit n
is the least significant bit of digit n+1) */

/* Thanks to the ARM CPU instruction set, we can do this in only 3
instructions. - as far as 1 and 2 are concerned we can use the bic
instruction (Bit Clear) which performs a "a and ~b" operation so
we can combine them - a shift operation normally takes 3 operations,
but if we decide to shift the bitfield by only 1 bit (which
would place the carry for digit n in the bit 3 of digit n), then
we can use the shift ability of the ARM to perform in 1 operation
the 88888888 and (~b shift 1) for the lower 32 bits of b, then use
1 instruction (the sub) to handle the carry for digit 7 (which is
now held as the least significant bit of the register holding the
upper 32 bits of b) and finish the work by handling the upper 32
bits of b. because the shift is done 'on the fly'; we now need to
and b - not by 11111..., but by 1111... shifted by 1 or 8888......
Note that if the ARM had 64 bit registers, we could do the whole
thing in only 1 instruction */

bic r2, r12, r2, lsr #1 // (~b>>1) & 0x8888888888888888
sub r2, r2, r3, asl #31
bic r3, r12, r3, lsr #1

/* Handles carry that we lost from the register due to 64 bit
limitation during the addition of a+b but that was kept in the
flag register of the ARM CPU and never modified since, if the
carry is SET (i.e., there is a carry on the last digit), then we
do NOT need to remove 6 from the last digit, and the bit
corresponding to the need to remove that 6 is cleared from the
register. */

subcs r3, r3, #0x80000000

/* The last step of the algorithm is to remove 6 (or remove 2 and
remove 4; as 2+4=6) from each digit where there were no carry. For
each digit, we have a bit (bit 3 to be precise) in variable b that
is set if 6 needs to be removed from this digit. So, we need to
remove b>>1 (correspond to carry bit*4) and b>>2 (correspond to
carry bit *2) from the sum to get our result. Note that because we
know that only 1 bit in each group of 4 bit is potentially set,
there is no need to handle bit movement from higher 32 bits to
lower 32 bits of b. If the ARM had 64 bit registers, this would be
a non issue.*/

subs r4, r4, r2, lsr #2 // s - (b>>2)*3 = s - (b>>2) - (b>>1)
sbc r5, r5, r3, lsr #2
subs r0, r4, r2, lsr #1
sbc r1, r5, r3, lsr #1

END of Function

cte66666666: DC32 0x66666666
cte88888888: DC32 0x88888888
cte11111111: DC32 0x11111111


#97

For comparison, here's the C code I sent to Cyrille that inspired his ARM assembly code:

#include <stdbool.h>
#include <stdint.h>

uint64_t bcd15d_add (uint64_t a, uint64_t b, bool carry_in)
{
    if (carry_in)
        b++;
    a += 0x0666666666666666ULL;             // preadjust as if carries occur
    uint64_t s = a + b;                     // compute the sum
    b = a ^ b;                              // find the carries
    b = ~(s ^ b) & 0x1111111111111110ULL;   // compute mask for non-carries
    return s - ((b >> 2) | (b >> 3));       // subtract out 6 * non-carries
}

This works for 15-digit BCD addition, but not for 16 digits, because in C code there is no simple portable way to obtain the carry out from an addition of 64-bit unsigned integers. Cyrille wrote ARM assembly code based on this, and in ARM assembly it's easy to obtain the carry out.

Note that when given non-BCD input, neither the C code nor the ARM code will match the results given by the HP Nut or Saturn processors. My latest code first does an efficient parallel test for valid BCD digits of each operand, and chooses the fast BCD addition for valid operands, or a digit-by-digit method for invalid operands. I've sent the C code for the efficient parallel BCD test to Cyrille in case he wants to do something similar in other HP calculators such as the 50g.
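
The parallel validity test itself is not shown in this thread. One way such a test could look (a guess, not necessarily the code sent to Cyrille) uses the fact that a nibble is greater than 9 exactly when bit 3 is set together with bit 2 or bit 1. The function name bcd16_is_valid is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if all 16 nibbles of x are valid BCD
   digits (0..9). A nibble exceeds 9 exactly when bit 3 is set
   together with bit 2 or bit 1. */
bool bcd16_is_valid(uint64_t x)
{
    /* Move bit 2 and bit 1 of every nibble up onto that nibble's bit 3. */
    uint64_t mid_bits = (x << 1) | (x << 2);
    /* Keep only the bit-3 positions; everything else the shifts moved
       around is masked away. */
    uint64_t bad = x & mid_bits & 0x8888888888888888ULL;
    return bad == 0;
}
```

A caller would then pick the fast addition when both operands pass this test and fall back to a digit-by-digit routine otherwise.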


#98

Quote:
For comparison, here's the C code I sent to Cyrille that inspired his ARM assembly code:

uint64_t bcd15d_add (uint64_t a, uint64_t b, bool carry_in)
{
  if (carry_in)
    b++;
  a += 0x0666666666666666UL;            // preadjust as if carries occur
  uint64_t s = a + b;                   // compute the sum
  b = a ^ b;                            // find the carries
  b = ~(s ^ b) & 0x1111111111111110UL;  // compute mask for non-carries
  return s - ((b >> 2) | (b >> 3));     // subtract out 6 * non-carries
}

This reminds me very much of the tricks described in the book "Hacker's Delight". (Web site at http://www.hackersdelight.org/)

#99

Excellent book! I seem to have misplaced my copy (or maybe lent it out and forgotten?), so I might have to buy another one.

When I started to design my Saturn emulation engine (for Emu71), back in 1995, I investigated the most efficient way to implement BCD operations on x86 processors in 16-bit "real" mode (Emu71 was, and still is, a pure 16-bit "DOS" program...).
I quickly recognized that I had to handle the nibbles in packed form, i.e. 2 nibbles per byte, to get some degree of parallelism. I used the native BCD support of the Intel processors. The result is quite efficient code for a 16-bit processor, and this explains most of the speed of Emu71:

; BCD 16 nibble addition: [dest] += [src]
; di points to dest, si points to src
_addd_w: mov ax,[di]
mov bx,[si]
add al,bl
daa
xchg al,ah
adc al,bh
daa
xchg al,ah
mov [di],ax
mov ax,[di+2]
mov bx,[si+2]
adc al,bl
daa
xchg al,ah
adc al,bh
daa
xchg al,ah
mov [di+2],ax
mov ax,[di+4]
mov bx,[si+4]
adc al,bl
daa
xchg al,ah
adc al,bh
daa
xchg al,ah
mov [di+4],ax
mov ax,[di+6]
mov bx,[si+6]
adc al,bl
daa
xchg al,ah
adc al,bh
daa
xchg al,ah
mov [di+6],ax
update_carry
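
For readers who do not read x86 assembly, each adc/daa pair above adds one byte (two packed digits) and then decimal-adjusts the result. A rough C model of that sequence, assuming valid BCD operands stored low byte first, could look like this. It is a sketch for illustration only, not Emu71 code; adc_daa and bcd16_add_bytes are hypothetical names, and the returned flag stands in for whatever update_carry records.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models "adc al, x" followed by "daa" on one byte of packed BCD.
   *cf is the carry flag on entry and on exit. */
static uint8_t adc_daa(uint8_t x, uint8_t y, bool *cf)
{
    unsigned cin = *cf ? 1u : 0u;
    unsigned sum = (unsigned)x + (unsigned)y + cin;
    bool af = ((x & 0x0Fu) + (y & 0x0Fu) + cin) > 0x0Fu; /* half carry */
    bool carry = sum > 0xFFu;                            /* binary carry */
    uint8_t al = (uint8_t)sum;
    uint8_t old_al = al;

    if ((al & 0x0Fu) > 9u || af)       /* low digit overflowed past 9 */
        al = (uint8_t)(al + 0x06u);
    if (old_al > 0x99u || carry) {     /* high digit overflowed past 9 */
        al = (uint8_t)(al + 0x60u);
        carry = true;
    }
    *cf = carry;
    return al;
}

/* 16-nibble addition dest += src over 8 bytes, least significant byte
   first. Returns the final decimal carry. */
bool bcd16_add_bytes(uint8_t dest[8], const uint8_t src[8])
{
    bool cf = false; /* the first byte uses add, i.e. no carry in */
    for (int i = 0; i < 8; i++)
        dest[i] = adc_daa(dest[i], src[i], &cf);
    return cf;
}
```

The assembly unrolls this loop and handles two bytes per 16-bit load, swapping al and ah with xchg, which leaves the carry flag untouched between the two adjustments.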

If I had to rewrite Emu71 in 32-bit mode, I would do it differently, maybe using Eric's and Cyrille's method. Is it a public-domain method, or is it an invention of yours?

J-F

Edited: 1 May 2009, 3:54 a.m.


Quote:
When I started to design my Saturn emulation engine (for Emu71), back in 1995 ...

A million thank yous for Emu71.

Quote:
If I had to rewrite Emu71 in 32-bit mode, I would do differently ...

And a million more if you rewrite it AND open source it. Given current processor speeds I would hesitate to require assembly optimizations in the event someone wanted to port it to a different architecture.

What makes Emu71 useful for me is the hardware HP-IL support. This would need to be ported as well, and it is probably the hardest part of the deal: writing a Win32 or Linux device driver for the hardware (HP's, Christoph Klug's, or the yet-to-appear PIL Box).


USB-based PIL box. No problem. I think JFG has a plan, a great plan.

Floyd: "What's going to happen?", Bowman: "Something wonderful." -- 2010

Edited: 1 May 2009, 1:27 p.m.

[Start of Dream]
What about an i71 for the iPhone? Another project could be to connect the iPhone to IL now that Apple has released the SDK for the connector pin... I don't know anything about hardware and frustratingly little about the 41/71 as well, but I think it would be a super cool project to connect the iPhone to HP-IL and have the i41X (or a yet-to-be-written i71) connected to real devices...
[End of Dream]

Cheers

Peter


I share your dream. My plans:

  1. Get a PIL Box to work with Windows, Linux, Mac.
  2. Leverage the work of Khanh-Dang Nguyen Thu Lam (http://pagesperso-orange.fr/kdntl/hp41/nonpareil-patch-doc.html) and create a TCP front-end to the PIL Box. I'll also support the virtual IL devices created by KDNT. This way IL can be added to any emulator, and TCP can be used to reach the PIL Box for physical devices.
  3. Beg the i41CX+ author to add TCP/IL support.

I'd be happy to help with 3.) (and any beta-testing that might be helpful).

[Dream Continues...]

The HP-41 had real connectivity via HP-IL 25 years ago, even with data-acquisition units. I would like to say (rather fanatically, excuse me) that it had better real connectivity than the iPhone has today... OK, WiFi is nice on the iPhone side...

:-))

Quote:
You have a hyper-speed 12C! I only get to about 30,000 in 60 seconds. The factor of 1.5 difference agrees with your statement several months ago that the new 12C runs 90 times faster; I (and Don too) found that it's "only" 60 times faster.

Because of the emulator, I wonder if the speed is sensitive to how the program is aligned in memory. There's a good chance that reading something on a 4-byte boundary is faster than reading something that starts at an odd nibble address.

Dave

Like Katie, mine got to 30,382. Don

Thanks.
How many program steps?

DamirV

Edited: 29 Apr 2009, 2:24 p.m.


So has HP upgraded the 12cp at all? How does the current 12cp compare to this new 12c?


I don't think the 12cp has been upgraded. I have the 25th anniversary edition. How does it compare to the new 12c? The cp has algebraic mode, 400 lines of program space, more cash flows, and an x² key. The new 12c has pure RPN and raw speed.

