#### Topic: What number representation to choose for a homebrew calculator?

One of the first things a calculator tinkerer has to do in a project is to decide what number format to use.

This choice is crucial for both the prospective math library and the way numeric data will be displayed.

There are two fundamental ways to implement numbers in a small calculator:

- a) Binary-coded decimal (BCD), as used in the original HP engines.

- b) Binary floating-point formats, as used on PCs.

Variant a) comes in two basic flavours, packed and unpacked BCD, while there are further design details to consider, for example whether exponents are stored in sign/magnitude or in complement format.

Variants of b) usually differ in the number of bits dedicated to mantissa and exponent. Standardized (IEEE) floating-point formats use base-2 exponents in a biased format. We will see later that other exponent bases (16 or 256) may also be interesting. In that case the number is interpreted as *(–1)^S · M · 16^(E – bias)* or *(–1)^S · M · 256^(E – bias)* instead of *(–1)^S · M · 2^(E – bias)*.

Let’s discuss the advantages and drawbacks of the different representations. The classic HPs use a 56-bit BCD format, laid out as follows:

` S ’ M M M M M M M M M M ’ X E E`

Each place represents a packed BCD nibble of 4 bits. Exponents are stored in 10’s complement, while sign/magnitude is used for the mantissa (*see also Jacques’ page, HP35 -> operating -> output format*).

Note that HP calculators use a dedicated decimal CPU which provides a large and specialized set of decimal operations (addition, subtraction, shifting of arbitrary parts of a decimal register). For example, shifting the mantissa one digit to the right takes a single 10-bit machine instruction.
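
For comparison, on a byte-oriented microcontroller the same one-digit right shift has to be composed from several instructions per byte; a sketch in C (illustrative, not HP microcode):

```c
#include <stdint.h>

/* Shift a packed-BCD mantissa one decimal digit (one nibble) to the right.
   reg[0] holds the most significant digits; the lowest digit is discarded. */
void bcd_shift_right(uint8_t *reg, int len)
{
    uint8_t carry = 0;                 /* nibble shifted in from the left */
    for (int i = 0; i < len; i++) {
        uint8_t next = reg[i] & 0x0F;  /* low nibble moves to the next byte */
        reg[i] = (uint8_t)((carry << 4) | (reg[i] >> 4));
        carry = next;
    }
}
```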

Available microcontrollers suited to building calculators usually work with binary numbers, where elementary instructions process one byte at a time. Operations on registers have to be implemented manually, in C or assembly language, taking much more program space than in the HP originals.

**Decimal formats**

+ The original HP algorithms from the math section may be used as they are, without any adaptations, provided that the basic operations (addition, shift, etc.) have been implemented.

+ Since decimal numbers are stored the way users want to read them, only a little formatting has to be done (rounding to match the current display format is fairly easy). It is not necessary to perform the complex radix conversions known from binary environments.

– Available microcontrollers are optimized for binary operations. Decimal arithmetic and shifting are somewhat harder to implement and run slower, especially if one wants to use packed BCD.

– Data density (i.e. accuracy) is worse than in binary formats, especially when using unpacked BCD to reduce complexity. As an example, consider a number with 12 digits of accuracy and an exponent range of ±999: in packed BCD it would take 17 digits (12 mantissa, 2 signs, 3 exponent), or 68 bits. Unpacked formats are out of the question, taking as much as 136 bits. With binary formats one would need 40 bits for the mantissa, 1 bit for the sign and 10 to 13 bits for the exponent including its sign: about 52 bits altogether (13 nibbles).
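
To illustrate the extra work mentioned above: even a plain addition of two packed BCD bytes needs a decimal correction after the binary add (a sketch; a real routine would also propagate the carry across a whole multi-byte register):

```c
#include <stdint.h>

/* Add two packed-BCD bytes (two digits each) with decimal adjust.
   Returns the BCD sum; *carry is set if the result exceeds 99. */
uint8_t bcd_add(uint8_t a, uint8_t b, uint8_t *carry)
{
    unsigned sum = a + b;
    if ((sum & 0x0F) > 9 || (a & 0x0F) + (b & 0x0F) > 9)
        sum += 0x06;                  /* correct the low digit  */
    if ((sum & 0xF0) > 0x90)
        sum += 0x60;                  /* correct the high digit */
    *carry = (sum > 0xFF);
    return (uint8_t)sum;
}
```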

**Binary formats**

+ Better data density, and a better fit to popular microcontrollers (see above).

– Radix conversion is necessary after number entry (dec -> bin) and prior to display output (bin -> dec). This operation is quite slow (10 to 40 ms in the first VTQ model, so I don’t care ...). But it is also very complex, requiring as much as 3000 – 4000 bytes of program memory (in my first model, the VTQ–9). I assume it will get better after having implemented logarithms.

– Simple decimal numbers like 0.3 cannot be stored in their precise form (there is always a rounding error, since the base-10 system doesn’t match the base-2 system).

+ Better scalability of the exponent. As mentioned above, the exponent may be stored and interpreted in various ways, making it a base-2, base-4, base-8 or, in general, base-2^j exponent. Let’s look at this from the point of view of normalization: with a base-2 exponent, incrementing the exponent by 1 has to be compensated by a one-bit right shift of the mantissa. Use a base-16 exponent and the bit shift becomes a nibble shift; with a base-256 exponent, a byte shift. Conclusion: control granularity by choosing different exponent bases (*see separate post «Binary Formats & Arithmetics»*).
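
The granularity argument can be sketched in C (illustrative names, not the VTQ sources; `step` is the number of mantissa bits per exponent unit, i.e. 1, 4 or 8 for j = 1, 4, 8):

```c
#include <stdint.h>

/* Normalize a left-justified 64-bit mantissa: shift left in units of
   'step' bits until the top 'step' bits are not all zero, decrementing
   the exponent once per shift. step = 1 (base-2), 4 (base-16), 8 (base-256). */
void normalize(uint64_t *mantissa, int *exponent, int step)
{
    if (*mantissa == 0)
        return;                       /* zero has no normal form */
    while ((*mantissa >> (64 - step)) == 0) {
        *mantissa <<= step;
        *exponent -= 1;
    }
}
```

The coarser the base, the fewer (but wider) shifts normalization needs, which is exactly the trade-off described above.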

**My Choice**

Despite the hard times I expected to face implementing floating-point radix conversions, I went for a fully binary format in my first model, the VTQ–9. My registers were 64 bits wide, containing 48 bits of mantissa, 13 bits of exponent, 1 bit for the sign and 2 bits in reserve:

`SrrEEEEE’EEEEEEEE’MMMMMMMM’MMMMMMMM’MMMMMMMM’ ... ’MMMMMMMM`

48 bits of mantissa give about 14.4 decimal digits (log10(2^48)). I «officially» stated an accuracy of 12 digits while using the remaining 2.4 digits as a reserve to preserve accuracy. 13 bits of exponent allow a dynamic range of 2^±4095, i.e. about 10^±1232, enough for my goal of a range of 10^±999.
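
Unpacking such a 64-bit register into its fields might look like this (a sketch with illustrative names; the actual implementation works on arrays of 16-bit words, as shown below):

```c
#include <stdint.h>

/* Layout: S rr EEEEEEEEEEEEE MMMM...M  (1 + 2 + 13 + 48 = 64 bits) */
typedef struct {
    int      sign;       /* 1 bit                   */
    unsigned exponent;   /* 13 bits, bits 48..60    */
    uint64_t mantissa;   /* 48 bits, bits 0..47     */
} vtq_fields;            /* struct name is mine     */

vtq_fields unpack(uint64_t reg)
{
    vtq_fields f;
    f.sign     = (int)(reg >> 63);                 /* top bit          */
    f.exponent = (unsigned)((reg >> 48) & 0x1FFF); /* skip 2 spare bits */
    f.mantissa = reg & 0xFFFFFFFFFFFFULL;          /* low 48 bits      */
    return f;
}
```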

In all my existing and planned models I use a Microchip PIC 18F6520 controller (an 8-bit engine with 16k words of program memory and about 3k bytes of data memory).

Technically, a register in the first model is an array of 16-bit integers:

```
typedef unsigned int uint16;   // 16 bits wide on this compiler/target
uint16 reg[4];                 // one 64-bit "register", allocated on the stack
// example of a function prototype:
void add_2 (uint16* a, uint16* b);
```

*Arithmetic implementations in the first model are discussed in a different thread.*

For my upcoming second model, the VTQ–13, I started wondering again whether I should stick with my binary format or change to BCD. Expecting to save significant program space on radix conversions once logarithms are implemented, I again kept the binary system because of the better data density and faster operation.

With basic arithmetic and normalization on my mind, I did however change the exponent’s granularity, replacing the old 13-bit binary exponent with a 10-bit exponent to the base 256, thus defining a number in the base-256 system.

During my work on the math library for the VTQ–13 I found neither the base-2 system nor the base-256 system to be optimal for all situations (to date I have implemented addition/subtraction, multiplication, division, logarithms, exponentials and square roots, while tangent is on the way). It has turned out that in most cases exponents to the base 16 are the best choice. If I had to refactor the math library, I would use base-16 exponents throughout to store floating-point numbers. In the next section I compare numbers to the bases 2, 10, 16 and 256 with respect to arithmetic and transcendental functions.

**Number-systems compared (2, 10, 16, 256)**

Let’s have a closer look at these four systems by analyzing their behaviour in the digit-by-digit processes used by HP’s original algorithms. Two questions are of interest:

- How many steps are necessary to perform pseudo-operations (namely divisions)?

- How many constants have to be stored for Meggitt processes, as they are used to compute trigonometric functions, logarithms etc.?

First, we focus on division. Here we need a series of subtractions to perform a pseudo-division, but there are no constants to store. We assume an accuracy of 14 decimal digits and compare the number of subtractions for each system:

In a decimal system, we would need 7 · 14 = **98** subtractions in an average case with all «5» digits in the result: for each digit we perform 5 successful subtractions, a sixth one with underflow, and a restoring addition, which is assumed to be as expensive as a subtraction (in fact, my latest implementations of addition and subtraction both run in 6.4 µs).

In binary we need to evaluate 48 binary digits to be equivalent to 14 decimals: for each step we have to perform one comparison (which is, in fact, a non-destructive subtraction) and, for an average of 50% of all digits, a real subtraction. So we have 1.5 subtractions per bit, thus **72** subtractions.

In a hex system we need (in analogy to the decimal system) 9 · 12 = **108** subtractions (average case with all «8» digits).

In a base-256 system with all 0x80 digits, finally, we need 6 · 130 = **780** subtractions!
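
The per-digit costs above follow a simple counting rule that can be checked with a toy model of one restoring pseudo-division step (illustrative C, not the VTQ routines): a digit of value q costs q successful subtractions, one that underflows, and one restoring addition.

```c
/* One step of a restoring pseudo-division: find one quotient digit by
   destructive subtraction until underflow, then restore (a toy model).
   *ops counts the add/subtract operations spent: q + 2 for digit q. */
int quotient_digit(long *rem, long divisor, int *ops)
{
    int q = 0;
    for (;;) {
        *rem -= divisor;          /* destructive subtraction */
        (*ops)++;
        if (*rem < 0) {           /* went one step too far */
            *rem += divisor;      /* restoring addition */
            (*ops)++;
            return q;
        }
        q++;
    }
}
```

The average decimal digit «5» thus costs 7 operations, and the average base-256 digit 0x80 = 128 costs 130, matching the counts above.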

If we want to compute a logarithm, for instance, we need to store constants (see Jacques’ articles on this subject: HP–35 -> logarithms). If we assume the number of constants is proportional to the number of digits, we need in the worst case 6 constants for the base-256 system, 12 for the base-16 system, 14 for the decimal system and 48 for pure binary computing (in practice the actual number is somewhat lower, but the proportions match reality):

```
System   Subtractions   Constants
---------------------------------
     2             72          48
    10             98          14
    16            108          12
   256            780           6
```

Both extreme systems deliver poor performance in one of the two disciplines, while the decimal and the base-16 systems show quite-close-to-the-optimum behaviour in all cases.

Can we profit from the advantages of the base-2 and the base-256 systems without having to suffer from their drawbacks? The answer is: «Sometimes, yes.» It is an interesting observation that the mantissae of all three binary-oriented systems (2, 16 and 256) are equivalent! Let’s have a look at the number **39.25** (all representations written in hex for better readability):

`256’s system:  +27.40’00’00’00’00   e000`

`16’s system:   +2.74’00’00’00’00’0  e001`

`2’s system:    +1.3A’00’00’00’00’0  e005`

All three representations share the same bit pattern (for the base-2 system, shift one bit to the left to see it). The only things that differ are the radix point and the exponent.

In my implementations I use this observation to travel freely between these three systems whenever I can draw some profit from it. The only thing you have to take care of is adjusting the exponent the right way. So I divide mantissae in the base-2 system, compute logarithms and tangents in the base-16 system, and perform additions in the base-256 system. - *For details see my upcoming posts on math implementations*.