Nut Processor Clock Speed?



#9

I've looked all over but I can't seem to find the clock speed of the Nut CPU used in the 41C/CV/CX. I wrote a program for the 41 and then entered it into the 42S to try it there. I was amazed at how much faster it ran on the 42S (Lewis @ 1 MHz); I'm guessing at least 2X faster. Thanks,

Steve


#10

Approx. 360-380 kHz


#11

While the frequencies cited by HrastProgrammer are correct (of course), they really don't reflect the true speed of the processor. The Nut CPU is bit-serial, i.e. it operates on one bit of data at a time. The time required to operate on one such bit is called a "bit time" (surprise ;) and corresponds to one clock cycle. The key piece of information, though, is that each instruction takes 56 bit times to complete, a so-called "word time". If we divide the maximum clock frequency by the cycles per word time we get:

378000 / 56 = 6750

That's 6750 instructions per second. In comparison, the 1 MHz Saturn core in the Lewis IC can execute a maximum of 333333 instructions per second. That's almost *50 times faster*. So, now you see the real reason for the dramatic speedup of your code. :)
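The arithmetic above can be reproduced in a few lines. This is only a sketch using the figures quoted in this thread; the constant and function names are illustrative, and the 3 cycles per instruction is simply what the 333333 figure implies at 1 MHz.

```python
# Figures quoted in the thread; names are illustrative, not from any HP document.
NUT_CLOCK_HZ = 378_000        # upper end of the quoted 360-380 kHz range
NUT_BITS_PER_WORD = 56        # bit times per instruction (one word time)
SATURN_CLOCK_HZ = 1_000_000   # Lewis at roughly 1 MHz
SATURN_MIN_CYCLES = 3         # assumption: implied by 1 MHz / 333333 instr/s


def instructions_per_second(clock_hz, cycles_per_instruction):
    """Peak instruction rate for a fixed cycle count per instruction."""
    return clock_hz / cycles_per_instruction


nut_ips = instructions_per_second(NUT_CLOCK_HZ, NUT_BITS_PER_WORD)
saturn_ips = instructions_per_second(SATURN_CLOCK_HZ, SATURN_MIN_CYCLES)
speedup = saturn_ips / nut_ips

print(f"Nut:    {nut_ips:.0f} instructions/s")
print(f"Saturn: {saturn_ips:.0f} instructions/s (best case)")
print(f"Best-case speedup: {speedup:.1f}x")
```

The ratio comes out at roughly 49, which is where "almost 50 times" comes from. It is a best-case figure, not a measurement.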

The information about the 41 here is derived from that contained in the 41 service manual. Hopefully it's accurate.

-----------------------------------------------------------------

Jonathan Busby - jdb@SNMAPOhouston.rr.com

Remove the random permutation of "NOSPAM" before replying.


#12

Is that really true?

It's easy to get agreement that your math on the 41 is good.

How about this (suggestion - maybe this dialogue would help answer)

42S is a Saturn based CPU with 64bit ACC registers

The bus is 4 bit wide (as opposed to 41 serial).

Wouldn't it take the CPU at least 16 cycles to load
the accumulator, and therefore make the suggested
333333 wrong? Even assuming that the clock speed and
the CPU processing speed are the same might prove to
be inaccurate.

Kim


#13

You said :

"42S is a Saturn based CPU with 64bit ACC registers"

Well, the Saturn doesn't have "accumulator" registers per se, but it has four 64-bit general purpose registers: A, B, C, and D. These can be used interchangeably with most instructions, but there are exceptions with respect to e.g. memory access and loading of immediate data. Also, the D register is the most restrictive when it comes to its use: it will only interoperate with the C register.

"The bus is 4 bit wide (as opposed to 41 serial)."

While this is true for all Saturn processors, there is one notable exception from a performance perspective: In Yorke based Saturns (used in the 48G series and its siblings) the ALU operates at twice the speed of the external CPU clock from what I've seen. This gives the processor an effective 8-bit internal data bus.

"Wouldn't it take the CPU at least 16 cycles to load the accumulator and therefore make the suggested 333333 wrong."

You're thinking in terms of the 41's single-cycle architecture, where "cycle" here is the 41 word time. The single-cycle architecture is one of the simplest CPU designs and, as you probably guessed (or knew), it is an architecture in which most or all instructions take one clock cycle to execute. In contrast, the Saturn has a multi-cycle design where instruction execution times vary depending on what each instruction does. In general, the time a Saturn instruction takes to execute is roughly proportional to how many nibbles of data it has to process.

Now, admittedly, the speedup factor I gave was a bit contrived, as I was basing it on one of the Saturn's fastest instructions, namely P=P+1 (which takes 3 cycles).

In real-world code that actually accomplishes something useful you'll have a mix of instructions with varying execution times. A good guess for the average execution time of an instruction is 16 cycles, which gives you a speedup factor of around 10. But this is in no way scientific, since it really depends on what the currently executing routine is trying to accomplish.
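To see how strongly the estimate depends on the assumed average, here is a small sketch that recomputes the speedup for a few average cycle counts. Only the 3-cycle and 16-cycle cases come from this post; the other values and the function name `speedup_over_nut` are illustrative assumptions.

```python
NUT_IPS = 378_000 / 56        # 6750 instructions/s, from the earlier post
SATURN_CLOCK_HZ = 1_000_000   # Lewis at roughly 1 MHz


def speedup_over_nut(avg_saturn_cycles):
    """Speedup of a 1 MHz Saturn over the Nut for a given average cycle count."""
    return (SATURN_CLOCK_HZ / avg_saturn_cycles) / NUT_IPS


# 3 = fastest instruction (P=P+1), 16 = the guessed real-world average;
# 8 and 24 are arbitrary extra points to show the trend.
for cycles in (3, 8, 16, 24):
    print(f"{cycles:2d} cycles/instruction -> {speedup_over_nut(cycles):5.1f}x")
```

With 16 cycles per instruction the factor is a little over 9, i.e. "around 10".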

"Even assuming that the clock speed and the CPU processing speed is the same might prove to be inaccurate."

The clock speed of the Lewis, like other Saturn based ICs, only varies to a very minor degree depending on temperature.
It's usually around 1 MHz, although it can be increased to 2 MHz at the expense of battery life. I don't know what you mean by "CPU processing speed". How is that supposed to change? As far as I know the architecture isn't self-modifying. ;) (A self-modifying Saturn on an FPGA, now that's a thought ;)

-----------------------------------------------------------------

Jonathan Busby - jdb@SNMAPOhouston.rr.com

Remove the random permutation of "NOSPAM" before replying.


#14

"While this is true for all Saturn processors, there is one notable exception from a performance perspective: In Yorke based Saturns (used in the 48G series and its siblings) the ALU operates at twice the speed of the external CPU clock from what I've seen. This gives the processor an effective 8-bit internal data bus."

I don't know what you're really talking about, but the internal multiplexed address/data bus is always 4 bits wide. The chips with external memory controllers that use an 8-bit data bus to connect to standard memory chips (ROMs, RAMs), like Lewis, Clarke and Yorke, also have different timing. The timing of each opcode depends first on whether the memory device is connected to the internal 4-bit bus or over the 8-bit data interface. When accessing devices over the external bus, the internal strobe signal is sometimes "stretched", causing some extra CPU cycles. Finally, on external devices we have different timing for accesses to even and odd addresses. That's easy to explain:

1) Even

D0=(5)	#00000
A=DAT0 B

will activate the external data bus once to read the content of one 8 bit cell of the external device. The internal CPU logic will transcode it into the internal 4 bit address/data bus.

2) Odd

D0=(5)	#00001
A=DAT0 B

will activate the external data bus twice to read the contents of two 8-bit cells of the external device, because the two wanted nibbles are in different memory cells. You see this especially when writing to odd addresses: there are read cycles between the write cycles. That makes sense: when I only want to change 4 bits of, for example, an 8-bit RAM cell, I have to read the complete 8 bits, replace the wanted 4 bits and write the 8 bits back. But the internal chip logic is clever enough that when writing 8 bits to an even address, no read cycle is necessary. You can check this behavior on an HP48GX without opening the case. You only have to play around with the bank switcher FF with a write-protected card in slot 2.
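The even/odd behavior can be illustrated with a small counting model. This is a simplified sketch, not a description of the real memory controller: the function name `external_bus_accesses` is made up, it assumes nibble addresses 2k and 2k+1 share 8-bit cell k, and it ignores strobe stretching entirely.

```python
def external_bus_accesses(nibble_addr, nibble_count, write=False):
    """Return (cell_reads, cell_writes) on the 8-bit bus for one transfer.

    Simplified model: each 8-bit cell holds two nibbles, and a write that
    covers only half of a cell needs a read-modify-write of that cell.
    """
    last_nibble = nibble_addr + nibble_count - 1
    cells = range(nibble_addr // 2, last_nibble // 2 + 1)

    if not write:
        return (len(cells), 0)

    reads = 0
    for cell in cells:
        low, high = 2 * cell, 2 * cell + 1
        fully_covered = nibble_addr <= low and high <= last_nibble
        if not fully_covered:
            reads += 1          # must fetch the cell to keep the other nibble
    return (reads, len(cells))


# A=DAT0 B transfers 2 nibbles (one byte field).
print(external_bus_accesses(0x00000, 2))              # even address, read
print(external_bus_accesses(0x00001, 2))              # odd address, read
print(external_bus_accesses(0x00000, 2, write=True))  # even address, write
print(external_bus_accesses(0x00001, 2, write=True))  # odd address, write
```

In this model the even read touches one cell and the odd read two, while the odd write needs two reads in addition to its two writes and the even write needs none.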

So back to the problem above. In many cases we can't tell the true speed of code execution, because we don't know whether the code is located at an odd or even address, especially when running in RAM.

Now back to the HP42S with its Lewis chip. The Lewis chip has a mask-programmable 64KB ROM inside, connected to the internal 4-bit data bus, so in general we don't have the problems above. We always have the shortest timing. The problems only occur when you access the RAM or language ROM of the HP17B/17BII, or when you run the Lewis CPU in EPROM (development) mode (using an external memory controller).

BTW, Emu48 uses the timings of the original Saturn CPU using the internal data bus. Because of this, the timing isn't very accurate: the additional strobe "stretch" and the different timing for accesses to even/odd addresses are ignored.

Christoph Giesselink


#15

You said:

"I don't know what you're really talking about, but the internal multiplexed address/data bus is always 4 bit wide."

Notice that I was talking about the data bus internal to the *Saturn CPU core itself* not the Saturn data bus internal to the Yorke IC but external to the processor core.

Also, notice that I used "effective" in connection with "8-bit". I wasn't saying that the data bus was actually 8-bit, just that it appears to be as the ALU can process 8 bits in one clock cycle. This can be seen if we take the instruction A=A+C W as an example. Not taking into account memory access time, this instruction takes 19 cycles on the Clarke but 11 cycles on the Yorke. If we factor out the fetch/decode time, then we get 16 and 8 cycles respectively. So, the Yorke execution time is twice as fast. This holds for most instructions. Now, I'm not sure if the ALU is really 8 bits wide or if it just runs at twice the speed of the external CPU clock, but the latter would seem the logical choice since it's easier to implement and has a lower cost in terms of logic resources.
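The cycle comparison can be written out as a short calculation. This is only a sketch of the reasoning in this post: the 3-cycle fetch/decode overhead is an assumption inferred from the quoted figures (19 to 16 and 11 to 8), and the 16-nibble width of the W field follows from the 64-bit registers.

```python
FETCH_DECODE_CYCLES = 3   # assumption: inferred from 19 - 16 and 11 - 8
W_FIELD_NIBBLES = 16      # a 64-bit register holds 16 nibbles

# Quoted cycle counts for A=A+C W, not counting memory access time.
total_cycles = {"Clarke": 19, "Yorke": 11}

exec_cycles = {
    chip: total - FETCH_DECODE_CYCLES for chip, total in total_cycles.items()
}
cycles_per_nibble = {
    chip: cycles / W_FIELD_NIBBLES for chip, cycles in exec_cycles.items()
}
ratio = exec_cycles["Clarke"] / exec_cycles["Yorke"]

for chip in total_cycles:
    print(f"{chip}: {exec_cycles[chip]} execute cycles, "
          f"{cycles_per_nibble[chip]} cycles per nibble")
print(f"Clarke/Yorke execute-time ratio: {ratio}")
```

Under these assumptions the Clarke spends one cycle per nibble and the Yorke half a cycle, which is the sense in which the data path looks "effectively" 8 bits wide.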

The information presented above is not the only evidence backing the claim put forward here. Dave Arnett (for those not in the know, he was the chief hardware engineer for the 48G/GX) said himself that instruction execution times had been improved in the Yorke.


"When accessing devices over the external bus the internal strobe signal is sometimes "stretched", causing some extra CPU cycles"

I haven't read this anywhere, but it's implied because if the memory controller can't accept/deliver a nibble of data in one clock cycle (due to reasons you cited) then it will drive the WAIT line high causing the CPU to hold the NSTR (strobe) line low for longer than one clock cycle. But, this still doesn't explain why you have quarter cycles in the timing of some instructions.

-----------------------------------------------------------------

Jonathan Busby - jdb@SNMAPOhouston.rr.com

Remove the random permutation of "NOSPAM" before replying.


#16

You said:

"The information presented above is not the only evidence backing the claim put forward here. Dave Arnett (for those not in the know, he was the chief hardware engineer for the 48G/GX) said himself that instruction execution times had been improved in the Yorke."

But in all the opcode timing information I have seen that compares CPU cycles (Saturn, Clarke and Yorke) in the tone and beeper routines of the HP48G(X), the Yorke implementation always needs more CPU cycles (~30%) per opcode than the others. And I don't think this timing information is wrong, because all the constants in the beeper formulas are based on these opcode timings and are used by the real calculator. BTW, this timing information also differs from Dan and Mika's GX cycle table. It also corresponds with the speed difference between the 48SX and 48GX: with nearly double the clock rate (2 MHz -> 3.7 MHz) we get about 60% more speed.

But I know at least 7 opcodes where the cycle time changed from the 1LK7 core to the actual core (in 1LU7, 1LR2, 1LR3, 1LT8).

"I haven't read this anywhere, but it's implied because if the memory controller can't accept/deliver a nibble of data in one clock cycle (due to reasons you cited) then it will drive the WAIT line high causing the CPU to hold the NSTR (strobe) line low for longer than one clock cycle. But, this still doesn't explain why you have quarter cycles in the timing of some instructions."

Why not? We have to distinguish between the "CPU frequency" and the "strobe rate". The 1 MHz we are talking about for the HP42S is the "strobe rate"; the "CPU frequency" is (RATE+1)*524288 Hz = (7+1)*524288 Hz = 4.2 MHz, and "CPU frequency" / 4 = "strobe rate" = (7+1)*524288 Hz / 4 = 1.05 MHz at the default setting.
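The relation between the two rates can be tabulated with a short sketch. The formula and the default RATE of 7 are taken from this post; the function names and the range of RATE values shown are illustrative assumptions, not documented limits.

```python
BASE_HZ = 524_288  # 2**19 Hz, the base step quoted in the post


def cpu_frequency_hz(rate):
    """Internal "CPU frequency" for a given RATE setting."""
    return (rate + 1) * BASE_HZ


def strobe_rate_hz(rate):
    """"Strobe rate": one strobe per four internal clock cycles."""
    return cpu_frequency_hz(rate) / 4


# RATE = 7 is the default quoted above; the other values are only examples.
for rate in (3, 7, 15):
    print(f"RATE={rate:2d}: CPU {cpu_frequency_hz(rate) / 1e6:.2f} MHz, "
          f"strobe {strobe_rate_hz(rate) / 1e6:.2f} MHz")
```

At RATE = 7 this gives 4194304 Hz and 1048576 Hz, i.e. the 4.2 MHz and 1.05 MHz quoted above.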

So quarter cycles are possible if the time stretching is generated from the internal "CPU frequency". But this is speculation.

Regards

Christoph

