34s: Anyone have an instruction timing table?



#17

This used to be done by the user community in the PPC days.

What I'm wondering is whether such a table for opcode instruction timing exists. Something like:

+ : 10 ms

- : 11 ms

etc. The reason this might be handy is if you then saw something in the table like this:

Multiply : 20 ms
Divide : 500 ms

An outlier like that Divide entry would stand out immediately. Without such a list, there might be a few operations on the 34s that are out of kilter and MUCH slower than one would expect. If roll up takes 10x as long as roll down, then perhaps it would be best either to try optimizing the roll up code more or to do four roll down instructions instead.

So... anyone have something like this yet?


#18

Not that I'm aware of. About the only way to figure out timings would be to write a small program that looped lots. I'd also expect some variation based on arguments.

I do know that all the logarithms (and functions that use logarithms) are slow.


- Pauli
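The looping approach described above can be sketched off the calculator. This is only an illustration of the method in Python, not 34s code; the names `count_iterations` and `clock` are hypothetical, and the injectable clock is there purely so the loop can be checked without waiting in real time.

```python
import time


def count_iterations(op, seconds, clock=time.perf_counter):
    """Call op() repeatedly for about `seconds`; return how many calls completed.

    `clock` is any zero-argument function returning the current time in
    seconds (an assumption of this sketch, so a fake clock can be used).
    """
    deadline = clock() + seconds
    count = 0
    while clock() < deadline:
        op()
        count += 1
    return count


def compare(baseline_op, candidate_op, seconds=1.0):
    """Count iterations for a cheap baseline and for the operation under test.

    A much lower candidate count points at a slow operation.
    """
    return (count_iterations(baseline_op, seconds),
            count_iterations(candidate_op, seconds))
```

As with the calculator version, the baseline should be as similar as possible to the operation being timed, otherwise the difference in counts also includes differences in dispatch overhead.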


#19

It appears that the ROUND function is particularly slow. Any chance :-) this function could be reviewed for some optimization? :-) And, of course, please don't take this as any criticism at all!

===================

Jake found:

If I store a long decimal value in R05 and take the little program

LBL C

RCL 05

x<>X (acting as a NOP)

Roll down

1

+

GTO C

and, starting with zero in X, run it for roughly 5 seconds, it counts up to 5613.

If I then replace the x<>X with ROUND, things get interesting.

Setting the display to FIX 0, the count after ~5 seconds is 3463.

With FIX 1, it reached 3330.

With FIX 2, it reached only 47.

With FIX 5, the count reached 46.

With FIX 11, the count was about the same.... 46.

And all the ones in between 3 and 11 seemed to count only up to 45-47 each.

The ROUND function surely makes the 34S run at a much slower pace.
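To put rough numbers on those counts, they can be converted to a time per loop pass. The sketch below assumes each run lasted exactly 5 seconds (the post says "roughly"), so the results are estimates only; the names `counts`, `per_pass_ms` and `slowdown` are made up for this example.

```python
RUN_SECONDS = 5.0  # assumed; the runs were timed by hand at "roughly 5 seconds"

# Loop counts reported above, keyed by what sat in the test slot of the loop.
counts = {
    "x<>X": 5613,
    "ROUND, FIX 0": 3463,
    "ROUND, FIX 1": 3330,
    "ROUND, FIX 2": 47,
    "ROUND, FIX 5": 46,
    "ROUND, FIX 11": 46,
}

# Milliseconds for one pass through the whole loop, not just the tested op.
per_pass_ms = {name: 1000.0 * RUN_SECONDS / n for name, n in counts.items()}

# How many times slower each variant is than the x<>X loop.
slowdown = {name: counts["x<>X"] / n for name, n in counts.items()}

for name in counts:
    print(f"{name:14s} {per_pass_ms[name]:8.2f} ms/pass  {slowdown[name]:6.1f}x")
```

On these figures the loop takes a little under 1 ms per pass with x<>X and over 100 ms per pass with ROUND at FIX 2 and above, a slowdown of more than 100x.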


#20

It will be fast in everything except FIX mode :-)

The reason being that it does a 10^x internally which requires logs.
I'm guessing multiplication by 0 and 1 are special cased.

I might be able to do something with it to speed up the FIX case too.


- Pauli

#21

And a fix is in. Should be much faster once built (rev 1256 or later) -- probably later this evening. Let me hope it works properly still :-)

x<>X isn't a good NOP here, it goes through a very different code path than ROUND. ABS or +/- are better for the NOP.

- Pauli


Edited: 19 July 2011, 8:17 a.m. after one or more responses were posted


#22

Thanks. That's great.

Perhaps some of us (yep, I know... suggest and you volunteer) should try this type of thing with other instructions?

It would help point out areas that would benefit from optimization or instructions that just seem much slower than "normal".

Will test the fix in the program shortly. Thanks again.


#23

Quote:
yep, I know... suggest and you volunteer

You be learning :-)

Quote:
It would help point out areas that would benefit from optimization or instructions that just seem much slower than "normal".

Anything involving logarithms (which includes powers) will be slow. Most other things seem to be fast enough. A quick scan over the source code will identify problematic functions easily enough.

Even if slow functions are identified, there is no guarantee that they can be sped up. Some functions just can't be, and others would grow too much larger if rewritten to avoid the problematic subroutines. This one was nice and easy -- no increase in code size, an acceptable portability loss and a good speed improvement.

- Pauli

Edited: 19 July 2011, 8:20 a.m.


#24

I understand.

However, if we can find them through example code like the stuff for ROUND, then you may be able to adjust them. You might not, but at least it could be looked at. :-)

#25

Quote:
x<>X isn't a good NOP here, it goes through a very different code path than ROUND. ABS or +/- are better for the NOP.

...yeah, well, that was what came into my head at that moment :-) I think it was a sufficient measuring method anyway, since the loop with x<>X still ran very fast as compared to ROUND. I'll use ABS next time.

Jake


#26

Actually, why not just use NOP as the NOP?

:-)


#27

Because it is not a monadic operation like ABS or ROUND that modifies the X register and sets LastX. NOP is simply ignored by the execution engine and therefore carries considerably less overhead.


#28

I think the real thing Jake was doing with this instruction was to simply see how the loop count was affected by replacing a NOP with the ROUND instruction at various FIX settings.

X<>X worked, since it just essentially did nothing, as would NOP since it really does do nothing. :-)

The "trouble" remains that ROUND slows down considerably if FIX is much greater than 2 or 3. :-(

I understand that will stay that way if it has to call 10^x, which uses the log routines, which are slow (relative to the blinding speed the 34s shows in other areas).

The reason it was a concern to us in the first place is that we have gotten used to having the 34s blow away any other machine when it is running something, but the little piece of code we were testing here was SLOWER than on existing machines.

We don't like the 34s to be slower. ;-)

(Again, just in case my text here is not clear... none of this is in any way meant to be critical of anything. We're all just trying to help find places to tweak for improvements!).


#29

Quote:
X<>X worked, since it just essentially did nothing, as would NOP since it really does do nothing. :-)

Both of these go via very different code paths than a monadic function like ROUND. NOP takes no arguments, does nothing with last X and doesn't even call a worker routine. x<>X goes through the commands-with-arguments path, which again doesn't bother with last X but instead decodes the argument. To get a representative idea of the timing, it is best to use as similar a function as possible. In this case, however, ROUND is so slow it probably doesn't matter much.


As for being slower, I've not tested but I'll take your word for it. The 10^x/log is gone from the code path so that clearly wasn't the big expensive operation :-( Digging a bit deeper ends up in some code in the decimal library we're using (which I haven't been bothered to figure out how exactly it works). Unless I suddenly get motivated to fix the library (which isn't all that likely), ROUND is going to stay slow. Better slow and correct than fast and wrong.


- Pauli

#30

I did Jake's test with build 1257 and it still drops from counts in the thousands at FIX settings below 3 or 4 down to counts of 50-60 at FIX 5 or higher.

#31

You did think a "fix" was in that should speed up the ROUND instruction.

Can you do some checks using Jake's example above?


#32

I implemented what I thought would be the fix and it didn't work. This stuff happens: my idea as to where the problem lay was incorrect.

I'm not planning on digging further into the underlying problem at the moment. The slowness comes from decQuantizeOp() in decNumber/decNumber.c but it isn't clear to me where.


- Pauli
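For readers unfamiliar with the operation involved: the name decQuantizeOp suggests the "quantize" operation from the General Decimal Arithmetic specification, which rounds a number to a given exponent. Python's `decimal` module implements the same specification (with different code, so it says nothing about the 34s timing), and can be used to see what rounding to a FIX setting amounts to. The helper name `round_to_fix` and the choice of half-up rounding are assumptions made for this sketch, not taken from the 34s source.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_fix(value, digits):
    """Round `value` to `digits` decimal places via quantize.

    Hypothetical helper: the rounding mode is assumed to be half-up.
    """
    exponent = Decimal(1).scaleb(-digits)  # 1E-digits, e.g. 0.01 for FIX 2
    return value.quantize(exponent, rounding=ROUND_HALF_UP)


x = Decimal("3.14159265358979")
for fix in (0, 2, 5, 11):
    print(fix, round_to_fix(x, fix))
```

Note that the target exponent here is built by scaling 1, with no 10^x or logarithm involved, which is in line with the observation above that removing 10^x from the code path was not enough on its own.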

