Posts: 1,545
Threads: 168
Joined: Jul 2005
This used to be done by the user community in the PPC days.
What I'm wondering is whether such a table for opcode instruction timing exists. Something like:
+ : 10 ms
- : 11 ms
etc. The reason this might be handy is if you then saw something in the table like this:
Multiply : 20 ms
Divide : 500 ms
if you understand what I mean. Without such a list, there might be a few operations on the 34s that are out of kilter and MUCH slower than one would expect. If roll up takes 10x as long as roll down, then perhaps it would be best to either try optimizing the roll up code more or doing four roll down instructions.
So... anyone have something like this yet?
Posts: 3,229
Threads: 42
Joined: Jul 2006
Not that I'm aware of. About the only way to figure out timings would be to write a small program that looped lots. I'd also expect some variation based on arguments.
I do know that all the logarithms (and functions that use logarithms) are slow.
- Pauli
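For anyone who wants to try the "small program that loops lots" idea off the calculator first, here is a minimal sketch in Python. It is not WP 34S code: the names `time_op` and `slow_ops` are made up for illustration, the operations run on Python's `decimal` module rather than on the 34s, and the 10x threshold is an arbitrary assumption. It only shows the shape of the table being asked for and one way to flag entries that are out of kilter.

```python
import time
from decimal import Decimal

def time_op(op, arg, loops=20000):
    """Return the average seconds per call of op(arg) over `loops` calls."""
    start = time.perf_counter()
    for _ in range(loops):
        op(arg)
    return (time.perf_counter() - start) / loops

def slow_ops(timings, factor=10.0):
    """Return the names whose time exceeds `factor` times the (upper) median."""
    ordered = sorted(timings.values())
    median = ordered[len(ordered) // 2]
    return sorted(name for name, t in timings.items() if t > factor * median)

if __name__ == "__main__":
    x = Decimal("1.234567890123456")
    # Hypothetical operation list; a real table would cover every opcode.
    ops = {
        "+": lambda v: v + v,
        "*": lambda v: v * v,
        "/": lambda v: v / v,
        "ln": lambda v: v.ln(),
    }
    table = {name: time_op(op, x) for name, op in ops.items()}
    for name, t in table.items():
        print(f"{name:>3} : {t * 1e6:8.2f} us")
    print("out of kilter:", slow_ops(table))
```

As noted above, timings will vary with the arguments, so a real table would probably want several test values per operation.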
Posts: 1,545
Threads: 168
Joined: Jul 2005
It appears that the ROUND function is particularly slow. Any chance :-) this function could be reviewed for some optimization? :-) And, of course, please don't take this as any criticism at all!
===================
Jake found:
If I store a long decimal value in R05 and take the little program
LBL C
RCL 05
x<>X (acting as a NOP)
Roll down
1
+
GTO C
and run it for roughly 5 seconds starting with zero in X, it counts up to 5613.
If I then replace the x<>X with ROUND, things get interesting.
Setting the display to FIX 0, the count after ~5 seconds is 3463.
With FIX 1, it reached 3330.
With FIX 2, it reached only 47.
With FIX 5, the count reached 46.
With FIX 11, the count was about the same.... 46.
And all the ones in between 3 and 11 seemed to count only up to 45-47 each.
The ROUND function surely makes the 34S run at a much slower pace.
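To put rough numbers on this, the counts can be converted into time per loop pass. The sketch below is only arithmetic on the figures quoted above, assuming the run time was exactly 5 seconds (it was "roughly" 5) and treating the x<>X loop as the baseline, even though x<>X has some cost of its own. The helper name `ms_per_loop` is made up for illustration.

```python
def ms_per_loop(count, seconds=5.0):
    """Average milliseconds per loop pass, given the final count."""
    return seconds / count * 1000.0

# Count reached with x<>X in place of ROUND (the baseline loop).
baseline = ms_per_loop(5613)

# Counts reached with ROUND at the display settings reported above.
counts = {"FIX 0": 3463, "FIX 1": 3330, "FIX 2": 47, "FIX 5": 46}

for mode, count in counts.items():
    loop = ms_per_loop(count)
    extra = loop - baseline
    print(f"{mode}: {loop:7.2f} ms per pass, about {extra:7.2f} ms more than the x<>X loop")
```

On these figures the baseline loop takes a bit under 1 ms per pass, while a pass containing ROUND at the slow settings takes over 100 ms.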
Posts: 3,229
Threads: 42
Joined: Jul 2006
It will be fast in everything except FIX mode :-)
The reason being that it does a 10^x internally which requires logs.
I'm guessing multiplication by 0 and 1 are special cased.
I might be able to do something with it to speed up the FIX case too.
- Pauli
Posts: 3,229
Threads: 42
Joined: Jul 2006
And a fix is in. Should be much faster once built (rev 1256 or later) -- probably later this evening. Let me hope it works properly still :-)
x<>X isn't a good NOP here, it goes through a very different code path than ROUND. ABS or +/- are better for the NOP.
- Pauli
Edited: 19 July 2011, 8:17 a.m. after one or more responses were posted
Posts: 1,545
Threads: 168
Joined: Jul 2005
Thanks. That's great.
Perhaps some of us (yep, I know... suggest and you volunteer) should try this type of thing with other instructions?
It would help point out areas that would benefit from optimization or instructions that just seem much slower than "normal".
Will test the fix in the program shortly. Thanks again.
Posts: 3,229
Threads: 42
Joined: Jul 2006
Quote: yep, I know... suggest and you volunteer
You be learning :-)
Quote: It would help point out areas that would benefit from optimization or instructions that just seem much slower than "normal".
Anything involving logarithms (which includes powers) will be slow. Most other things seem to be fast enough. A quick scan over the source code will identify problematic functions easily enough.
Even if slow functions are identified, there is no guarantee that they can be sped up. Some functions just can't be, and others would grow too large if rewritten to avoid the problematic subroutines. This one was nice and easy -- no space gain, an acceptable portability loss and a good speed improvement.
- Pauli
Edited: 19 July 2011, 8:20 a.m.
Posts: 1,545
Threads: 168
Joined: Jul 2005
I understand.
However, if we can find them through example code like the stuff for ROUND, then you may be able to adjust them. You might not, but at least it could be looked at. :-)
Posts: 349
Threads: 66
Joined: Apr 2007
Quote:
x<>X isn't a good NOP here, it goes through a very different code path than ROUND. ABS or +/- are better for the NOP.
...yeah, well, that was what came into my head at that moment :-) I think it was a sufficient measuring method anyway, since the loop with x<>X still ran very fast as compared to ROUND. I'll use ABS next time.
Jake
Posts: 1,545
Threads: 168
Joined: Jul 2005
Actually, why not just use NOP as the NOP?
:-)
Posts: 3,283
Threads: 104
Joined: Jul 2005
Because it is not a monadic operation like ABS or ROUND that modifies the X register and sets LastX. NOP is simply ignored by the execution engine and therefore carries considerably less overhead.
Posts: 1,545
Threads: 168
Joined: Jul 2005
I think the real thing Jake was doing with this instruction was to simply see how the loop count was affected by replacing a NOP with the ROUND instruction at various FIX settings.
X<>X worked, since it just essentially did nothing, as would NOP since it really does do nothing. :-)
The "trouble" remains that ROUND slows down considerably if FIX is much greater than 2 or 3. :-(
I understand that will stay that way if it has to call 10^x, which uses the log routines, which are slow (relative to the blinding speed the 34s shows in other areas).
The reason it was a concern to us in the first place is that we have gotten used to having the 34s blow away any other machine when it is running something, but the little program we were testing here ran SLOWER than on existing machines.
We don't like the 34s to be slower. ;-)
(Again, just in case my text here is not clear... none of this is in any way meant to be critical of anything. We're all just trying to help find places to tweak for improvements!).
Posts: 3,229
Threads: 42
Joined: Jul 2006
Quote: X<>X worked, since it just essentially did nothing, as would NOP since it really does do nothing. :-)
Both of these go via very different code paths than a monadic function like ROUND. NOP takes no arguments, does nothing with last X and doesn't even call a worker routine. x<>X goes through the commands-with-arguments path, which again doesn't bother with last X but instead decodes the argument. To get a representative idea of the timing, it is best to use as similar a function as possible. In this case, however, ROUND is so slow it probably doesn't matter much.
As for being slower, I've not tested but I'll take your word for it. The 10^x/log is gone from the code path so that clearly wasn't the big expensive operation :-( Digging a bit deeper ends up in some code in the decimal library we're using (which I haven't been bothered to figure out how exactly it works). Unless I suddenly get motivated to fix the library (which isn't all that likely), ROUND is going to stay slow. Better slow and correct than fast and wrong.
- Pauli
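The three code paths described above can be pictured with a toy model. This is purely an illustration in Python and not the WP 34S source: the class `ToyCalc` and its method names are hypothetical, and the real firmware's dispatch is certainly more involved. It only shows why NOP, x<>X and a monadic function like ABS do different amounts of bookkeeping per step.

```python
class ToyCalc:
    """Hypothetical model of three instruction paths (not the real firmware)."""

    def __init__(self, x=0.0):
        self.regs = {"X": x}
        self.last_x = None

    def niladic_nop(self):
        """NOP path: no argument, no last X update, no worker routine."""
        return None

    def arg_command(self, name, arg):
        """Argument path: decode the argument, leave last X alone."""
        if name == "x<>":
            # Swap X with the named register (a no-op when arg is "X").
            self.regs["X"], self.regs[arg] = self.regs[arg], self.regs["X"]

    def monadic(self, worker):
        """Monadic path: save last X, then call a worker routine on X."""
        self.last_x = self.regs["X"]
        self.regs["X"] = worker(self.regs["X"])


calc = ToyCalc(-2.5)
calc.niladic_nop()            # nothing changes
calc.arg_command("x<>", "X")  # X unchanged, last X untouched
calc.monadic(abs)             # X becomes 2.5, last X becomes -2.5
print(calc.regs["X"], calc.last_x)
```

In this picture, ABS or +/- shares the monadic path with ROUND, so swapping one for the other isolates the cost of the worker routine itself.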
Posts: 1,545
Threads: 168
Joined: Jul 2005
I did Jake's test with build 1257 and it still drops from counts in the thousands at FIX < 3 or 4 down to counts of 50-60 at FIX 5 or higher.
Posts: 1,545
Threads: 168
Joined: Jul 2005
You did think a "fix" was in that should speed up the ROUND instruction.
Can you do some checks using Jake's example above?
Posts: 3,229
Threads: 42
Joined: Jul 2006
I implemented what I thought would be the fix and it didn't work. This stuff happens; my idea as to where the problem was turned out to be incorrect.
I'm not planning on digging further into the underlying problem at the moment. The slowness comes from decQuantizeOp() in decNumber/decNumber.c but it isn't clear to me where.
- Pauli
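For readers who have not met the quantize operation: it rescales a decimal number to a given exponent, which makes it a natural building block for rounding to a fixed number of places. Here is a minimal sketch using Python's `decimal` module, which follows the same General Decimal Arithmetic specification as decNumber but is a separate implementation. The helper `round_fix` and the choice of half-up rounding are assumptions for illustration; how the 34s ROUND actually calls into decQuantizeOp() is not shown in this thread.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_fix(x, places):
    """Round x to `places` decimal places with a single quantize call."""
    # Decimal(1).scaleb(-places) is 1E-places; quantize uses only its exponent.
    return x.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)

x = Decimal("3.14159265358979")
print(round_fix(x, 2))
print(round_fix(x, 5))
```

Done this way, the rounding needs no 10^x or logarithm call, so any remaining slowness would have to come from inside the quantize routine itself.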