41CL TURBO anomaly?



#10

I was playing around with the SandMath ROM on my 41CL (V2 hardware) and came across an interesting anomaly. I was timing the INCX function to see how it compared to a two-line "1 +" sequence in FOCAL, using this test program:

LBL "INC-TST"
LBL A
"RUNNING..."
AVIEW
0
ENTER
LBL 01
1
+
GTO 01
RTN
LBL B
"RUNNING..."
AVIEW
0
ENTER
LBL 02
INCX
GTO 02
RTN

I got the following loop counts (iterations completed in 10 seconds):

Turbo mode   FOCAL 1 +   INCX   Speed gain
0            82          128    1.56
2            163         241    1.48
5            391         527    1.35
10           682         835    1.22
20           1090        1192   1.09
50           1665        1580   0.95
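The "Speed gain" column is simply the INCX iteration count divided by the FOCAL "1 +" count for the same turbo mode. As a quick sanity check, this small Python sketch (using only the numbers from the table above) recomputes the column and lists the modes where INCX falls behind:

```python
# Iterations completed in 10 seconds, taken from the table above:
# turbo mode -> (FOCAL "1 +" loop, INCX loop)
RESULTS = {
    0: (82, 128),
    2: (163, 241),
    5: (391, 527),
    10: (682, 835),
    20: (1090, 1192),
    50: (1665, 1580),
}


def speed_gain(focal: int, incx: int) -> float:
    """Ratio of INCX iterations to FOCAL iterations (> 1 means INCX is faster)."""
    return incx / focal


gains = {mode: round(speed_gain(f, i), 2) for mode, (f, i) in RESULTS.items()}
slower_modes = [mode for mode, g in gains.items() if g < 1]

for mode, g in gains.items():
    print(f"Turbo {mode:>2}: gain {g:.2f}")
print("INCX slower than FOCAL at turbo modes:", slower_modes)
```

Only the 50x row comes out below 1, matching the last line of the table.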

I was surprised by these results. In fact, I would have expected INCX's speed advantage to grow as the TURBO setting went up, since I assumed MCODE would gain more over FOCAL at higher turbo speeds.

I do realize that most of the execution time is spent outside the code that actually adds 1 or increments X, but I would still have expected the INCX approach to be faster than the FOCAL "1 +" at all turbo speeds.

What am I missing?

Cheers,

-Marwan

Edited: 14 May 2012, 11:34 a.m.


#11

It is only really meaningful to compare results across different Turbo modes for the same code. The reason is that the automatic switch back to 1x (for display access, keyboard checks, etc.) interferes with fast execution.


For example, suppose that you have a sequence of 60 mcode instructions that can execute at the Turbo speed without interruption, followed by something that has to execute at 1x. At 20x this sequence will run in 3 normal 1x bus cycle times, but at 50x this sequence will run in 2 normal 1x bus cycle times. At 50x any sequence of mcode instructions numbering between 50 and 100 will execute in 2 normal 1x bus cycle times, and if the number is closer to 50 than to 100, more time will be wasted waiting to resync to the bus.

So, from the information in the table, I would guess that the INCX case has more Turbo-capable sequences that are just slightly longer than a multiple of 50 than the FOCAL case does. This forces the INCX case to waste more cycles waiting to resync to the bus before it can run something at 1x.
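The rounding effect described above can be made concrete with a toy model. This is an illustrative sketch, not the 41CL's actual timing logic: it assumes each MCODE instruction takes exactly 1/N of a 1x bus cycle at Turbo N, and that a burst of Turbo-capable instructions always ends on a whole 1x bus cycle before the next 1x operation can start. The function names are hypothetical.

```python
import math


def bus_cycles(instructions: int, turbo: int) -> int:
    """Whole 1x bus cycles consumed by a burst of `instructions` MCODE
    instructions at turbo factor `turbo`, rounding up to resync with the bus."""
    return math.ceil(instructions / turbo)


def wasted_fraction(instructions: int, turbo: int) -> float:
    """Share of those bus cycles spent idle, waiting to resync."""
    cycles = bus_cycles(instructions, turbo)
    return 1 - instructions / (cycles * turbo)


# The 60-instruction example: 3 cycles at 20x, 2 cycles at 50x
print(bus_cycles(60, 20), bus_cycles(60, 50))  # 3 2
print(f"{wasted_fraction(60, 20):.0%} {wasted_fraction(60, 50):.0%}")  # 0% 40%
```

Under these assumptions, the same 60-instruction burst wastes nothing at 20x but leaves 40% of its two bus cycles idle at 50x, which is why a burst just past a multiple of 50 hurts most at the highest setting.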


#12

Well, for what it's worth, the four almost-trivial functions (INCX, DECX, INCY, and DECY) are not in the latest SandMath release, the SandMath-IV...

The LBL B code has a lot going on in FOCAL, so my guess is that the INCX MCODE effect has little impact in the global picture.

Cheers,
ÁM


#13

Quote:
the LBL B code has a lot going on in FOCAL, so my guess is that the INCX MCODE effect has little impact in the global picture.

Yes, that is true, and I acknowledged it in my original post. But even taking that into consideration, and given that the code in both routines is identical except for "1 +" vs. "INCX", one would expect the LBL B code to be faster at all TURBO settings. Monte's reply explains why this is not the case.

By the way, why did you choose to remove INCX, DECX, INCY, DECY from the SandMath library? I have not taken a look at the latest version (I don't actually know where to find it), so I assume it was to make room for other routines deemed more useful?

Cheers,

-Marwan


#14

Quote:
why did you choose to remove INCX, DECX, INCY, DECY from the SandMath library?

I had to make room for some code and a couple of FAT entries, so I thought these were not adding much value to the Module given their simple nature. I replaced them with:

LGMN - Logarithm Multi-Factorial

HNX, LNX - Struve Functions

ELIPF - Elliptic Integral, 1st kind

SDGT - Sum of mantissa digits


Quote:
I have not taken a look at the latest version (I don't actually know where to find it)

Same place where you got the current version.

Cheers,
ÁM


#15

Hi Ángel,

Thank you for your response.

I guess I still have the old docs. I'll grab the latest shortly. Do you know which version went into the 41CL V3? I have not installed my V3 board yet.

While I have yet to find a need for INCY and DECY, I suspect they fall into the category of functions that you *really* want to have when you do need them. Incrementing Y without using ISG (which may not always be possible, although there is a way around this problem) takes a fair amount of stack manipulation:

X<>Y
1
+
X<>Y

or

1
STO+ Z
RDN

Either approach ends up pushing the original contents of T off the top of the stack.
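To see why both sequences lose T, here is a minimal Python model of the four-level RPN stack. It is a simplified sketch: it assumes stack lift is enabled when each sequence starts and models only the operations used above (number entry, +, X<>Y, STO+ Z, RDN). The class and method names are hypothetical.

```python
class Stack:
    """Simplified HP-41 RPN stack with X, Y, Z, T registers."""

    def __init__(self, x, y, z, t):
        self.x, self.y, self.z, self.t = x, y, z, t

    def push(self, value):
        # Number entry with stack lift enabled: the old T falls off the top.
        self.t, self.z, self.y, self.x = self.z, self.y, self.x, value

    def add(self):
        # "+": X = Y + X, the stack drops and T is copied into Z.
        self.x, self.y, self.z = self.y + self.x, self.z, self.t

    def swap(self):
        # "X<>Y"
        self.x, self.y = self.y, self.x

    def sto_plus_z(self):
        # "STO+ Z": Z = Z + X, X unchanged
        self.z += self.x

    def roll_down(self):
        # "RDN": X <- Y <- Z <- T, and the old X moves into T
        self.x, self.y, self.z, self.t = self.y, self.z, self.t, self.x

    def regs(self):
        return (self.x, self.y, self.z, self.t)


# X<>Y  1  +  X<>Y
a = Stack(10, 20, 30, 40)
a.swap()
a.push(1)
a.add()
a.swap()
print(a.regs())  # (10, 21, 30, 30)

# 1  STO+ Z  RDN
b = Stack(10, 20, 30, 40)
b.push(1)
b.sto_plus_z()
b.roll_down()
print(b.regs())  # (10, 21, 30, 1)
```

Both sequences leave X intact and Y incremented, but the original T value (40) is gone: the first leaves a copy of Z in T, the second leaves the constant 1.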

The other approach would be to use ISG followed by a NO-OP to deal with the potential skip. This would preserve the stack:

ISG Y
X<>X
...

Here X<>X serves as the NO-OP. This is probably the approach I would use.
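The NOP works because of how ISG treats its register: as a loop-control number of the form iiiii.fffcc (counter, final value, and increment, with cc = 00 meaning an increment of 1). ISG adds the increment to the counter and skips the next program line if the counter now exceeds the final value. The Python sketch below models that rule for non-negative values only (an assumption to keep it short; negative counters and rounding edge cases are ignored), and the function name is hypothetical.

```python
def isg(value: float) -> tuple[float, bool]:
    """Model ISG on a non-negative control number iiiii.fffcc.

    iiiii is the counter, fff the final value, cc the increment
    (00 is treated as 1). Returns (new value, skip_next_line).
    """
    counter = int(value)
    fffcc = round((value - counter) * 100_000)  # fractional digits as an integer
    final, step = divmod(fffcc, 100)
    step = step or 1
    counter += step
    return counter + fffcc / 100_000, counter > final


for y in (7.0, 3.005):
    new_y, skipped = isg(y)
    # With a NOP after ISG, a skip only bypasses the NOP, so the program
    # resumes at the same line either way, with Y's counter increased by 1.
    print(f"ISG Y on {y}: new Y = {new_y:g}, NOP skipped: {skipped}")
```

For a plain non-negative integer in Y the final value is 0, so ISG always skips and the NOP is always bypassed; a control number such as 3.005 does not skip, and the NOP simply runs.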

At any rate I understand your reasoning.

I am just trying to get into MCODE programming and am thinking about building a small utility library with these sorts of functions. You already have most of the ones I want, but also many that I would not include, since the idea is not to build a "higher math" library but a simple utility library. This is something I am just starting to play with, since I have never done any MCODE on a 41.

Cheers,

-Marwan


#16

Hi again,

INCX/Y and DECX/Y can always be replaced with ISG X/Y and DSE X/Y, respectively, followed by a NOP (such as "F0", a text-0 string). Yes, it's a little slower and takes one more byte, but it removes the dependency on the module, and FAT entries are at a premium.
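The reason a single NOP is enough for both ISG and DSE can be traced with a tiny program-counter model. This sketch assumes Y holds a whole number, so the final value fff is 0 and the step defaults to 1: ISG then skips when the new value is greater than 0, and DSE skips when it is less than or equal to 0. All function names are hypothetical.

```python
def isg_whole(y: int) -> tuple[int, bool]:
    """ISG on a whole number: skip the next line when the new value is > 0."""
    y += 1
    return y, y > 0


def dse_whole(y: int) -> tuple[int, bool]:
    """DSE on a whole number: skip the next line when the new value is <= 0."""
    y -= 1
    return y, y <= 0


def run_with_nop(op: str, y: int) -> tuple[int, int, bool]:
    """Execute the two-line program [op Y, NOP] with a simple program counter.

    Returns (new Y, program counter on exit, whether the NOP executed).
    """
    program = [op, "NOP"]
    pc = 0
    nop_ran = False
    while pc < len(program):
        if program[pc] == "NOP":
            nop_ran = True  # the NOP changes nothing and falls through
            pc += 1
        else:
            y, skip = isg_whole(y) if program[pc] == "ISG" else dse_whole(y)
            pc += 2 if skip else 1  # a skip jumps over the NOP
    return y, pc, nop_ran


for op, y in (("ISG", 7), ("ISG", -3), ("DSE", 5), ("DSE", 1)):
    print(op, y, "->", run_with_nop(op, y))
```

Whether or not the skip happens, Y changes by exactly 1 and execution continues at the same line (program counter 2), which is what makes the ISG/DSE + NOP pair a drop-in replacement for INCY/DECY in this simplified model.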

Hope you get into MCODE soon; it's a lot of fun. A routine library is perhaps the best way to start... as a matter of fact, I'm almost ready to release my Library#4 project, which is exactly about that.

Cheers,
ÁM


#17

Hi Ángel,

I just used X<>X in my example because it was easier than messing with a synthetic instruction. But yes, "F0" would be better. Actually, if you are skipping on every ISG instruction (I have yet to test the case where it does not skip), the break-even point is between TURBO 10X and TURBO 20X (see the table below); we are seeing the same effect as we did for INCX in my earlier posts. I'll add timings for the non-skipping case shortly, for completeness.

Iterations per 10 seconds:

Turbo speed   FOCAL (ISG Y, X<>X)   SandMath (INCY)   Performance increase
0             92                    136               1.48
2             184                   258               1.40
5             448                   566               1.26
10            803                   898               1.12
20            1369                  1262              0.92
50            2316                  1708              0.74

Thanks for your responses!

-Marwan

#18

Hi Monte,

Thanks for the explanation. Very informative and it clarifies things for me.

The following table appears to confirm what you wrote. It shows timings for the same basic routines, modified to perform five "1 +" pairs or five INCX instructions in the body of the loop. As the table shows, the performance improvement is greater at the lower turbo speeds but falls off faster and ends up slightly lower at 50X.

Turbo mode   FOCAL 1 + (x5)   INCX (x5)   Speed gain
0            102              181         1.77
2            200              331         1.66
5            475              737         1.55
10           855              1136        1.33
20           1418             1599        1.13
50           2226             2076        0.93

Cheers,

-Marwan

