The problem with the 48 is that it can be very slow with the
stack
Well, the RPL stack can be very deep. I don't think that I'd call it
slow if I restricted myself to using only 4 levels. On a 48GX, 4 ROLL or
4 ROLLD takes only about 21 ticks (0.0026 seconds). Can we blink that
fast?
I have noticed that ROLL or ROLLD over a lot of levels can be relatively
slow, about 186 ticks (0.023 second) on a 48GX if I have the numbers 1
through 1000 on the stack and execute 1000 ROLL or 1000 ROLLD. How long
does that take on a non-RPL calculator?
For reference, here's my general purpose timing program, which I used
for the above and for the total times in the programs below. Note that
the "correction value" varies by model. In these particular calculators
today, I used these values for correction, but the best value varies
among units and even over time (Temperature? Battery condition? Phase of
the moon?). To find the value, I put the number 1 on the otherwise empty
stack, temporarily disable the last stack, arguments, and commands
saves, run the timer program, and adjust the value until it *usually*
returns "Ticks: 0" and "0_s". Well, on a 48 series that is; on a 49G I
have to settle for *usually* +/- 10 ticks (0.0012 seconds). This program
isn't 100% repeatable, but I expect that it's about the best that can be
done in UserRPL, and plenty good enough for my purposes. If I care to
get picky about routines that are very nearly the same (or very fast),
then I run them in loops. Note that this program times the evaluation of
the object on the stack; if that's a name, the time will include
whatever's needed for name resolution.
48SX Checksum (using 104 below): # C2F4h
48GX Checksum (using 72 below): # 5F0h
49G Checksum (using 252. below): # CF3Eh
Bytes: 136
%%HP: T(3)A(R)F(.); @ Header for ASCII download.
\<<
MEM \-> t @ Force GC, use MEM result for dummy 't'.
\<<
TICKS 't' STO @ Get start time, store it in 't'.
EVAL @ Evaluate the object.
TICKS @ Get stop time.
RCWS @ Get current wordsize.
64. STWS @ Set wordsize to 64.
SWAP t - @ Elapsed time.
B\->R @ Convert binary to real.
@ Correction for time to do TICKS 't' STO.
@104 48SX value
72 @ 48GX value
@252. 49G value
-
"Ticks" \->TAG @ Tag result.
DUP 8192. / '1_s' * @ Also show in seconds.
ROT STWS @ Restore original wordsize.
\>>
\>>
"Garbage collection" (but I still say that it should be called "memory
packing") gets slower as the stack gets deeper, and often slows down
drastically if the stack (which is really a stack of pointers) has many
pointers into a large composite that's in temporary memory (that is, the
composite isn't stored in a global or port variable). An example of such
a pointer into a composite would be to build a list on the stack and use
GET to extract an element from it. It "looks like" the element is on the
stack, but really the pointer is into the list, which is still in
temporary memory. The original list is still referenced and kept in
temporary memory for the sake of the pointer until the element is
dropped, stored in a global or port variable, combined into another list
or a vector, or has the NEWOB command executed on it. (This also affects
how much free memory there is.) The garbage collection routine
automatically runs when the unused memory gets too low. If the garbage
collection routine runs and finds a pointer into a composite, then it
has to check all of the following objects in the composite one at a time
to find the end of the composite and its size field so that it knows
exactly which block of memory to move.
Usually, when speed is the most important criterion, I recommend using
the stack instead of variables, and if that gets unmanageable, then
local variables in preference to global variables. But it can be a huge
advantage to store a large list in a global variable (to get it out of
temporary memory) before taking it apart if there's an expectation that
a garbage collection may occur with the "list elements" on the "stack".
For example, compare the execution times for the following programs.
Note that MEM forces a garbage collection (to make it easy to determine
how much memory is available), and it's the easiest way to get an idea
of how long garbage collection takes. In the following programs I've
included internal timing routines for the MEM command itself. All of the
timings are with the stack empty and with last stack, arguments, and
commands saving enabled. Wordsize is 64. The 49G is in "approximate"
mode. Note that the "correction value" varies by model; they were found
experimentally and may not be "best" for all calculators. Also note that
the times would vary with repeated trials.
This first one is just to time MEM on a nearly empty stack.
48SX Checksum (using 32 below): # 6255h
48GX Checksum (using 16 below): # 85AFh
49G Checksum (using 189. below): # F96Dh
Bytes: 84.5
%%HP: T(3)A(R)F(.); @ Header for ASCII download.
\<<
TICKS @ Get the start time.
MEM @ Force a garbage collection.
TICKS @ Get the finish time.
ROT - @ Elapsed time.
B\->R @ Convert binary to real number.
@ Correction for time to execute TICKS.
@32 48SX value
16 @ 48GX value
@189. 49G value
-
"Ticks" \->TAG @ Tag result.
DUP 8192. / '1_s' * @ Also show in seconds.
\>>
48SX: 378 ticks (0.046 second) for MEM,
1510 ticks (0.18 second) total.
48GX: 254 ticks (0.031 second) for MEM,
1031 ticks (0.13 second) total.
49G: 594 ticks (0.072 second) for MEM,
1738 ticks (0.21 second) total.
Now with the reals 1 through 1000 on the stack:
48SX Checksum (using 32 below): # C99Fh
48GX Checksum (using 16 below): # 2E65h
49G Checksum (using 189. below): # 62B9h
Bytes: 111.5
%%HP: T(3)A(R)F(.); @ Header for ASCII download.
\<<
1. 1000. @ Place the reals 1 through 1000 on the stack.
FOR n
n
NEXT
TICKS @ Get the start time.
MEM @ Force a garbage collection.
TICKS @ Get the finish time.
ROT - @ Elapsed time.
B\->R @ Convert binary to real number.
@ Correction for time to execute TICKS.
@32 48SX value
16 @ 48GX value
@189. 49G value
-
"Ticks" \->TAG @ Tag result.
DUP 8192. / '1_s' * @ Also show in seconds.
\>>
48SX: 12751 ticks (1.56 seconds) for MEM,
56681 ticks (6.92 seconds) total.
48GX: 8866 ticks (1.08 seconds) for MEM,
39169 ticks (4.78 seconds) total.
49G: 9052 ticks (1.10 seconds) for MEM,
39057 ticks (4.77 seconds) total.
No list involved.
Now we'll build a list and explode it onto the stack:
48SX Checksum (using 32 below): # 428Dh
48GX Checksum (using 16 below): # A577h
49G Checksum (using 189. below): # CBB8h
Bytes: 121.5
%%HP: T(3)A(R)F(.); @ Header for ASCII download.
\<<
1. 1000. @ Place the reals 1 through 1000 on the stack.
FOR n
n
NEXT
DUP \->LIST @ Put the numbers into a list.
LIST\-> @ Explode the list onto the stack.
DROP @ Discard the count.
TICKS @ Get the start time.
MEM @ Force a garbage collection.
TICKS @ Get the finish time.
ROT - @ Elapsed time.
B\->R @ Convert binary to real number.
@ Correction for time to execute TICKS.
@32 48SX value
16 @ 48GX value
@189. 49G value
-
"Ticks" \->TAG @ Tag result.
DUP 8192. / '1_s' * @ Also show in seconds.
\>>
48SX: 1818259 ticks (221.96 seconds) for MEM,
1870881 ticks (228.38 seconds) total.
48GX: 1255937 ticks (153.31 seconds) for MEM,
1292199 ticks (157.74 seconds) total.
49G: 1180957 ticks (144.15 seconds) for MEM,
1216584 ticks (148.51 seconds) total.
"Looks like" the same
thing on the stack for MEM, but because of the list, it's much slower.
Oh well, that gives us time to start a fresh pot of coffee.
But if the list is stored in a global variable before we explode it:
48SX Checksum (using 32 below): # C0E2h
48GX Checksum (using 16 below): # 5AB7h
49G Checksum (using 189. below): # 4C07h
Bytes: 156.5
%%HP: T(3)A(R)F(.); @ Header for ASCII download.
@ Note well! This will purge a global variable named 'TEMP'. If you
@ already have something named 'TEMP', then choose a different name!
\<<
'TEMP' @ Place the name on the stack.
1. 1000. @ Place the reals 1 through 1000 on the stack.
FOR n
n
NEXT
DUP \->LIST @ Put the numbers into a list.
OVER STO @ Store the list in a global variable.
RCL @ Put the list back on the stack.
LIST\-> @ Explode the list onto the stack.
DROP @ Discard the count.
TICKS @ Get the start time.
MEM @ Force a garbage collection.
TICKS @ Get the finish time.
ROT - @ Elapsed time.
B\->R @ Convert binary to real number.
@ Correction for time to execute TICKS.
@32 48SX value
16 @ 48GX value
@189. 49G value
-
"Ticks" \->TAG @ Tag result.
DUP 8192. / '1_s' * @ Also show in seconds.
'TEMP' PURGE @ Purge the variable.
\>>
48SX: 6556 ticks (0.80 second) for MEM,
68400 ticks (8.35 seconds) total.
48GX: 4508 ticks (0.55 second) for MEM,
67779 ticks (8.27 seconds) total.
49G: 4996 ticks (0.61 second) for MEM,
51629 ticks (6.30 seconds) total.
Much better, even with the
extra overhead of handling the global variable. I'm willing to trade the
extra size for speed.
And no, using a local variable doesn't have this effect; local variables
are still in temporary memory. Try:
48SX Checksum (using 32 below): # C035h
48GX Checksum (using 16 below): # 5590h
49G Checksum (using 189. below): # 4178h
Bytes: 144
%%HP: T(3)A(R)F(.); @ Header for ASCII download.
\<<
1. 1000. @ Place the numbers 1 through 1000 on the stack.
FOR n
n
NEXT
DUP \->LIST @ Put the numbers into a list.
\-> TEMP @ Store the list in a local variable.
\<<
TEMP @ Put the list back on the stack.
LIST\-> @ Explode the list onto the stack.
DROP @ Discard the count.
TICKS @ Get the start time.
MEM @ Force a garbage collection.
TICKS @ Get the finish time.
ROT - @ Elapsed time.
B\->R @ Convert binary to real number.
@ Correction for time to execute TICKS.
@32 48SX value
16 @ 48GX value
@189. 49G value
-
"Ticks" \->TAG @ Tag result.
DUP 8192. / '1_s' * @ Also show in seconds.
\>>
\>>
48SX: 1820005 ticks (222.63 seconds) for MEM,
1872915 ticks (228.63 seconds) total.
48GX: 1257259 ticks (153.47 seconds) for MEM,
1293734 ticks (157.93 seconds) total.
49G: 1181800 ticks (144.26 seconds) for MEM,
1217617 ticks (148.63 seconds) total.
The local variable
didn't help a bit.
These timings were done were on a 48SX Version D, a 48GX Version R, and
a 49G Revision #1.19-6.
Another technique is to force a garbage collection with MEM DROP just
before these conditions occur, in the hope that garbage collections
won't be needed at particularly bad times.
By the way, there's supposed to be a new 49G Revision 1.19-7 flash ROM
already written, just waiting for HP to allow it to be released. The
plan is (was?) that this will have an improved garbage collection
routine, particularly when it comes to handling this issue with
composite objects.
I hope that someone (maybe just RPL users?) will find this helpful or
interesting. I enjoyed playing around with the calculators, and some
things that I already "sort of knew" have soaked into my brain a bit
deeper. The biggest surprise for me was how well the 49G did.
Regards,
James