Calculator Benchmark: HP-48GX/SysRPL



#16

Hi Xerxes and all others

Here's my contribution to your interesting
listing .
The program takes about 37.1s to run on my HP-48GX/R. I used the UserRPL- and the Forth-program as prototypes.

Pitfalls:

  • Since I couldn't find an ABS function for binary integers, I used 2DUP#< followed by a SWAP to make sure the difference is positive.
  • rplcomp automatically wraps the lines between WHILE and REPEAT with DOCOL/SEMI. When compiling using JAZZ/ASS you have to do that by yourself. This took me some time and crashes to find out.
This is my first SysRPL-program, so let me know if something could be improved.

Best regards

Thomas

*
* Eight Queens Puzzle
*
* Time: 37.1350097656
* Solution: { 8 4 1 3 6 2 7 5 }
* Steps: 876
*
* ( --> time { solution } steps )
ASSEMBLE
NIBASC /HPHP48-R/
RPL
DEFINE A 1GETLAM
DEFINE A\<- 1PUTLAM
DEFINE Y 2GETLAM
DEFINE Y\<- 2PUTLAM
DEFINE X 3GETLAM
DEFINE X\<- 3PUTLAM
DEFINE S 4GETLAM
DEFINE S\<- 4PUTLAM
DEFINE R 5GETLAM
DEFINE R\<- 5PUTLAM

DOCOL CK0NOLASTWD
EIGHT
ZEROZEROZERO DUP
5PICK NDUPN {}N
NULLLAM FIVE NDUPN {}N BIND
CLKTICKS
BEGIN
( A[X++] = R )
R X #1+DUP X\<- A
PUTLIST A\<-
BEGIN
S #1+ S\<-
X Y\<-
BEGIN
Y #>1
WHILE
( | A[X] - A[Y--] | )
A X NTHCOMPDROP
A Y #1- DUP Y\<- NTHCOMPDROP
2DUP#< IT DOCOL SWAP SEMI #-
DUP#0= SWAP
X Y #- #= OR
IT DOCOL
#0 Y\<-
BEGIN
( A[X]-- )
A X NTHCOMPDROP
#1- DUP
X A PUTLIST A\<-
#0=
WHILE
X #1- X\<-
REPEAT
SEMI
REPEAT
Y #1=
UNTIL
X R #=
UNTIL
CLKTICKS SWAP bit- #>%
# 2000 UNCOERCE %/
A S
ABND
SEMI


#17

Hello Thomas,

thank you for porting n-queens to SysRPL. It's interesting to see the
difference to UserRPL (6.5x).

Unfortunately I have no experience with SysRPL to assess if any significant improvement
is possible.


#18

A little improvement is possible:

Since there's only one command (SWAP) in the if-then clause the enclosing DOCOL/SEMI can be removed.

Now I measured only 35.2s.

Here's the corrected line:

     33     2DUP#< IT SWAP #-

I've also mixed up the position of the ++/-- in the comments:

     21   ( A[++X] = R )
30 ( | A[X] - A[--Y] | )
39 ( --A[X] )

May I ask you to update the listing?

Thanks

Thomas


#19

Yes, of course your corrections are welcome. ;-)

I'm a bit surprised of the HP-71B results especially in Forth (46.3 sec)
in comparison to SysRPL, considering the much slower cpu clock.


#20

The Forth program uses an array to keep track of the position of the queens whereas my program uses a list which I assume is not as efficient.

Unfortunately I could not figure out how to use an array of binary integers in SysRPL.

From RPLMAN.DOC; 2.4 Execution

Quote:
This flexible interpretation is intrinsically slower than
the indirect-only execution (like Forth), because of the
overhead of making the direct/indirect test. In practical
implementations of RPL, it is possible to shift the overhead
almost entirely to the direct execution case, so that the
execution penalty for the indirect case is negligible,
including primitive assembly language objects that are never
executed directly.

Maybe we'd have to run a different test where both programs use the
same data structures

to find out whether this overhead is indeed negligible.


#21

Quote:
Unfortunately I could not figure out how to use an array of binary integers in SysRPL

You can easily create an array of any type using the normal means,

but you have to write your own access methods for data types other than real or complex numbers.

If you're dealing with integers only you could also use a long binary integer,

and treat it like an array, or organize it as needed;-)

Arrays are, from a data stream point of view, nothing else than a
string of consecutive entities,

which are defined (type) and organized (rows, cols) in the header.


From a performance point of view, a list is the worst but most flexible object type.


HTH

Raymond


#22

Quote:
you have to write your own access methods for data types other than real or complex numbers.

Does that mean writing in assembler?

It would be a great help if you could give me some code snippets

on how to create, access and change an array of binary integers.

A simple example like the one I used (i.e. bint[8]) will do.

Or maybe you can point me to the appropriate documentation.

My current sources are RPLMAN.DOC and RPLCOMP.DOC from HP as well as
"Programming in System RPL" from E. M. Kalinowski and C. Dominik.

Any help will be appreciated

Thomas

PS: I have only an HP-48GX. Therefore using ZINTS is not an option.


#23

Hi Thomas,

below is the source for two maybe useful methods, wrc2HXS and GetAndSet.

wrc2HXS creates an 'empty' HXS string of the needed size, GetAndSet is the suitable HXS access method.

Note that the wrc2HXS code actually creates a CSTR, which doesn't have any disadvantages over a HSTR,

as long as you don't attempt to edit it with non-suitable functions.

If you intend to access the HXS stream with other methods than those below,

you should make sure to change the prologue to DOHSTR, as indicated in the listing.

I made some other methods for converting between real arrays and arrays_of_BINT, but didn't print them here.

Just drop me a note if interested.

HTH

Raymond


Quote:
PS: I have only an HP-48GX.

Wise choice. Did you try SpeedUI ?


Now for the sources:


*****************************************************************************
* Modulname : wrc2HXS Create tempob HXS string from #width #rows #cols
* Modultype : Secondary/Code
* Dest.Comp. : HP48
* Language : System RPL with parts of Machine Code
* Author : Raymond Del Tondo
* Interface : #w #r# #c --> HXS|$
* Description: Just try...
* Keywords :
*
* Stack : #w #r# #c --> HXS|$
*
* Notes :
*
* THIS CODE IS COPYRIGHTED 2005 BY :
* Raymond Del Tondo of Magic48ges
*
* To Do :
*
* Version Date ED Reason
*
* 0.0 22.11.2005 RDT Start demo file
*
*****************************************************************************
TITLE wrc2HXS

*****************************************************************************
* INCLUDE FILES
*****************************************************************************
* INCLUDE D:\RPL\TL\HEADER.H

*****************************************************************************
* DEFINES
*****************************************************************************
ASSEMBLE
DBG? EQU 0
RPL

*****************************************************************************
* MISSING ENTRIES
*****************************************************************************
ASSEMBLE
=WIPEOUT EQU #0675C
RPL

*****************************************************************************
* OBJECTS
*****************************************************************************


*****************************************************************************
*
* wrc2HXS Create tempob HXS string from #width #rows #cols
*
* #w #r# #c --> HXS|$
*
*****************************************************************************
CODE
* A B C D Rn Dn P Cyc Data Stk
GOSBVL =POP2# r c
RSTK=C
GOSBVL =SAVPTR
C=RSTK ...cc
B=A A r
D=C A c
CSL A r r ..cc. c
CSL A .cc..
C=C+A A .ccrr
CSL A ccrr.
A=DAT1 A 1:@dsktop
D1=A 1:@ob1
D1=D1+ 5 1:@ob_body
A=DAT1 A w
C=C+A A ccrrw
RSTK=C w r ccrrw c * Save for later
C=B A w r r c

***************************************************************************************************************************
* CPU: A B C D Rn Dn P Cyc Data Stk
* w r r c 0:? 0
* 1:?
* 2:?
* 3:?
* 4:?
* 0:?
* 1:@ob_body
* Stk: ( *Here will be an RPL data stack content list* )
***************************************************************************************************************************
* A B C D Rn Dn P Cyc Data Stk
*
GOSBVL =MUL# ? prod_wr ? c
A=B A prod_wr prod_wr
C=D A prod_wr c
GOSBVL =MUL# prod_wrc
C=B A prod_wrc
C=C+CON A,5
GOSBVL =MAKE$N lenfield 0:@prlg 0:@data
C=RSTK ccrrw
DAT0=C A * Write wrc field
* LC(5) =DOHSTR

D0=D0+ 5 0:data+5
AD0EX @data+5 0:len
D0=D0- 10 0:len-10
C=A A @data+5
CD1EX len-10 1:@data+5
CD0EX
GOSBVL =WIPEOUT * Zero out data area
GOVLNG =GPOverWrR0Lp
ENDCODE


And here follows the GetAndSet method:


** HP-48
**
** GET: HXS #c #r TRUE --> HXS #value
** SET: HXS #c #r #value FALSE --> HXS'
**


* A B C D
CODE
ST=1 3
GOSBVL =popflag
GOC STget

GOSBVL =POP# #data
R4=A
ST=0 3

STget GOSBVL =POP2# c r
R3=A
RSTK=C
GOSBVL =SAVPTR
GOSBVL =GetStrLenStk
C=0 A
C=DAT1 1 #w
D=C A #w
D1=D1+ 3
C=DAT1 B #cc
D1=D1+ 2 * Skip wrc field
A=C A #cc
C=RSTK #cc #r
C=C-1 A #cc #r-1 #w
GOSBVL =MUL# product #w
A=R3 c
A=A+B A ((#r-1)*#cc)+c * Now we have the element slot
C=D A ((#r-1)*#cc)+c #w

A=A-1 A * Pre-decrement

GOSBVL =MUL# product #w

CD1EX
C=C+B A
CD1EX

C=D A
P=C 0
P=P-1

?ST=1 3
GOYES STget2

****************************
* Part of the PUT
****************************
A=R4
DAT1=A WP
P= 0
GOVLNG =GETPTRLOOP
****************************

* But here we need the GET

STget2 A=0 A
A=DAT1 WP
P= 0
GOVLNG =PUSH#ALOOP
ENDCODE


#24

Hi Raymond

Thanks a lot for the listings. I promise to study them carefully and try to improve the program.

But it will probably take some time since I'm a newbie to SysRPL.

I'd be interested in the listings of the other methods you mentioned as well.

You can use the forum to send me an e-mail.

Quote:
Did you try SpeedUI ?

Not yet though I already read about it. But sure I will try it soon.

I know it's one of your projects.

Thanks again for your kind support!

Thomas


#25

Hi Thomas,

you have mail (if the MoHPC mailer works)

Regards

Raymond

#26

It took me some time to pick the proper commands from your listing,

but at last I succeeded in writing two routines to set and read a nibble from a HXS.

But alas, when I run the program, it takes 39.9s compared to 35.2s
of the pure SysRPL-program.

I wondered what could be gained if I just inlined the code, but it still takes 37.2s.

My conclusions:

  • it seems the overhead using lists is not that big
  • probably my routines could still be improved
  • though it might seem it wasn't worth I've learned a lot

Here's the listing for those who might want to have a look at it:

ASSEMBLE			                       	   CLKTICKS                                        
NIBASC /HPHP48-R/ BEGIN
RPL ( A[++I] = R )
DEFINE A 1GETLAM A R I #1+DUP I\<-
DEFINE A\<- 1PUTLAM PUTBINTEL EVAL A\<-
DEFINE J 2GETLAM
DEFINE J\<- 2PUTLAM BEGIN
DEFINE I 3GETLAM S #1+ S\<-
DEFINE I\<- 3PUTLAM I J\<-
DEFINE S 4GETLAM BEGIN
DEFINE S\<- 4PUTLAM J #>1
DEFINE R 5GETLAM WHILE
DEFINE R\<- 5PUTLAM ::
DEFINE GETBINTEL 6GETLAM ( | A[I] - A[--J] | )
DEFINE PUTBINTEL 7GETLAM A I GETBINTEL EVAL SWAP
J #1- DUP J\<- GETBINTEL EVAL
:: CK0NOLASTWD SWAPDROP
2DUP#< IT SWAP #-
' ( PUTBINTEL )
:: DUP#0= SWAP
CODE I J #- #= OR
GOSBVL =POP2# IT ::
RSTK=C A #0 J\<-
GOSBVL =SAVPTR
C=RSTK A ( --A[I] )
B=C A A I GETBINTEL EVAL
C=DAT1 A #1- SWAPOVER
C=C+B A I PUTBINTEL EVAL A\<-
CD1EX
D1=D1+ 5+5-1 BEGIN
DAT1=A 1 #0=
GOVLNG =GETPTRLOOP WHILE
ENDCODE ::
; ( --A[--I] )
A I #1- DUP I\<-
' ( GETBINTEL ) GETBINTEL EVAL #1-
:: SWAPOVER I
CODE PUTBINTEL EVAL A\<-
GOSBVL =POP# ;
GOSBVL =SAVPTR REPEAT
C=DAT1 A ;
C=C+A A ;
CD1EX REPEAT
D1=D1+ 5+5-1 J #1=
A=0 A UNTIL
A=DAT1 1 I R #=
GOVLNG =PUSH#ALOOP UNTIL
ENDCODE CLKTICKS
; SWAP bit- #>%
EIGHT # 2000 UNCOERCE %/
ZEROZEROZERO A S UNCOERCE
%0 %># ABND
NULLLAM SEVEN NDUPN {}N BIND ;

#27

Yes, the result is really unexpected. I guess it's necessary to use assembly only to have a real improvement. ;-)


#28

It's also a matter of how often a loop will be run, and how the data is organized.

There _is_ a chance to slightly improve performance of the above program

by cutting down LAM usage to a minimum, and use the data stack instead.

Since the second loop, the one that ends with @J #1= UNTIL,

is run 876 times, there may be some potential inside that structure.

Another small change that saved at least one second was

replacing #>1 by the discrete words 'ONE #>' .

But the looping frequence in this code may be too small for _big_ time savings.

And a word regarding lists:
The code above may not be optimal in showing the benefits of using integer streams,

because of said low looping frequence, but if it were much more loops to run, the differences would be obvious.

Please also note that list manipulation functions are real space wasters,

leading to more frequent garbage collections, which in turn immediately slows the system down;-)


Raymond


#29

Yes, using a list as storage is not very efficient. A problem of flexibility against speed.

#30

Thanks for clarification. A similar behaviour is observable on the BASIC like formula programmable calculators in dependence of using lists or arrays.


Possibly Related Threads...
Thread Author Replies Views Last Post
  Yet another benchmark port on the wiki: Savage Pier Aiello 35 1,239 09-26-2013, 03:22 AM
Last Post: Pier Aiello
  A brand new calculator benchmark: "middle square method seed test" Pier Aiello 25 845 09-13-2013, 01:58 PM
Last Post: Pier Aiello
  New community-maintained version of "Calculators benchmark: add loop" Pier Aiello 20 719 09-12-2013, 02:42 AM
Last Post: Pier Aiello
  Calculator Speed Benchmark (Add Loop) Thomas Chrapkiewicz 2 210 01-20-2013, 11:24 AM
Last Post: Thomas Chrapkiewicz
  WP-34s: Speed benchmark W. Bruce Maguire II 13 489 04-29-2012, 12:16 AM
Last Post: Gilles Carpentier
  HP-41CL Calculator Benchmark Juergen Keller 17 486 06-21-2011, 05:48 PM
Last Post: Paul Dale
  Windows 7 and HP 48GX Nico Tak 12 494 03-01-2011, 04:02 AM
Last Post: Marcus von Cube, Germany
  HP 48GX Squeaky Anthony (USA) 3 185 02-09-2011, 01:15 PM
Last Post: Anthony (USA)
  N-queens benchmark and the HP-17BII Thomas Klemm 62 887 02-07-2011, 05:58 AM
Last Post: Thomas Klemm
  Loops of Addition benchmark for the 41CL Monte Dalrymple 8 226 08-19-2010, 09:48 PM
Last Post: Wlodek Mier-Jedrzejowicz

Forum Jump: