BTW did I mention that I just slightly improved the code,
it now needs only about 0.3363 (ZERO POINT THREE) seconds!
I replaced the biggest time wasters by some more efficient code.
Here's the story:
As mentioned in my earlier post, the ASLC and CSLC loops were the ones which could use some refinement.
** The two loops L110asl and L110csl from the original listing take the biggest amount of time.
** In the best case, when P is 9, the loop is run 16-9 times, -> 5 times,
** which sums up to 5*21 + 5*3 + 4*10 (NC case) + 1*3 (C case) = 105 + 15 + 40 + 3 = 163 cycles
** In the worst case, when P =1, the loop is run 16-1 times, 15 times,
** which sums up to 15*21 + 15*3 + 14*10 (NC case) + 1*3 (C case) = 315 + 45 + 140 + 3 = 503 cycles !!!
**
** The newer method (using ASR W) andshown below,
** runs 1 time in the best case, and 8 times in the worst case
** summing up to 2 + 3+d + 6 + 1*3+d + 1*2 + 3 = 2 + 4 + 6 + 19 2 + 3 = 36
** or worst 2 + 3+d + 6 + 8*3+d + 8*2 + 7*10 + 3 = 250 cycles !
P=C 0 Y 6
A=C P 'A(Y)' 3+d
P= 0 2
C=-C P 3+d
P=C 0 6
*
L110asr ASR W 3+d
P=P+1 2
GONC L110asr 10/3
**********
C=A P 3+d * Backup of 'A(Y)'
A=C W 3+d * Full backup of AAAAAAAA incl A(Y)
C=B P AAAAAAAAX 3+d
C=-C P 3+d
P=C 0 6
L110csr CSR W 3+d
P=P+1 3
GONC L110csr 10/3
ACEX W A[0]='A(X)' C[0]='A(Y)' 3+d
So the second variant was remarkably faster (runtime was about 0.5 seconds!) than the easy and elegant, but slower initial version.
But the worst case (250 cycles) still looked not too good, so I tried some other methods, including self-modifying code in temporary memory.
Not that self-modifying shit which actually modifies itself,
but a default code slice in a RAM buffer, outside of the
main program,
which got a parameter modified on demand, and then called from the main program.
Not bad, too, but the management overhead was slightly too large.
With this method the run time was between 0.37 seconds and 0.38 seconds. Nothing more to gain.
So I finally used the discrete dispatcher version, which I actually had before the 'self-modifying' one.
This version is the fastest one up to now. It has a run time of 0.33630 seconds :-)
The current listing is shown below.
Have fun:-)
Raymond
:: CK0
CLKTICKS
* A B C D Dn Rn P Cyc
CODE
GOSBVL =SAVPTR
L10 LAHEX 888888888
R0=A 0:R 19
C=0 W 3+d
D1=C 1:S 8
B=0 A X=0 7
L40 P= 0 2
A=R0 R 19
?A=B P 13+d/6+d
GOYES L180
L50 B=B+1 P R X' AAAAAAAAx 3+d * X=X+1
L60 P= 0 0 2 * A(X)=R
C=B P AAAAAAAAX 3+d
P=C 0 X 6
C=A P AAAAAAAAX 3+d * Pushed R to AAAAAAAA at pos X
L70 CD1EX S 8 * S=S+1
C=C+1 A 7
CD1EX 8
L80 P= 0 0 2 * Y=X
C=B P X 3+d
D=C P Y=X 3+d
L90 D=D-1 P Y' 3+d * Y=Y-1
L100 ?D=0 P 13+d/6+d
GOYES L40
L110 P= 0 2 * T=A(X)-A(Y)
C=D P AAAAAAAAY 3+d
GOSUB Ptst
P= 0
A=C P
C=B P
GOSUB Ptst
P= 0
?A>=C P 13+d/6+d
GOYES NoSwp
ACEX P 3+d
NoSwp A=A-C P ABS(T) 3+d
L120 ?A=0 P 13+d/6+d * IF T=0 THEN 140
GOYES L140
L130 C=B P X 3+d * IF X-Y<>ABS T THEN 90
C=C-D P X-Y 3+d
?A#C P 13+d/6+d
GOYES L90
L140 P= 0 2
C=B P X 3+d * A(X)=A(X)-1
P=C 0 X 6
C=C-1 P 3+d
L150 ?C#0 P 13+d/6+d * IF A(X)<>0 THEN 70
GOYES L70
L160 P= 0 2 * X=X-1
B=B-1 P 3+d
L170 ?B#0 P 13+d/6+d * IF X<>0 THEN 140
GOYES L140
L180 CSR W * Shift right one nib 3+d
AD1EX 8
P= 7 2
ACEX WP 3+d
RSTK=C * Save count 8
GOSBVL =PUSHhxs
GOSBVL =SAVPTR
C=RSTK 8
A=C A 7
GOVLNG =PUSH#ALOOP
*********
Ptst P=C 0 6
?P# 1 13/6
GOYES tP2
CPEX 1
C=P 0
CPEX 1
RTNCC
tP2 ?P# 2 13/6
GOYES tP3
CPEX 2
C=P 0
CPEX 2
RTNCC
tP3 ?P# 3 13/6
GOYES tP4
CPEX 3
C=P 0
CPEX 3
RTNCC
tP4 ?P# 4 13/6
GOYES tP5
CPEX 4
C=P 0
CPEX 4
RTNCC
tP5 ?P# 5 13/6
GOYES tP6
CPEX 5
C=P 0
CPEX 5
RTNCC
tP6 ?P# 6 13/6
GOYES tP7
CPEX 6
C=P 0
CPEX 6
RTNCC
tP7 ?P# 7 13/6
GOYES tP8
CPEX 7
C=P 0
CPEX 7
RTNCC
tP8 CPEX 8
C=P 0
CPEX 8
RTNCC
ENDCODE
CLKTICKS ( *Ticks1 #Board #LCnt Ticks2* )
4ROLL
bit- #>% # 2000 UNCOERCE %/
SWAP UNCOERCE
;
Edited: 8 Feb 2008, 7:06 p.m.