HHC 2011 Programming Puzzle SPOILERS



#2

Since the conference is well and truly over, I thought I'd post my solutions and see what other people have come up with and what other improvements are possible.

Let the discussions and optimisations begin :-)


- Pauli


#3

For the WP 34S.

A basic routine that runs over each value of X in the positive quadrant and
solves the circle equation to get the number of pixels set in that column.

	001  [cmplx]DROP     Ignore centre
002 0 Accumulator
003 RCL Y
004 STO[times] Z Keep r^2
005 LBL 00
006 DEC X
007 x<0?
008 GTO 01 Check for loop finish
009 ENTER[^]
010 x[^2]
011 RCL- T
012 +/-
013 [sqrt] sqrt(r^2 - x^2)
014 CEIL
015 STO+ Z Accumulated
016 DROP
017 GTO 00
018 LBL 01 Exit code -- multiply by 4
019 DROP
020 4
021 [times]

This program takes 6.5 seconds for a radius of 999. For a radius
of 5,000 it takes 32.7 seconds. Execution time is is linear in the
radius. It might be possible to tweak this a bit more and save a
few fractions of a second, however the execution time will remain
linear in the radius.

- Pauli

Edited: 26 Sept 2011, 2:17 a.m.

#4

For the WP 34S.

Rather than solving the circle equation each time, an incremental
approach is also possible. Bresenham's circle algorithm manages
to generate the outer points on a circle using only integer shifts
and additions. The following program implements Bresenham's circle
algorithm, however the results are slightly off because it is using
the pixel centre points rather than their extremes. This error is
almost certainly correctable without introducing a performance loss.

	001  [cmplx]DROP     Ignore centre
002 BASE 10 Integer mode
003 STO 00 Save radius as X coordinate
004 +/-
005 STO I Error accumulator
006 0
007 STO 01 Y coordinate
008 STO J Area accumulator
009 LBL 00
010 RCL 00
011 x<? 01 Check for loop termination
012 GTO 01
013 STO+ J Accumulate column
014 RCL 01 Update Y coordinate
015 SL 01
016 STO+ I Error = Error + 2 Y + 1
017 INC I
018 INC 01 Y = Y + 1
019 RCL I
020 x<0? Check for change in X
021 GTO 00
022 DEC 00 X = X - 1
023 RCL 00
024 SL 01 Error = Error â&#128;&#147; 2 X
025 STO- I
026 GTO 00
027 LBL 01 Finish up
028 RCL J Area in octant
029 SL 01 Area in quadrant plus duplicated overlap
030 RCL 01
031 x[^2] Correct the duplicated overlap
032 -
033 SL 02 Quadruple to full circle
034 DECM Switch mode back

Running time for a radius of 999 is 0.9 seconds, for a radius of
5,000 it is 4.7 seconds. Again this approach will give time linear
in the radius but each iteration is very fast.

- Pauli

Edited: 26 Sept 2011, 2:18 a.m.


#5

Paul,

I was aware of this method. It was used in the Microsoft BASIC in my old MSX 8-bit computer. Great to see an implementation, even if the results are slightly diffent.

Gerson.

From the MSX Red Book:


----------------------------------------------------------------------------

Address... 5BBDH

This is the circle mainloop. Because of the high degree of
symmetry in a circle it is only necessary to compute the
coordinates of the arc from zero to forty-five degrees. The
other seven segments are produced by rotation and reflection of
these points. The parametric equation for a unit circle, with T
the angle from zero to PI/4, is:

X=COS(T)
Y=SIN(T)


Direct computation using this equation, or the corresponding
functional form X=SQR(1-Y^2), is too slow, instead the first
derivative is used:

dx
---- = -Y/X
dy

Given that the starting position is known (X=RADIUS,Y=0), the X
coordinate change for each unit Y coordinate change may be
computed using the derivative. Furthermore, because graphics
resolution is limited to one pixel, it is only necessary to
know when the sum of the X coordinate changes reaches unity and
then to decrement the X coordinate. Therefore:

Decrement X when (Y1/X)+(Y2/X)+(Y3/X)+... => 1
Therefore decrement when (Y1+Y2+Y3+...)/X => 1
Therefore decrement when Y1+Y2+Y3+... => X

All that is required to identify an X coordinate change is to
totalize the Y coordinate values from each step until the X


- 150 -


5. ROM BASIC INTERPRETER


coordinate value is exceeded. The circle mainloop holds the X
coordinate in register pair HL, the Y coordinate in register
pair DE and the running total in CRCSUM. An equivalent BASIC
program for a circle of arbitrary radius 160 pixels is:

10 SCREEN 2
20 X=160:Y=0:CRCSUM=0
30 PSET(X,191-Y)
40 CRCSUM=CRCSUM+Y :Y=Y+1
50 IF CRCSUM<X THEN 30
60 CRCSUM=CRCSUM-X:X=X-1
70 IF X>Y THEN 30
80 CIRCLE(0,191),155
90 GOTO 90


The coordinate pairs generated by the mainloop are those of a
"virtual" circle, such tasks as axial reflection, elliptic
squash and centre translation are handled at a lower level
(5C06H).
----------------------------------------------------------------------------


#6

Quote:
Great to see an implementation, even if the results are slightly diffent.

I'm sure the algorithm can be corrected to give the correct results. It is a matter of reworking the maths and changing the update part of the loop. I might do this if I get really keen, but I've already spent more time than I can afford on this interesting little challenge.


- Pauli


#7

That was not a criticism. Actually, the way the circles are drawn have been simplified to make the challenge easier, as Walter has noticed. I guess both algorithms produce smoother circles, therefore no need to change them. The point is the existence of such algorithms, which were created in a time when speed really mattered.

Gerson.


#8

No criticism was taken :-)

Definitely stick to integer mode and avoiding expensive operations is a win although a lot less so that when these algorithms were first discovered.


- Pauli

#9

For the WP 34S.

Again running of the value of x and solving the circle equation but in integer mode results in a faster program.

	001  [cmplx]DROP     Ignore centre
002 BASE 10
003 0 Accumulator
004 RCL Y
005 STO[times] Z Keep r^2
006 LBL 00
007 DEC X
008 x<0?
009 GTO 01 Check for loop finish
010 ENTER[^]
011 x[^2]
012 RCL- T
013 +/-
014 [sqrt] sqrt(r^2 - x^2)
015 FS? C
016 INC X
017 STO+ Z Accumulated
018 DROP
019 GTO 00
020 LBL 01 Exit code -- multiply by 4
021 DROP
022 SL 02
023 DECM

Timings in this case are 7.6 seconds for a radius of 5,000 and 1.4
seconds for a radius 999. Again, these timings are linear in the
radius of the circle.

One problem here. The integer square root code is incorrectly
rounding for some input values. Since a new firmware image was not
permitted by the rules of the contest, this solution is invalid. A fixed firmware image has been committed but not yet moved into a release package.

It should be possible to rectify the problem with the square root
by squaring the result and comparing against the initial value.
The error only occurs when the square root isn't exact and it only
seems to round the result up. Even with the extra steps involved,
this method would likely be faster than the floating point equivalent.

- Pauli

Edited: 26 Sept 2011, 2:29 a.m.

#10

Some other semi-random comments.

  • I couldn't identify an explicit formula for the result. Such a formula will almost certainly be fastest.

  • I tried using the integration command. It isn't very accurate on the 34S but it is very very fast. On a 15C LE, it took ages.

- Pauli


#11

HI,

I come first to an algorithm real close to your code.

Here is a version for HP-41C:

01 LBL"HHC2011  13 -              25 +
02 0 14 SQRT 26 ST+ 00
03 * 15 RCL Z 27 CLx
04 * 16 * 28 1
05 STO 00 17 INT 29 +
06 LBL 00 18 LASTx 30 x<y?
07 ENTER^ 19 FRC 31 GTO 00
08 R^ 20 x=0? 32 4
09 / 21 GTO 01 33 RCL 00
10 x^2 22 CLx 34 *
11 1 23 1 35 END
12 x<->y 24 LBL 01

Some detail perhaps needs an optimization.

For RPL user, here a version for HP-28S calculator, which is basic enough to be used with out modification on any RPL system.

« ->  r x y                 @ input three arguments as specified
« 0 @ initiate number of pixel in circle
0 r 1 - FOR k @ main loop k=0 to r-1
1 @ 1^2 is 1 !
k r / SQ - SQRT @ x = sqrt(1 -(k/r)^2)
r * CEIL @ n = CEIL(x.r) at row where y=k/r
+ @ add to number of pixel
NEXT
4 * @ multiply by 4 to get full cricle
»
»
'HHC2011' STO

Only one quadrant of the circle is investigated, total number of pixels is 4 times the quadrant count.

I may suggest that a version using only an half-quadrant may speed up the process and is coherent with symmetry. The total number of pixels will be 8 times one half-quadrant count.

                            @ version 2.0
« -> r x y @ input three arguments as specified
« 0 @ initiate number of pixel in circle
0 r 2 SQRT / FOR k @ main loop k=0 to r/SQRT(2)
1 @ 1^2 is 1 !
k r / SQ - SQRT @ x = sqrt(1 -(k/r)^2)
r * CEIL @ n = CEIL(x.r) at row where y=k/r
.5 k + - @ remove (k+0.5) pixels
+ @ add to number of pixel
NEXT
8 * @ multiply by 8 to get full cricle
»
»
'HHC2011b' STO

Here a proposed code for HP-15C and any other great classic HP calculators.

# --------------------------------------------
# Tcl/Tk HEWLETT·PACKARD 15C Simulator program
# Created with version 1.2.13
# --------------------------------------------
000 { }
001 { 42 21 11 } f LBL A
002 { 0 } 0
003 { 20 } ×
004 { 20 } ×
005 { 34 } x<->y
006 { 2 } 2 _
007 { 11 } \/x
008 { 10 } ÷
009 { 43 44 } g INT
010 { 44 0 } STO 0
011 { 42 21 0 } f LBL 0
012 { 43 33 } g R^
013 { 10 } ÷
014 { 43 11 } g x²
015 { 1 } 1
016 { 34 } x<->y
017 { 30 } - _
018 { 11 } \/x
019 { 43 33 } g R^
020 { 20 } ×
021 { 43 44 } g INT
022 { 43 36 } g LSTx
023 { 42 44 } f FRAC
024 { 43 20 } g x=0
025 { 22 1 } GTO 1
026 { 43 35 } g CLx
027 { 1 } 1
028 { 42 21 1 } f LBL 1
029 { 40 } +
030 { 2 } 2
031 { 15 } 1/x
032 { 30 } -
033 { 45 0 } RCL 0
034 { 30 } -
035 { 40 } +
036 { 42 5 0 } f DSE 0
037 { 43 5 0 } g CF 0
038 { 45 0 } RCL 0
039 { 43 30 3 } g TEST 3 (x >= 0 ?)
040 { 22 0 } GTO 0
041 { 33 } R_dwn
042 { 8 } 8
043 { 20 } ×
044 { 43 32 } g RTN
# --------------------------------------------

Comments and suggestions are welcome.


Edited: 26 Sept 2011, 6:28 a.m. after one or more responses were posted


#12

Interesting. I did do just an octant for the Bresenham approach. This algorithm generates just an octant of a circle and relies on symmetry to get the rest of the points on the circumference.

Rather than keeping track of the diagonal, I accumulated the total area under the octant, doubled this and removed the duplicate area which is a perfect square. This allowed everything to be kept in integer mode.


- Pauli

#13

Hi C.Ret,

Quote:
I come first to an algorithm real close to your code.

Here is a version for HP-41C:

(... 35 step program ...)
Some detail perhaps needs an optimization.


At least it can be done shorter and faster. ;-)

Here's basically the same algorithm with some improvements. R was moved under the root and a more elegant CEIL emulation was used. For positive arguments, ceil(x) equals int(x + sign(frc(x)). Due to the special sign-implemenation in the 41-series an additional test for non-zero x was required. All this makes the program shorter and faster. Add a label and rtn if you prefer. ;-)

  01 RDN
02 RDN
03 STO 00
04 STO 01
05 DSE 00
06 GTO 01
07 GTO 02
08 LBL 01
09 RCL 01
10 x^2
11 RCL 00
12 x^2
13 -
14 SQRT
15 ENTER
16 FRC
17 x>0?
18 SIGN
19 +
20 INT
21 +
22 DSE 00
23 GTO 01
24 LBL 02
25 4
26 *
For r > 1 steps 06, 07 and 24 can be omitted.

The next improvement is an update of the user instructions, telling the user not to enter the center coordinates but just the radius. This would save two more steps. ;-)

Dieter


#14

Hello

Thank you for the much more elegant CEIL way :

n INT Lastx FRAC SIGN + 
(will run whith any classic except special cas of SIGN of HP-41).

I was trying to adapt my code for Hp-15c, but I was unable to lacate any SIGN function.

May the sequence be used instead :

FIX 0 n RND
?





I will try it later and spare steps in HP-15 listing as you demonstre it to me.



Considering only a quadrant was a good idea.

Considering only an octrant will be a good one toot, if I had not made thinks the worst way. By scaning from 0 to r/sqrt(2), I spare about 30% of running time.



But, I only realize that it is possible to spare 70% of the loops by scanning k from (r-1) to r/sqrt(2) !

And using symetry to count this small octrand part two time and adding the "common" square.



A picture better than words :


« ->  r x y                    @ input three arguments
« r 2 SQRT / CEIL -> p @ compute upper limit p
« 0 @ initiate number of pixel in octant
p r 1 - FOR k @ main loop k from r-1 downto p
1 @ 1^2 is 1 !
k r / SQ - SQRT @ x = sqrt(1 -(k/r)^2)
r * CEIL @ n = CEIL(x.r) at row where y=k/r
+ @ add to number of pixels (in region A)
NEXT
2 * @ add symetric/transpose part (region A')
p SQ + @ add inside square of p.p pixels (region B)
4 * @ multiply by 4 to get full cricle
»
»
»
'HHC2011c' STO




I hope this last approache will be the more efficient, especially for large circles, concerning number of loops.

Now 5000,0,0 HHC2011c returns result in less than 1'25" on HP-28S



I still using a complex computation at each step, the solution propose by Paul Dale based on a Bresemhan method trigger me. Have to found a way to spare operations and tests. Using an 'error' or 'indicator' counter is already tracked.

Edited: 27 Sept 2011, 12:17 p.m. after one or more responses were posted


#15

Quote:
Thank you for the much more elegant CEIL way :

n INT Lastx FRAC SIGN +

Or also
  ENTER 
FRC
SIGN
+
INT
Quote:
(will run whith any classic except special cas of SIGN of HP-41).

Well, the 41 was the first HP (pocket) calculator I know of that featured a SIGN function.
So none of the classics (and many later models as well) offered that function and this approach cannot be used there.
Quote:
I was trying to adapt my code for Hp-15c, but I was unable to lacate any SIGN function.

There you are. ;-)
Quote:
May the sequence be used instead :

FIX 0 n RND

?


No, this won't work. It simply rounds the argument to the next higher or lower (!) integer. CEIL always rounds up.

But since sign(x) = x / abs(x) for x<>0, it could be accomplished this way:
  ENTER
FRC
ENTER
ABS
x=0?
+
x<>0?
/
+
INT
This works for x >= 0.

And finally, a more tricky version:

  ENTER
FRC
x<>0?
e^x
SQRT
INT
+
INT
This also works for negative x.

Ah, those were the good old days where calculators were so slow that using transcendental functions didn't matter much. ;-)

Dieter

#16

Hi C.Ret,

My solution for the HP-15C also used the optimization that you posted, where it is only necessary to find the number of pixels in ~30% of the lines. The total number of pixels is then 4 * (s1 + 2 * s2) where s1 = (INT(r/sqrt(2)+1))^2 and s2 is the number sum of the number of pixels on each line above the square on the main diagonal as shown in your picture.

Lines 29 to 34 are yet another variant on the CEIL function.

For the radius=5000 case on the HP-15C LE the program below computes the number of pixels in 16 seconds.

Best regards,

Eamonn.

001 LBL C
002 Rv
003 Rv
004 STO 0 // Radius (r) -> Reg 0
005 x^2
006 STO 3 // r^2 -> Reg 3
007 RCL 0
008 2
009 SQRT
010 /
011 1
012 STO -0 // r-1 -> Reg 0
013 +
014 INT
015 STO 1 // y1 -> Reg 1
016 X^2
017 STO 2 // s1 -> Reg 2
018 CLX
019 STO 4 // 0 -> s2
020 LBL A
021 RCL 0 // (r-1)
022 RCL 1 // y
023 TEST 7 // x<y?
024 GTO B
025 x^2
026 CHS
027 RCL +3 // r^2-y^2
028 SQRT
029 FRAC
030 TEST 0 // x<>0
031 1
032 LAST X
033 +
034 INT // X = ceil(r^2-y^2)
035 STO +4 // s2 = s2 + x
036 1
037 STO +1
038 GTO A
039 LBL B
040 2
041 RCL *4
042 RCL +2
043 4
044 *
045 RTN
#17

Here is an HP41 program with your idea (take R in X)

01  *LBL'Y
02 ENTER^
03 X^2
04 STO 00
05 2
06 /
07 STO 01
08 X<>Y
09 CF 00 ***
10 0
11 DSE Y
12 GTO 00 ***
13 1 ***
14 GTO 01 ***
15 *LBL 00
16 RCL 00
17 RCL Z
18 X^2
19 -
20 RCL 01
21 X<=Y?
22 SF 00
23 FS? 00
24 X<>Y
25 RDN
26 SQRT
27 INT
28 LASTX
29 FRC
30 X#0?
31 SIGN
32 +
33 FS?C 00
34 GTO 01
35 +
36 DSE Y
37 GTO 00
38 *LBL 01
39 X^2
40 X<>Y
41 ENTER^
42 +
43 +
44 4
45 *
46 END

*** step are for R=1 case (can be removed)

for the rounding in 'old' classic you can use

INT
LASTX
FRAC
X#0? *** SIGN calculate
ENTER^ *** duplicate if not zero
X#0? *** still the same to test at
/ *** can divide as not zero an give a real SIGN function :)
+

as a plus it does not perturb the stack too much

I use this on an HP97 :)

Olivier

Edited: 27 Sept 2011, 2:51 a.m.


#18

A faster one (shorter loop no R=1 special case, R still in X)

01  *LBL'Y (53 bytes)
02 ENTER^
03 STO 00
04 STO 01
05 STOx 01
06 2
07 SQRT
08 /
09 INT
10 LASTX
11 FRC
12 X#0?
13 SIGN
14 +
15 X^2
16 LASTX
17 RCL Z
18 -
19 X<>Y
20 *LBL 00
21 RCL 00
22 RCL Z
23 +
24 X^2
25 RCL 01
26 -
27 CHS
28 SQRT
29 INT
30 LASTX
31 FRC
32 X#0?
33 SIGN
34 +
35 STO+ X
36 +
37 ISG Y
38 GTO 00
39 4
40 *
41 END

Some timings   50 ->     8024 in  8 sec
100 -> 31796 in 16 sec
540 -> 918168 in 82 sec
1000 -> 3145520 in 150 sec
5000 -> 78559640 in 750 sec (should be 30 sec with a CL ...)

HP42S one, better use of stack and recall arithmetic

01  *LBL"Y"
02 ENTER
03 STO 00
04 STO 01
05 STOx 01
06 2
07 SQRT
08 /
09 IP
10 LASTX
11 FP
12 X#0?
13 SIGN
14 +
15 X^2
16 LASTX
17 RCL- ST Z
18 X<>Y
19 *LBL 00
20 RCL 01
21 RCL 00
22 RCL+ ST T
23 X^2
24 -
25 SQRT
26 INT
27 LASTX
28 FRC
29 X#0?
30 SIGN
31 +
32 RCL+ ST X
33 +
34 ISG ST Y
35 GTO 00
36 4
37 *
38 END

sorry, no timing ...


Edited: 28 Sept 2011, 6:58 a.m. after one or more responses were posted


#19

A stack only version for hp42s

00 { 58-Byte Prgm }
01>LBL "X"
02 ENTER
03 ENTER
04 STO+ ST Y
05 2
06 SQRT
07 ÷
08 IP
09 LASTX
10 FP
11 X!=0?
12 SIGN
13 +
14 X^2
15 LASTX
16 RCL- ST T
17 +/-
18 X<>Y
19>LBL 00
20 RCL ST Z
21 RCL- ST Z
22 RCL× ST Z
23 SQRT
24 FP
25 X!=0?
26 SF 00
27 X<> ST L
28 IP
29 FS?C 00
30 ISG ST X
31>LBL 01 *** as NOP
32 STO+ ST X
33 +
34 DSE ST Y
35 GTO 00
36 4
37 ×
38 END

start with a default state (as said) so flag 0 reset

A faster one for HP41

01>LBL "W"
02 ENTER^
03 STO 00
04 STO+ 00
05 2
06 SQRT
07 ÷
08 INT
09 LASTX
10 FRC
11 X!=0?
12 SIGN
13 +
14 X^2
15 STO 01
16 X<>Y
17 LASTX
18 -
19 0
20>LBL 00
21 RCL 00
22 RCL Z
23 -
24 LASTX
25 *
26 SQRT
27 INT
28 LASTX
29 FRC
30 X!=0?
31 SIGN
32 +
33 +
34 DSE Y
35 GTO 00
36 ST+ X
37 RCL 01
38 +
39 4
40 ×
41 END

timing (not the same machine, here a 41C, before a 41CX)
50 in 7 sec
540 in 68 sec
1000 in 126 sec
5000 in 620 sec

Edited: 28 Sept 2011, 6:08 a.m.


#20

I did it slightly differently, with about the same timings:

41C 34 bytes, 5000: 10m 11.56s 78'559'640

Original with explanation (still at 37 byes) here

#21

A better one for 42S (shorter loop)

01 >LBL "X"
02 STO 00
03 STO+ 00
04 STO 01
05 2
06 SQRT
07 ÷
08 IP
09 LASTX
10 FP
11 X!=0?
12 SIGN
13 +
14 X^2
15 X<> 01
16 RCL- ST L
17 0
18 >LBL 00
19 RCL 00
20 RCL- ST Z
21 RCL× ST Z
22 SQRT
23 IP
24 LASTX
25 FRAC
26 X!=00?
27 SIGN
28 +
29 DSE ST Y
30 GTO 00
31 RCL+ ST X
32 RCL+ 01
33 4
34 ×
35 END
#22

I haven't written a program for the 41 since 1987 when I got my 28C, but I can't pass up a good contest. I dusted off my calculator and manual and here's what I came up with:

01 LBL"PC   Program Contest
02 RDN drop center values
03 RDN
04 STO 00 radius -> 00
04 STO 01 loop index -> 01
06 X^2
07 STO 02 r^2 -> 02 for faster loop
08 0 sum on stack
09 LBL 00
10 RCL 02 r^2
11 RCL 01 r^2, i+1
12 1 r^2, i+1, 1
13 - r^2, i
14 X^2 r^2, i^2
15 - r^2-i^2
16 SQRT sqrt(r^2-i^2)
17 INT 17-24 is my CEIL function
18 LASTX
19 FRC
20 CHS
21 SIGN
22 X>0?
23 CLX
24 - end of CEIL function
25 + add to running total
26 DSE 01
27 GTO 00
28 4 times 4 quadrants
29 *
30 END

(Surely there's a better way to do the CEILING function. ???)

This program would have been even more interesting if we did not assume that the circle fit on the display panel. Also, about the idea of only counting pixels that are at least 50% covered, is there an easy way to do that without integrals?

-Wes L


#23

Quote:
(Surely there's a better way to do the CEILING function. ???)

At least there's another option (as well as that using SIGN mentioned in other posts). ;-)

The '41 features a MOD function, so a true CEIL function can be setup like this:
  ENTER
ENTER
-1
MOD
-
That -1 may be stored somewhere beforehand to increase execution speed.

Once again, tricks like these are not required on the 35s since it features an INTG function (equivalent to FLOOR elsewhere):

CEIL(x)  =  -INTG(-x).

  +/-
INTG
+/-
In our case, instead of adding CEIL(x) we can simply subtract INTG(-x):
  ...
+/-
INTG
-
...

Dieter

#24

Another RPL solution :

« DROP2 -> r '4*Sum(n=0,r-1,CEIL(Sqrt(1-(n/r)^2)*r))' »
DROP2 is just to ignore x and y. Sum and Sqrt are the special symbols for

Very simple but takes ~ 75s. for 5000 radius in aprox mode


Edited: 28 Sept 2011, 5:49 p.m.


#25

Salut Gilles,

A better solution :

« DROP2 -> r 
« r 2 SQRT / CEIL -> p '4*Sum(k=p,r-1,p*p+2*CEIL(Sqrt(1-(k/r)^2)*r))'
»
»

This version spare execution time for large radious r.
For explanations see posts HHC 2011 Programming Pulzze - Spoiler -

#26

Quote:
  • I couldn't identify an explicit formula for the result. Such a formula will almost certainly be fastest.

  • I've found this approximation empirically from my results:
    n ~ r*(pi*r + 4)

    Gerson.


    Edited: 26 Sept 2011, 8:37 a.m.


    #27

    I also figured out a couple of approximations but they aren't good enough :-(

    On the 34S using integrate results in a very fast and approximate answer:

    001  [cmplx]DROP
    002 x^2
    003 STO 00
    004 0
    005 RCL L
    006 [integrate] 00
    007 4
    008 *
    009 RTN
    010 LBL 00
    011 x^2
    012 RCL- 00
    013 +/-
    014 [sqrt]
    015 CEIL

    An equivalent program on the 15C LE caused the integrator to run for ages -- I gave up after quite a few minutes.


    - Pauli

    #28

    Here's mine. I hope there are no other HP-12C program so I can win in the fastest HP-12C program category :-)

    Cheers,

    Gerson.


    #29

    40 bytes for Hp 42S (or HP 41) and 2 registers, rouding ok

    01 * LBL "Z"
    02 RCL ST Z
    03 STO 00
    04 STO 01
    05 STOx 01
    06 DSE 00
    07 GTO 00 ***
    08 GTO 01 ***
    09 *LBL 00
    10 RCL 01
    11 RCL 00
    12 X^2
    13 -
    14 SQRT
    15 INT
    16 LAST X
    17 FRC
    18 X#0? new rounding
    19 SIGN
    20 +
    21 +
    22 DSE 00
    23 GTO 00
    24 *LBL 01 ***
    25 4
    26 x
    27 END
    Steps with *** can be removed for r > 1

    Less register for 42 bytes, HP 42 or 41, rounding ok

    01 * LBL "Z"
    02 RCL ST Z
    03 ENTER
    04 ENTER
    05 STO 00
    06 STOx 00
    07 DSE ST Y
    08 GTO 00 ***
    09 GTO 01 ***
    10 *LBL 00
    11 RCL 00
    12 RCL ST Z
    13 X^2
    14 -
    15 SQRT
    16 INT
    17 LAST X
    18 FRC
    19 X#0? new rounding
    20 SIGN
    21 +
    22 +
    23 DSE ST Y
    24 GTO 00
    25 *LBL 01 ***
    26 4
    27 x
    28 END

    Only stack !!! only for HP 42 and 44 bytes (if we count a reg as 7 bytes it's a winner: now the rounding is correct :) )

    01 * LBL "Z"
    02 Rv
    03 Rv
    04 ENTER
    05 ENTER
    06 ENTER
    07 STOx ST Z
    08 DSE ST Y
    09 GTO 00 ***
    10 GTO 01 ***
    11 *LBL 00
    12 RCL ST Y
    13 RCLx ST Z
    14 RCL- ST T
    15 +/-
    16 SQRT
    17 +
    18 IP
    19 LAST X
    20 FP
    21 X#0? new rounding
    22 SIGN
    23 +
    24 DSE ST Y
    25 GTO 00
    26 *LBL 01 ***
    27 4
    28 x
    29 END

    ( FIX 00 RND doesn't work :( need to work a bit more harder.
    But I have no stack left ... see later )

    Rounding for the 2 first programs is ok now :)

    And for the third also :)

    Olivier


    Edited: 26 Sept 2011, 3:21 p.m. after one or more responses were posted


    #30

    The first two versions will not work correctly. The CEIL function here was implemented by adding 0,5 and rounding to the next integer. This is basically the same as a simple 1  +  INT. That's why the routine causes as error if the argument of the CEIL function is an integer. For instance, for r = 5 there are two points where the sqrt is exactly 3 resp. 4. At that point adding 0,5 will cause the value to get rounded up to 4 resp. 5 (instead of leaving it at 3 resp. 4). So the returned result is not 88 (correct), but 96 instead.

    Dieter


    #31

    You were right, I see the bug just after posting.

    The correction is now done for the three programs :)

    Olivier

    #32

    The basic approach is similar to my suggestion for the '41, so both can be improved this way: Instead of starting the loop with x = r - 1 (which is done by the first DSE) let's begin with x = r. This will not change the result since the first loop will add sqrt(r^2 - r^2), i.e. a plain zero. But it can handle the case r = 1 without any additional code, thus saving four steps, while on the other hand one more loop has no perceivable effect on execution time. It even handles r = 0 correctly. ;-)

    Here are two versions of this algorithm. It's very basic and for sure not optimized for speed, but I like it for its compact code and readability. Especially the 35s version. ;-)

      HP-41/42                    HP35s
     
    01 LBL "HHC" H001 LBL H
    02 STO 00 H002 STO R
    03 x^2 H003 STOx R
    04 STO Z (42s: STO ST Z) H004 STO X
    05 LastX H005 RCL R
    06 LBL 01 H006 RCL X
    07 R^ H007 x^2
    08 RCL 00 H008 -
    09 x^2 H009 sqrt
    10 - H010 +/-
    11 SQRT H011 INTG
    12 ENTER^ H012 -
    13 FRC H013 DSE X
    14 X#0? H014 GTO H005
    15 SIGN H015 4
    16 + H016 x
    17 INT H017 RTN
    18 +
    19 DSE 00
    20 GTO 01
    21 4
    22 *
    23 END
    R is assumed in to be entered X. Add 2x RDN if you want to do it according to the original rules.

    Dieter

    #33

    Here's the second version of my 12C program. No attempt to optimize it for least numbered registers usage has been made. Actually, I should be doing my homework instead :-)

    -----------------------------------------------

    Keystrokes |Display |
    [f][P/R] | |
    [f]CLEAR[PRGM] |00- |
    [ENTER] |01- 36 |
    [STO]0 |02- 44 0 |
    [ENTER] |03- 36 |
    [ENTER] |04- 36 |
    [x] |05- 20 |
    [STO]1 |06- 44 1 |
    [CLx] |07- 35 |
    [STO]2 |08- 44 2 |
    [RDN] |09- 33 |
    2 |10- 2 |
    [g][SQRT] |11- 43 21 |
    [/] |12- 10 |
    [-] |13- 30 |
    [g][INTG] |14- 43 25 |
    [-] |15- 30 |
    [STO]4 |16- 44 4 |
    [x<>y] |17- 34 |
    [STO]3 |18- 44 3 |
    [g][x<=y] |19- 43 34 |
    [g][GTO]36 |20- 43,33 36 |
    1 |21- 1 |
    [STO][-]3 |22- 44 30 3 |
    [RCL]0 |23- 45 0 |
    [RCL]1 |24- 45 1 |
    [RCL]3 |25- 45 3 |
    [ENTER] |26- 36 |
    [x] |27- 20 |
    [-] |28- 30 |
    [g][SQRT] |29- 43 21 |
    [-] |30- 30 |
    [g][INTG] |31- 43 25 |
    [STO][-]2 |32- 44 30 2 |
    [RCL]4 |33- 45 4 |
    [RCL]3 |34- 45 3 |
    [g][GTO]19 |35- 43,33 19 |
    [RCL]2 |36- 45 2 |
    [ENTER] |37- 36 |
    [+] |38- 40 |
    [RCL]1 |39- 45 1 |
    [+] |40- 40 |
    [RCL]4 |41- 45 4 |
    [RCL]0 |42- 45 0 |
    [-] |43- 30 |
    [ENTER] |44- 36 |
    [x] |45- 20 |
    [+] |46- 40 |
    4 |47- 4 |
    [x] |48- 20 |
    [g][GTO]00 |49- 43,33 00 |
    [f][P/R] | |

    1 R/S -> 4 ( - )
    5 R/S -> 88 ( - )
    540 R/S -> 918,168 ( 3.1 s)
    5000 R/S -> 78,559,640 ( 25.5 s)

    (timings on the 12C+, firmware date 2008-06-28)

    -----------------------------------------------

    5 CLS
    10 INPUT R
    15 R2 = R * R
    20 L = R - INT(R * (1 - 1 / SQR(2)))
    30 I = R: S = 0
    35 IF I <= L THEN 70
    40 I = I - 1
    50 S = S - INT(R - SQR(R2 - I * I))
    60 GOTO 35
    70 N = 4 * ((L - R) ^ 2 + R * R + 2 * S)
    80 PRINT R, N
    90 GOTO 10

    -----------------------------------------------


    #34

    HP-41CX, HP-42S and wp-34s versions

    ------------------------------------------------------------

    HP-41CX:

    01>LBL'CR
    02 STO 00
    03 ENTER^
    04 ENTER^
    05 ENTER^
    06 2
    07 SQRT
    08 /
    09 -
    10 INT
    11 STO 01
    12 STO 02
    13 CLX
    14 STO 03
    15 CLX
    16 RCL Y
    17 *
    18>LBL 00
    19 DSE 00
    20 RCL X
    21 RCL 00
    22 X^2
    23 -
    24 SQRT
    25 RCL Z
    26 -
    27 IP
    28 STO+ 03
    29 RDN
    30 DSE 01
    31 GTO 00
    32 RCL 03
    33 STO+ X
    34 +
    35 RCL 02
    36 X^2
    37 +
    38 4
    39 *
    40 .END.

    1 XEQ CR -> 4 ( 1.7 s )
    5 R/S -> 88 ( 1.9 s )
    540 R/S -> 918,168 ( 1 min 18.7 s )
    5000 R/S -> 78,559,640 ( 12 min 12.1 s )

    ------------------------------------------------------------

    HP-42S:

    00 { 57-Byte Prgm }
    01>LBL "CR"
    02 STO 00
    03 ENTER
    04 ENTER
    05 ENTER
    06 2
    07 SQRT
    08 ÷
    09 -
    10 IP
    11 STO 01
    12 STO 02
    13 CLX
    14 STO 03
    15 RCL+ ST Y
    16 ×
    17>LBL 00
    18 DSE 00
    19 RCL 00
    20 X^2
    21 RCL- ST Y
    22 +/-
    23 SQRT
    24 RCL- ST Z
    25 IP
    26 STO+ 03
    27 Rv
    28 DSE 01
    29 GTO 00
    30 RCL 03
    31 STO+ ST X
    32 +
    33 RCL 02
    34 X^2
    35 +
    36 4
    37 ×
    38 X<0?
    39 +/-
    40 .END.

    1 XEQ CR -> 4 ( 1.0 s )
    5 R/S -> 88 ( 1.1 s )
    540 R/S -> 918,168 ( 45.6 s )
    5000 R/S -> 78,559,640 ( 6 min 58.3 s )

    ------------------------------------------------------------

    wp-34s:

    001 LBL C
    002 STO 00
    003 FILL
    004 2
    005 SQRT
    006 /
    007 -
    008 IP
    009 STO 01
    010 STO 02
    011 CLx
    012 STO 03
    013 RCL+ Y
    014 ×
    015 DEC 00
    016 RCL 00
    017 x^2
    018 RCL- Y
    019 +/-
    020 SQRT
    021 RCL- ST Z
    022 IP
    023 STO+ 03
    024 Rv
    025 DSE 01
    026 BACK 11
    027 RCL 03
    028 STO+ X
    029 +
    030 RCL 02
    031 x^2
    032 +
    033 4
    034 ×
    035 RTN

    1 XEQ C -> 4 ( - )
    5 R/S -> 88 ( - )
    540 R/S -> 918,168 ( 1.2 s )
    5000 R/S -> 78,559,640 ( 8.7 s )
    ------------------------------------------------------------


    #35

    42S on iPhone 3.1.3 brings:

    5000 -> 78,559,640 -> 0.5 sec

    ;)


    Edited: 29 Sept 2011, 2:03 a.m.

    #36

    41C 37 bytes
    r= 100 8.75s 31'796
    r=5000 6m 58.66s 78'559'640

    *LBL"PX"
    RDN
    RDN
    STO 01
    CLST
    *LBL 02 X Y Z T
    INT n x s
    ST+ Z
    SIGN
    + x s
    RCL 01
    ST+ X 2r x s
    RCL Y x 2r x s
    -
    LASTX
    *
    SQRT
    CHS
    RCL 01
    +
    X>Y? can never be equal
    GTO 02
    SIGN 1 x s s
    - x-1 s s s
    X^2 x-1^2
    -
    + 2*s-(x-1)^2
    RCL 01
    X^2
    -
    -4
    *
    END

    I move the center of the circle to (r,r) and consider only the quadrant where x,y <= r.
    Then a pixel (x,y) is *dark* when it is below the arc, or:

    y <= r - sqrt(x*(2*r-x))

    The number of dark pixels below the arc for a given x is:

    n(x) = INT(r - sqrt(x*(2*r-x)))

    And I'm finally rid of that pesky CEIL ;-)
    I determine s = sum of all n(x) till n<x, then the total number of dark pixels in the quadrant is:

    2*s - (x-1)^2

    Well Gene, that was a good one. Endless fun.

    Cheers, Werner


    #37

    Very nice idea to count the non lit pixels ...

    PS: Werner, you have a strange 41C, timing your program on my 41CX give me 13.6 sec for 100, 1min10 for 540 and 10min52 for 5000 ??

    Some variation

    HP41

    {54 BYTES}
    01>LBL'V
    02 STO 00
    03 STO 01
    04 ST+ 01
    05 RCL 00
    06 2
    07 SQRT
    08 /
    09 -
    10 INT add after 10: X=0?
    11 RCL 00 SF 00
    12 X^2
    13 RCL Y
    14 X^2
    15 +
    16 2
    17 / add after 17: FS?C 00
    18>LBL 00 GTO 01
    19 RCL 01
    20 RCL Z
    21 -
    22 LASTX
    23 *
    24 SQRT
    25 RCL 00
    26 -
    27 INT
    28 +
    29 DSE Y
    30 GTO 00 add after 30: LBL 01
    31 8 to have a good result for R=1,2,3
    32 *
    33 END

    a bit faster : 540 in 1min02, 5000 in 9min26

    For HP42, not timed but the inner loop is shorter :)

    00 {47 BYTES}
    01>LBL'V
    02 STO 00
    03 ST+ ST X
    04 RCL 00
    05 RCL 00
    06 2
    07 SQRT
    08 /
    09 -
    10 INT
    11 RCL 00
    12 X^2
    13 RCL Y
    14 X^2
    15 +
    16 2
    17 /
    18>LBL 00
    19 RCL ST Z
    20 RCL- ST Z
    21 RCL* ST Z
    22 SQRT
    23 RCL- 00
    24 INT
    25 +
    26 DSE ST Y
    27 GTO 00
    28 8
    29 *
    30 END

    apply the same correction to get good result for R=1,2,3


    Edited: 29 Sept 2011, 4:49 p.m.


    #38

    My CV died a few years ago, so I rely on the excellent simulator found on TOS. I had no idea the timings were off, though.

    Cheers, Werner

    #39

    Quote:
    wp-34s:

    001 LBL C
    002 STO 00
    003 FILL
    004 2
    005 SQRT
    006 /
    007 -
    008 IP
    009 STO 01
    010 STO 02
    011 CLx
    012 STO 03
    013 RCL+ Y
    014 ×
    015 DEC 00
    016 RCL 00
    017 x^2
    018 RCL- Y
    019 +/-
    020 SQRT
    021 RCL- ST Z
    022 IP
    023 STO+ 03
    024 Rv
    025 DSE 01
    026 BACK 11
    027 RCL 03
    028 STO+ X
    029 +
    030 RCL 02
    031 x^2
    032 +
    033 4
    034 ×
    035 RTN

    1 XEQ C -> 4 ( - )
    5 R/S -> 88 ( - )
    540 R/S -> 918,168 ( 1.2 s )
    5000 R/S -> 78,559,640 ( 8.7 s )
    ------------------------------------------------------------


    Same idea, but less steps, and all stack (and I think a bit faster):
    001 RCL Z
    002 FILL
    003 STO+ Z
    004 2
    005 SQRT
    006 /
    007 CEIL
    008 STO Z
    009 STOx Z
    010 -
    011 RCL T
    012 RCL- Y
    013 RCLx Y
    014 SQRT
    015 CEIL
    016 STO+ Z
    017 STO+ Z
    018 DROP
    019 DSE X
    020 BACK 09
    021 4
    022 RCLx Z
    Instead of using y=sqrt(r^2-x^2), I moved the center of the circle to 0,r and used y=sqrt(x*(2r-x)). This made it easier to use DSE to catch the edge between the y-axis and the r/sqrt(2) square. I have not read all the other solutions, so this may have been covered.

    #40

    Quote:
    Same idea, but less steps, and all stack (and I think a bit faster)

    Very nice! It's a bit faster indeed:

       540    R/S ->     918,168       (* 1.1 s )
    5000 R/S -> 78,559,640 (* 8.2 s )

    (*) with a chronometer

    Or, using the built-in TICKS command, 8 and 71 ticks, respectively, which multiplied by my unit's correction factor 1/9 gives 0.9 s and 7.9 s. Mine takes 8 and 75 ticks, respectively. A built-in quartz would be handy.

    001 LBL A
    002 TICKS
    003 STO 00
    004 x<>y
    005 XEQ C (your program)
    006 TICKS
    007 RCL- 00
    008 x<>y
    009 RTN

    Quote:
    I have not read all the other solutions,

    Same here, except for Werner's solution, from which I borrowed the idea of checking r*(1 - 1/sqrt(2)) columns instead of r/sqrt(2). I wrote my first draft as soon as I saw Gene's post and stuck to my first idea. I've computed the dark pixels too, so I never needed the CEIL function.

    Gerson.

    Edited: 29 Sept 2011, 9:30 p.m.


    #41

    Quote:
    Or, using the built-in TICKS command, 8 and 71 ticks, respectively, which multiplied by my unit's correction factor 1/9 gives 0.9 s and 7.9 s

    On a crystal unit, it would be 7.1 s.

    TICKS remains constant regardless of the correction factor.
    The time, however, doesn't unless a crystal is installed.


    - Pauli


    #42

    I tried to create a BASE 10 version, but got some errors, perhaps it is the SQRT bug you mentioned. Perhaps you can give it a try. It should run in about 1s.

    Edited: 29 Sept 2011, 9:44 p.m.


    #43

    I've not reflashed my 34S for ages so it still has the sqrt bug.

    The bug, when it occurs, gives a result 1 too high and only when carry is set. So a sequence that checks carry and if it is set squares the number and compares against the original...

    Or reflash with the latest from the sourceforge site and get a working integer SQRT.


    - Pauli

    #44

    Quote:
    TICKS remains constant regardless of the correction factor. The time, however, doesn't unless a crystal is installed.

    RTFS (Read The Fine Sources): In a non crystal equipped calculator, the PLL runs at a higher multiplication factor and the interrupts are scaled up slightly to compensate for a slower R/C clock:

    if ( Xtal ) {
    /*
    * Schedule the next one so that there are 50 calls in 5 seconds
    * We need to skip 3 ticks in 128 cycles
    */
    if ( Heartbeats == 40 || Heartbeats == 81 || Heartbeats == 122 ) {
    UserHeartbeatCountDown = 6;
    }
    else {
    UserHeartbeatCountDown = 5;
    }
    }
    else {
    /*
    * Without a crystal a less accurate timing is good enough
    */
    UserHeartbeatCountDown = Heartbeats & 1 ? 4 : 5;
    }
    The code is executed from the LCD refresh interrupt occurring every 640 slow clock cycles (roughly 50 Hz). TICKS are counted when the UserHeartbeatCountDown value reaches zero. In a R/C environment, the frequency is assumed to be about 15% slower.

    Why all this? The slow clock outputs a 32KHz nominal clock but the rate is typically 15 to 20% less then that without the crystal. The PLL multiplies this value to derive the processor clock. I use higher values for the PLL when no crystal is installed. The periodic interrupt is not controlled by the PLL but directly by the 32KHz oscillator through the LCD controller and needs a separate correction algorithm.

    In short: The TICKS results should be similar with or without a crystal but they typically don't exactly match. Also the resulting execution speed should be roughly the same between devices with or without the crystal.

    #45

    Quote:
    ... dark pixels ...

    Sounds ominous. :-) Kinda like dark matter--you cannot see it, but you can measure it, or so it is argued.
    #46

    This is my third (and last) attempt. More steps, one register and slightly slower than yours.
    Center of the circle at (0,r) and y=sqrt[(r+(r+x))*(r-(r-x))], with no simplication.

    001 LBL C
    002 FILL
    003 2
    004 SQRT
    005 /
    006 CEIL
    007 STO 00
    008 -
    009 0
    010 Rv
    011 Rv
    012 RCL- T
    013 RCL+ Y
    014 RCL* T
    015 SQRT
    016 CEIL
    017 STO+ Z
    018 DROP
    019 ENTER
    020 DSE T
    021 BACK 09
    022 RCL 00
    023 x^2
    024 R^
    025 STO+ X
    026 +
    027 4
    028 *
    029 RTN

    1 XEQ C -> 4 ( - )
    5 R/S -> 88 ( - )
    540 R/S -> 918,168 ( 1.1 s *)
    5000 R/S -> 78,559,640 ( 8.3 s *)

    (*) Or, using TICKS, 0.9 s and 8.0 seconds (8 and 72 TICKS respectively)


    #47

    Quote:
    my third (and last) attempt

    Next to last, I mean :-)

    Nothing but the stack. Well, it looks like Egan's, only somewhat less efficient. Currently I cannot experiment with CLx (which perhaps could be useful inside the loop) because it does not disable stack lift (outdated firmware here).

     


    001 LBL C
    002 FILL
    003 2
    004 SQRT
    005 /
    006 CEIL
    007 -
    008 RCL L
    009 x^2
    010 Rv
    011 Rv
    012 RCL- T
    013 RCL+ Y
    014 RCL* T
    015 SQRT
    016 CEIL
    017 STO+ X
    018 STO+ Z
    019 DROP
    020 ENTER
    021 DSE T
    022 BACK 10
    023 RCL Z
    024 4
    025 *
    026 RTN

    8 and 73 TICKS for r=540 and r=5000, respectively.

    Thanks, Egan, for the free RPN programming lesson!


    #48

    Quote:
    Next to last, I mean :-)

    Antepenultimate attempt, that was!

    A little trick to save a sum inside the loop (steps 10, 11, then final multiplication by 8 instead of 4). It's possible this has already been used, however.

    001 LBL C
    002 FILL
    003 2
    004 SQRT
    005 /
    006 CEIL
    007 -
    008 RCL L
    009 x^2
    010 2
    011 /
    012 Rv
    013 Rv
    014 RCL- T
    015 RCL+ Y
    016 RCL* T
    017 SQRT
    018 CEIL
    019 STO+ Z
    020 DROP
    021 ENTER
    022 DSE T
    023 BACK 09
    024 8
    025 RCL* T
    026 RTN

    Now 8 and 72 TICKS for r=540 and r=5000, respectively.


    #49

    Stack only but for a real HP made by HP (software too) ;)

    00 {48 BYTES}
    01>LBL'Y
    02 RCL ST X
    03 RCL ST X
    04 2
    05 SQRT
    06 /
    07 -
    08 IP
    X=0?
    SF 00
    09 ENTER
    10 X^2
    11 R^
    12 X^2
    13 +
    14 2
    15 /
    FS?C 00
    GTO 01
    16>LBL 00
    17 RCL ST Z
    18 RCL+ ST T
    19 RCL- ST Z
    20 RCL* ST Z
    21 SQRT
    22 RCL- ST T
    23 IP
    24 +
    25 DSE ST Y
    26 GTO 00
    LBL 01
    27 8
    28 *
    29 END

    non working for R=1,2,3 correction in steps not numbered (add some step, but don't slow the inner loop)

    The trick was not to use CEIL but INT (see other posts, counting 'black' dots)


    Edited: 4 Oct 2011, 2:58 a.m.


    #50

    some remarks:

    1.replace

     RCL ST X
    RCL ST X
    by
     ENTER
    STO ST Z
    saving a byte

    2.you need a CF 00 instruction at the beginning, of course, bringing the byte count to 10 to cater for the exceptions.

    3.replace

     RCL ST Z
    RCL+ ST T
    with (shorter by one byte):
     RCL ST Z
    STO+ ST X

    Here's my 41C version, with only 8 bytes overhead (but a bit more stack handling as the 41 does not have register recall arithmetic)

    *LBL"PIXELS"
    ENTER
    STO Z r r r
    2
    SQRT
    /
    -
    INT x r r r
    ENTER
    X^2
    RUP
    X^2
    +
    2
    / s x r r
    X<>Y x s r r
    X=0?
    GTO 00
    STO T
    RDN
    L X Y Z T
    *LBL 02 s r x
    RCL Y r s r x
    ST+ X 2r s r x
    RUP x 2r s r
    ST- Y x 2r-x s r
    ST* Y x x(2r-x) s r
    X<>Y x(2r-x) x s r
    SQRT
    RUP
    ST- Y r -n x s
    X<>Y
    INT [-n] r x s
    RUP
    +
    DSE Z
    GTO 02
    0
    *LBL 00
    +
    8
    *
    END
    Cheers, Werner

    #51

    Thanks for the optimization, but no need for CF 00, you start the program in a default state (see the puzzle post) so it's useless ;)

    Olivier

    #52

    RPL stack only (HP-50g):

    %%HP: T(3)A(D)F(,);
    \<< DUP DUP 2, \v/ / CEIL - DUP2 - SQ 2, / UNROT
    WHILE DUP
    REPEAT DUP2 - PICK3 SWAP + LASTARG - * \v/ CEIL 4, ROLL + UNROT 1, -
    END DROP2 8, *
    \>>

    22.76 seconds, for r=5000 (timed with TEVAL)

    It uses the same formula in my last wp34s program above, probably not the best one in this case, both speedwise and sizewise.

    Quote:
    non working for R=1,2,3 correction in steps not numbered (add some step, but don't slow the inner loop)

    In order to fix the same problem in the first version of the RPL program above, I simply moved the test to the beginning of the loop, by replacing DO UNTIL with WHILE REPEAT.

    "Endless fun", as Werner already said.

    Gerson.


    #53

    I have been fooling around with the challenge since Sept. 25. I think I have reached the point of diminishing returns, or at least should stop spending time on it. So I'll present some results, maybe that will stop the obsessive tweaking.

    Using the inscribed square reduction depicted here (which I did develop on my own on Sept. 26 before looking at any of the solutions), the following listing is about the best I can come up with for wp34s. It returns 78,528,204 for a radius of 4999 in about 8.20 seconds.

    001	LBL B		
    002 RCL Z Recall stack Z, since radius is there per rules
    003 ENTER Get another copy of the radius on the stack
    004 STO 08 Store radius for future use
    005 STO + 08 double the stored value of the radius for future use
    006 x^2 square the radius
    007 2 enter 2 to compute area of square inscribed in a quarter circle
    008 / divide 2 into radius squared to determine area of square inscribed in quarter circle
    009 SQRT take square root to determine side length of square inscribed in quarter circle
    010 CEIL go up to next whole pixel
    011 STO 06 store side length in register that will accumulate area
    012 STO x 06 multiply side length by stored side length to compute area of inscribed square in whole pixels
    013 - subtract size of square from radius to determine how many remaining columns must be summed
    014 STO I store in register I for looping to compute height of remaining columns
    015 RCL 8 Recall twice the radius
    016 RCL - I Recall index and subtract from twice the radius
    017 RCL x I Recall I, multiply times above to form 2xRadiusxIndex-Index^2
    018 SQRT square root to compute square root of 2xRadiusxIndex-Index^2
    019 CEIL compute height of row in full pixels if any part is touched by a circle of the given radius
    020 STO + 06 add to summation of area
    021 STO + 06 add again for area of upper wedge
    022 DSE I decrement index, skip when less than zero
    023 BACK 8 loop back if more columns
    024 4 enter 4
    025 RCL x 06 multiply times 4 to compute the area of the full circle in whole
    pixels if any part of a pixel is within the circle of the given radius
    026 STOP done

    Summing the area in the stack appears to be a popular option, so I created the listing below. It is one step longer, and also seems to run ever-so-slightly slower, returning the result for 4999 in about 8.35 seconds.

    001	LBL B		
    002 RCL Z Recall stack Z, since radius is there per rules
    003 ENTER Get another copy of the radius on the stack
    004 STO 08 Store radius for future use
    005 STO + 08 double the stored value of the radius for future use
    006 x^2 square the radius
    007 2 enter 2 to compute area of inscribed square
    008 / divide 2 into radius squared to determine area of square inscribed in quarter circle
    009 SQRT take square root to determine side length of square inscribed in quarter circle
    010 CEIL go up to next whole pixel
    011 x^2 square side length to compute area of inscribed square in whole pixels
    012 x<>y swap to get radius
    013 RCL - L subtract size of square from radius to determine how many remaining columns must be summed
    014 STO I store in register I for looping to compute height of remaining columns
    015 x<>y swap to get area of square to stack X for area sum start
    016 RCL 8 Recall twice the radius
    017 RCL - I Recall index and subtract from twice the radius, forms 2xRadius-Index
    018 RCL x I Recall I, multiply times above to form 2xRadiusxIndex-Index^2
    019 SQRT square root to compute square root of 2xRadiusxIndex-Index^2
    020 CEIL compute height of row in full pixels if any part is touched by a circle of the given radius
    021 + add to summation of column heights
    022 RCL + L add again for area of upper wedge
    023 DSE I decrement index, skip when less than zero
    024 BACK 08 loop back if more columns
    025 4 enter 4
    026 x multiply times 4 to compute the area of the full circle in whole
    pixels if any part of a pixel is within the circle of the given radius
    027 STOP done

    Observations - I don't claim the above to be the best possible, just likely the best I can do with reasonable effort.

    I have of course also worked on versions for the 15c LE, the latest of which returns the answer for 4999 in about 19 seconds. I may have to keep working on that one...

    ...

    Edited: 6 Oct 2011, 8:42 a.m. after one or more responses were posted


    #54

    I posted this a while back, but it has be archived:

    001 RCL Z
    002 FILL
    003 STO+ Z
    004 2
    005 SQRT
    006 /
    007 CEIL
    008 STO Z
    009 STOx Z
    010 -
    011 RCL T
    012 RCL- Y
    013 RCLx Y
    014 SQRT
    015 CEIL
    016 STO+ Z
    017 STO+ Z
    018 DROP
    019 DSE X
    020 BACK 09
    021 4
    022 RCLx Z
    Gerson has been building on it and timing it, but I think it runs in less than 8 s for r=5000. To make the programing smaller and to also deal with small r I moved the circle origin to r,0 and then used y=sqrt(x*(2r-x)). That with a bit of gymstacktics help reduce the number of instructions and the use of x^2. I would like to get it down to 1 sec with the use of integer mode, however my firmware still has the isqrt bug and I cannot flash (tried 3 different machines and 3 different OSes) until I get a machine with a real serial port.

    Edited: 7 Oct 2011, 10:12 a.m.: Changed 0,r to r,0 in description

    Edited: 7 Oct 2011, 10:12 a.m. after one or more responses were posted


    #55

    Quote:
    I posted this a while back, but it has be archived:

    I recall seeing it, but did not study it closely. Not much fun to use somebody else's idea(s) to improve your own program, after all.

    Quote:
    Gerson has been building on it and timing it, but I think it runs in less than 8 s for r=5000

    I get about 8.2 seconds, which is virtually the same as my versions. Not too surprising, since even though your method is more elegant, it still requires the same number of loops.

    Quote:
    I moved the circle origin to 0,r and then used y=sqrt(x*(2r-x)).

    Are you sure that you did not move the origin to r,0? Either way, a clever approach.

    Quote:
    I would like to get it down to 1 sec with the use of integer mode,

    I am running v2.2 1674. In BASE10 integer mode, when I take the square root of a non-perfect square, it returns the square root of the largest perfect square less than the original argument. Is this the behavior you are after? I tried a kludgy modification to your code to use this, and it returns 78,528,204 for a radius of 4999 in about 1.9 seconds. Unfortunately, it also returns the wrong answers for radii that produce perfect squares at certain points in the process, so it needs work. But it is fast!

    ...

    Edited: 6 Oct 2011, 8:55 a.m. after one or more responses were posted


    #56

    Quote:
    Unfortunately, it also returns the wrong answers for radii that produce perfect squares at certain points in the process, so it needs work. But it is fast!

    Try checking the carry flag (C). This is cleared if the square root was of a perfect square and set otherwise -- a very handy feature of integer mode.


    - Pauli


    #57

    I figured there must be a feature to make this work. I'll give it a try.


    #58

    Change:

    SQRT
    CEIL
    to:
    SQRT
    FS? C
    INC X

    #59

    I have a useful modification in mind: CEIL in integer mode should add the carry and clear it thereafter. Pauli, any ideas?


    #60

    An interesting idea. CEIL is already defined in integer mode but doesn't do anything very interesting.

    I'm not sure CEIL is the right command for this. A pair of add carry and subtract carry instructions would be more general and meaningful I think.

    We'd really do best by having add with carry and subtract with borrow instructions.


    - Pauli


    #61

    My idea was to make CEIL behaving similarly in DECM and integer modes. If all functions that would normally return a non integer result (such as 1 ENTER 2 /) set the carry, CEIL would be working the same in both environments.

    ADC and SBB seem natural for integer mode. I will have to learn ARM assembly to port this stuff to native ARM code.


    #62

    Quote:
    My idea was to make CEIL behaving similarly in DECM and integer modes. If all functions that would normally return a non integer result (such as 1 ENTER 2 /) set the carry, CEIL would be working the same in both environments.

    I believe most operations that can return a fractional result do set carry appropriately. If any don't (and the 16C did), please let me know.

    The problem here is that there are plenty of other ways to set it. Addition and subtraction set it for overflow & borrow and shifts and rotates set it as well.

    As things stand CEIL does work the same in both integer and floating point modes -- it rounds up to the nearest integer. In integer mode this means the number doesn't change which is correct.

    Quote:
    ADC and SBB seem natural for integer mode. I will have to learn ARM assembly to port this stuff to native ARM code.

    Do this and we'll save more than a bit of space in the integer support -- simulating the CPU flags in C is a huge consumer of code space. My ARM assembly is passable but not great.

    - Pauli

    Edited: 6 Oct 2011, 3:47 a.m.

    #63

    Yes, that is what I did for the second use of SQRT. For the first, I think you need to change:

    2
    SQRT
    /
    CEIL

    to:

    RCLx X (or x^2)
    2
    /
    SQRT
    INC X

    Your program, with the above changes, now looks like this:

    001 BASE 10
    002 RCL Z
    003 FILL
    004 STO+ Z
    005 RCLx X
    006 2
    007 /
    008 SQRT
    009 INC X
    010 STO Z
    011 STOx Z
    012 -
    013 RCL T
    014 RCL- Y
    015 RCLx Y
    016 SQRT
    017 FS? C
    018 INC X
    019 STO+ Z
    020 STO+ Z
    021 DROP
    022 DSE X
    023 BACK 10
    024 4
    025 RCLx Z
    026 DECM

    The above clocks in at right about 2 seconds for 4999.

    Edited: 6 Oct 2011, 8:07 a.m.


    #64

    If

    019 STO+ Z
    is changed to
    019 SL 1
    Does it help?

    Edited: 6 Oct 2011, 11:28 a.m.


    #65

    It is getting hard to say since I am timing things with my running watch, but it seems to get the time for 4999 down to about 1.9 seconds.

    ...

    #66

    Shifts are faster than addition or multiplication in integer mode. Dealing with carry and overflow consumes lots of cycles.

    The 2 / sequence could be replaced with SR 01 which is shorter and faster.


    - Pauli

    Edited: 6 Oct 2011, 5:49 p.m.


    #67

    Pauli,

    Although I would not expect a measurable improvement in execution time in this particular instance (since the 2 / sequence is executed only once), shorter is better so I made the change. But, since the SR 01 function divides by 2 in-situ, it messes with Egan’s clever gymstacktics, so the program returns incorrect answers if only that change is made. Some other change(s) would have to be made, for example following the SR 01 with ENTER DROP. Of course that takes more steps than are saved by the SR 01 instruction. Probably possible to use SR 01 and save the step, but I will leave that as an exercise for the reader.

    ...

    #68

    Quote:
    Are you sure that you did not move the origin to r,0? Either way, a clever approach.

    Typo, yes, I moved it to r,0.
    #69

    You can use the built-in TICKS command instead of a chronometer. The latter adds about a 200 ms delay, no matter how fast you think you are (most likely your program takes 8 seconds or less for r=5000 as well). For times in seconds, you have to find the correction factor unless your wp34s has a built-in crystal. Or simply compare the number of TICKS returned by program A to determine which program is faster.

    001 LBL C	001 LBL D
    002 FILL 002 FILL
    003 STO+ Z 003 STO+ Z
    004 2 004 2
    005 SQRT 005 SQRT
    006 / 006 /
    007 CEIL 007 CEIL
    008 STO Z 008 STO Z
    009 STOx Z 009 STO* Z
    010 - 010 -
    011 RCL T 011 .
    012 RCL- Y 012 5
    013 RCLx Y 013 STO* Z
    014 SQRT 014 DROP
    015 CEIL 015 RCL T
    016 STO+ Z 016 RCL- Y
    017 STO+ Z 017 RCL* Y
    018 DROP 018 SQRT
    019 DSE X 019 CEIL
    020 BACK 09 020 STO+ Z
    021 4 021 DROP
    022 RCLx Z 022 DSE X
    023 RTN 023 BACK 08
    024 8
    025 RCL* Z
    026 RTN

    027 LBL A
    028 TICKS
    029 STO 10
    030 X<>Y
    031 XEQ B
    032 TICKS
    033 RCL- 10
    034 RTN

    TICKS for 5000 XEQ A

    C: 71 (7.9 seconds )
    D: 70 (7.8 seconds )

    C is Egan Ford's original program, except for the first step, which has been suppressed.
    D is a slightly modified version of his program, no significant gain however.

    I wasn't able to time your programs because my wp34s lacks DSL (it needs a firmware update).

    Gerson.


    #70

    With a crystal installed, taking the difference of two TICKS calls should be fairly accurate.

    Without a crystal, the difference of two TICKS calls will still returns the same duration, it just won't necessarily match with real time so well.


    - Pauli


    #71

    Quote:
    Without a crystal, the difference of two TICKS calls will still returns the same duration, it just won't necessarily match with real time so well.

    The way it's implemented the two scenarios might differ. I tried to get TICKS right in either environment but the accuracy is dependent on the actual R/C frequency.

    Edited: 6 Oct 2011, 2:20 a.m.

    #72

    These are all great, and I thank all of you for your interest in this problem.

    I plan to write-up the winning entries from the conference as well as including the entries here. I will need a bit of time to key them in and time them with my sample input values for comparison to the conference results.

    For what it is worth, this contest was open to all RPN machines.

    I now have a solution running on the HP 67. :-)

    It is a bit slow...


    #73

    to a radius of 4999 in about 1.4 hours. :-)

    78,528,204.

    Edited: 27 Sept 2011, 12:28 p.m. after one or more responses were posted


    #74

    To reduce time, this one can be faster ...
    (I don't have my 97 at hand)

    001  STOI      *** I Index register
    002 STO0
    003 STO1 *** one register
    004 STx1 *** one register
    005 2
    006 SQRT
    007 /
    008 INT
    009 LSTX
    010 FRC
    011 X#0?
    012 ENT^
    013 X#0?
    014 /
    015 +
    016 X^2
    017 LSTX
    018 RCLI *** I Index register
    019 -
    020 STOI *** I Index register
    021 Rv *** R down
    022 *LBL0
    023 RCL1 *** one register
    024 RCL0
    025 RCLI *** I Index register
    026 +
    027 X^2
    028 -
    029 SQRT
    030 INT
    031 LSTX
    032 FRC
    033 X#0?
    034 ENT^
    035 X#0?
    036 /
    037 +
    038 ENT^
    039 +
    040 +
    041 ISZI
    042 GTO0
    043 4
    044 x

    on a 97 it should be around 50 minutes for 4999

    Just timed it it is more like 30 minutes :)
    (an hp41 is 12 minutes, so it's not too bad)

    (but a 32SII is 3 minutes with the same program ...)

    BUG on HP97: don't work for R < 3 : ISZI peculiarity : changes

    change old 21 by
    CF0
    X=0?
    SF0
    Rv
    F?0
    GTO1
    add before old 43
    LBL1

    it is not needed for HP32SII (use ISG instead of ISZ)


    Edited: 28 Sept 2011, 4:17 a.m. after one or more responses were posted


    #75

    Faster one ? (shorter loop, need timing)

    001  ENT^      
    002 STO0
    003 ST+0
    004 2
    005 SQRT
    006 /
    007 INT
    008 LSTX
    009 FRC
    010 X#0?
    011 ENT^
    012 X#0?
    013 /
    014 +
    015 X^2
    016 X<>Y
    017 LSTX
    018 -
    019 STOI *** I Index register
    020 X=0?
    021 SF2
    022 CLX
    023 0
    024 F2?
    025 GTO2
    026 *LBL0
    027 RCL0
    028 RCLI *** I Index register
    029 -
    030 LSTX
    031 x
    032 SQRT
    033 INT
    034 LSTX
    035 FRC
    036 X=0?
    037 GTO1
    038 CLX
    039 EEX
    040 *LBL1
    041 +
    042 +
    043 DSZI
    044 GTO0
    045 ENT^
    046 +
    047 *LBL2
    048 +
    049 4
    050 x

    Edited: 28 Sept 2011, 11:51 a.m.

    #76

    Quote:
    I now have a solution running on the HP 67. :-)

    It is a bit slow...


    Anyone going to try on the HP 25 or 10C :-)


    I might see if my Elektronika MK-61 is still functioning and up to the task. It will win the slowness stakes I'm sure.


    - Pauli


    #77

    Quote:
    Anyone going to try on the HP 25 or 10C :-)

    What about the HP-33C? Under half an hour, even less if the loop is optimized to use the stack whenever possible.

    Gerson.

    -----------------------------------------------

    Keystrokes |Display |
    | |
    [f]CLEAR[PRGM] |00- |
    [ENTER] |01- 31 |
    [STO]0 |02- 23 0 |
    [ENTER] |03- 31 |
    [ENTER] |04- 31 |
    [x] |05- 61 |
    [STO]1 |06- 23 1 |
    [CLx] |07- 34 |
    [STO]2 |08- 23 2 |
    [RDN] |09- 22 |
    2 |10- 2 |
    [f][SQRT] |11- 14 0 |
    [/] |12- 71 |
    [-] |13- 41 |
    [g][INTG] |14- 15 32 |
    [-] |15- 41 |
    [STO]4 |16- 23 4 |
    [x<>y] |17- 21 |
    [STO]3 |18- 23 3 |
    [f][x<=y] |19- 14 41 |
    [GTO]35 |20- 13 35 |
    1 |21- 1 |
    [STO][-]3 |22- 23 41 3 |
    [RCL]0 |23- 24 0 |
    [RCL]1 |24- 24 1 |
    [RCL]3 |25- 24 3 |
    [g][x^2] |26- 15 0 |
    [-] |27- 41 |
    [f][SQRT] |28- 14 0 |
    [-] |29- 41 |
    [g][INTG] |30- 15 32 |
    [STO][-]2 |31- 23 41 2 |
    [RCL]4 |32- 24 4 |
    [RCL]3 |33- 24 3 |
    [GTO]19 |34- 13 19 |
    [RCL]2 |35- 24 2 |
    [ENTER] |36- 31 |
    [+] |37- 51 |
    [RCL]1 |38- 24 1 |
    [+] |39- 51 |
    [RCL]4 |40- 24 4 |
    [RCL]0 |41- 24 0 |
    [-] |42- 41 |
    [g][x^2] |43- 15 0 |
    [+] |44- 51 |
    4 |45- 4 |
    [x] |46- 61 |
    [GTO]00 |47- 13 00 |

    1 R/S -> 4 ( 2.7 s )
    5 R/S -> 88 ( 3.8 s )
    540 R/S -> 918,168 ( 3 min 14.1 s )
    4999 R/S -> 78,528,204 ( 29 min 36.7 s )


    -----------------------------------------------

    #78

    My solutions:

    15C:

    001 -       33  Rv          
    002 - 33 Rv
    003 - 42, 7, 0 FIX 0
    004 - 44 1 STO 1
    005 - 42, 5, 1 DSE 1
    006 - 22 2 GTO 2
    007 - 22 3 GTO 3
    008 - 42,21, 2 LBL 2
    009 - 44 0 STO 0
    010 - 43 11 x^2
    011 - 44 2 STO 2
    012 - 42,21, 1 LBL 1
    013 - 45 2 RCL 2
    014 - 45 1 RCL 1
    015 - 43 11 x^2
    016 - 30 -
    017 - 11 SQRT
    018 - 36 ENTER
    019 - 43 44 INT
    020 - 44,40, 0 STO+ 0
    021 - 43,30, 6 TEST 6
    022 - 42, 6, 0 ISG 0
    023 - 43 7 DEG
    024 - 42, 5, 1 DSE 1
    025 - 22 1 GTO 1
    026 - 45 0 RCL 0
    027 - 42,21, 3 LBL 3
    028 - 4 4
    029 - 20 x

    34S (as submitted at HHC2011. NOTE: I was in a hurry and directly ported over my 15C program with the exception of using CEIL.):

    001 Rv
    002 Rv
    003 STO 01
    004 DSE 01
    005 GTO 02
    006 GTO 03
    007 LBL 02
    008 STO 00
    009 X^2
    010 STO 02
    011 LBL 01
    012 RCL 02
    013 RCL 01
    014 X^2
    015 -
    016 SQRT
    017 CEIL
    018 STO+00
    019 DSE 01
    020 GTO 01
    021 RCL 00
    022 LBL 03
    023 4
    024 x

    34S do-over after talking to Marcus, Ari, and reading the manual a bit more:

    001 RCL Z
    002 STO 00
    003 STO 01
    004 x^2
    005 STO 02
    006 SKIP 07
    007 RCL 02
    008 RCL 01
    009 x^2
    010 -
    011 SQRT
    012 CEIL
    013 STO+ 00
    014 DSE 01
    015 BACK 08
    016 RCL 00
    017 4
    018 x

    The 34S is a remarkably fun machine. The SKIP and BACK functions rock.

    Edited: 28 Sept 2011, 7:09 p.m.


    #79

    Egan's 34s program was the winner, in the "pure RPN" category. I don't want to steal Gene's thunder, I'll let him give full results and explain the category issue. (I don't think he has posted the results yet, but I could have missed it. If so, ignore this message.)


    Possibly Related Threads…
    Thread Author Replies Views Last Post
      HHC 2013 RPN Programming Challenge - Final Thought Jeff O. 7 2,483 10-17-2013, 11:02 AM
    Last Post: Jeff O.
      HHC 2013 programming contest winners Brian Walsh 52 13,088 09-26-2013, 11:39 PM
    Last Post: David Hayden
      Box left at HHC Tim Wessman 1 1,648 09-26-2013, 11:05 PM
    Last Post: Wlodek Mier-Jedrzejowicz
      HHC 2013 speakers' files Joe Horn 2 1,673 09-26-2013, 02:51 AM
    Last Post: Geoff Quickfall
      HHC 2013 Day 2 Highlights Eddie W. Shore 6 2,663 09-23-2013, 04:03 PM
    Last Post: Kimberly Thompson
      HHC 2013: Day 1 Highlights Eddie W. Shore 28 8,160 09-23-2013, 03:22 PM
    Last Post: Brad Barton
      HHC / HP Museum Programming Contest for RPN and RPL machines Gene Wright 18 5,797 09-22-2013, 09:39 AM
    Last Post: Miguel Toro
      HHC 2013 room numbers David Hayden 2 1,344 09-20-2013, 05:34 PM
    Last Post: sjthomas
      Good puzzle for kids to solve on 35s? snaggs 11 3,515 09-18-2013, 10:40 PM
    Last Post: David Hayden
      FYI: HHC 2013 Programming contests are coming Gene Wright 0 933 09-17-2013, 03:12 PM
    Last Post: gene wright

    Forum Jump: