The average HP RPN scientific can calculate summations, sums of squares, least-squares fits, averages (means), and standard deviations; this includes, of course, the 32SII.

But I'd love to use the 32SII to also calculate a median and modes (frequency, the statistical kind). From what I can tell, keystroke programming with registers doesn't seem to allow data sorting of any kind. Am I wrong, or would that require as many registers as there are data points?

Is it at all possible to create a program of reasonable length to calculate a statistical median or mode from a set of data with one of these calculators (any HP RPN handheld before the Charlemagnes)?

In order to compute the median or mode, you need to know the frequencies of each distinct data point. This is essentially equivalent to having all of the data points.

On the other hand, here is a little challenge which can be solved on the 32SII or any other HP programmable. (It's a little easier on machines which have a pseudo-random number generator, but, if your favourite calculator doesn't have one, you can always code one as part of your solution.)

Imagine a user having a long list of numbers that they key into the calculator under the guidance of a program you have written. The goal is for your program to select one of the user's inputs "at random". By this I mean that when the user is finished keying data, the calculator displays one of the user's data inputs. It must be true that any of the inputs that the user supplied was *equally likely* to have been selected by your program.

And, no, you don't know in advance how many items are in the user's data list.

Err... I think it's workable... HP32SII version:

LBL F
INPUT X
Σ+
1/X
RANDOM
X<Y?
GTO C
GTO F

LBL C  # 'X->Y'
RCL X
STO Y
GTO F

LBL R  # the 'result'
VIEW Y
RTN

The variable X holds the current input, and Y is the current 'favourite'. If RANDOM < 1/n (where n is the number of data points entered so far), then X is copied into Y.

If you want to see the current Y, press XEQ R while the 'X?' prompt is showing.

Usage: CLΣ, (CLVARS), XEQ F, enter each data value with [R/S], then XEQ R.
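For readers without a 32SII at hand, here is a rough Python sketch of the same algorithm (the function name `random_pick` is mine, not part of the calculator program):

```python
import random

def random_pick(data):
    """Single-pass random selection of one item, mirroring the 32SII
    program: the n-th input replaces the current favourite Y with
    probability 1/n (so the first input is always kept, since 1/1 = 1)."""
    y = None
    for n, x in enumerate(data, start=1):  # n plays the role of the Sigma+ counter
        if random.random() < 1.0 / n:      # RANDOM < 1/n  =>  X -> Y
            y = x
    return y
```

Note that, just as on the calculator, nothing but the counter and the current favourite is stored; the inputs can be discarded as they arrive.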

Csaba

Ed --

The RPN/AOS HP-17Bii will store the entire set of input data for statistical analysis, and has built-in functions for median, max, min, and sorting. "Mode" could likely be programmed using its powerful language for equation-solving, documented __not__ in its own manual, but in the HP-19 manual.

I know where I can obtain a 1993(?) HP-17Bii for you.

A very interesting problem posted by Patrick. My immediate inclination was to say that all the data must be stored, and a point picked out from the complete stored list after each new entry. It just didn't seem "right" to have all but one point ruled out as the winner part way through the data entry.

However, when I constructed the decision tree for n = 3 or 4 inputs, it worked out that each input would have a 1/n probability, where n = number of inputs. And it appears that this scales up to larger n.

It's been a long time since I've done any significant calculator programming, but it looks like Tizedes' solution will work.

Interesting...

LC

Karl, could you e-mail me? I cannot access yours. Thanks.

Well done, Mr. Csaba. Your answer is right on.

In words, the basic idea of the algorithm is that the program always keeps the "current" selection stored, together with the current number of items entered to date. When the n'th input is made, it replaces the previous current selection with probability 1/n. So, for example, the first item entered is selected as the current value with probability 1. The second one entered is selected with probability 1/2, and so on.

After all the data is entered (or at any other time) the current selection is the value the program will return as the "random pick" of the user's data.

Here is the proof that the program selects each item with equal probability. Suppose that the entire list consists of N items. Consider the k'th item in the list, with 1<=k<=N. What is the probability that it ends up as the final selection? In order for our program to be correct, this probability must be 1/N. Well, in order for this to happen, it must be selected when it is keyed in and then all further keyed data inputs must **not** be selected. These events are independent and so we multiply the probabilities for them each to happen:

(1/k)*(k/(k+1))*((k+1)/(k+2))* ... *((N-1)/N) = 1/N
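The telescoping product can also be checked empirically. Here is a quick Monte Carlo sketch in Python (the names are mine) applying the stated rule, keep the n-th item with probability 1/n, and counting how often each item wins:

```python
import random
from collections import Counter

def pick_one(stream):
    # Keep the n-th item with probability 1/n (single-pass rule).
    selection = None
    for n, item in enumerate(stream, start=1):
        if random.random() < 1.0 / n:
            selection = item
    return selection

# N = 5 distinct items, 50,000 trials: each should win about 1/5 of the time.
trials = 50_000
counts = Counter(pick_one(["a", "b", "c", "d", "e"]) for _ in range(trials))
for item, c in sorted(counts.items()):
    print(f"{item}: {c / trials:.3f}")  # each close to 0.200
```

The observed frequencies cluster around 1/N, as the proof predicts.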

Techniques like this used to abound in the literature in the days when memory and CPU costs were large. Typically, a large data set was read off of a tape and certain reports were generated from the data. Core memory was not so plentiful that it could maintain all of that data in memory all at the same time. Nevertheless, it was critically important to programs of the day that they not require multiple passes through the data -- reading a tape was expensive, in CPU time and in wall-clock time. "Single pass" type algorithms were key.

The algorithm at the base of this challenge is a simple example of such a single-pass (stochastic) algorithm. Many such have been lost from general knowledge and practice because of how inexpensive memory (e.g., virtual memory) and CPU power has become.

Nice proof, elegant and simple!

LC

:D

I'm very happy to solve little (but interesting) problems. Do you think it would be a good idea to start a 'Challenge Section'? I've got some problems, and I think some of them are interesting...

Csaba

Ps: English notation vs. Hungarian notation: My family name is 'Tizedes', and my first name is 'Csaba' ;) I know, it's not a classical name, and I made a mistake when I typed it in the wrong order... No problem! :)

Cs.