Algorithm for fitting a logistic curve?



#2

Hello,

Working on the math library here, and I have had an immensely difficult time finding how to efficiently implement a logistic curve fit. Note, this isn't a full-fledged binary logistic regression (which I can find lots of information on), but rather the fitting of a curve of the form L/(1+a*e^(-b*x)) to a set of data.

The fitting method in the math library right now linearizes the equation and it doesn't give a very good fit at all so I am trying to replace it.

Does anyone have any helpful pointers to any algorithms for this type of problem?

I have posted Charlie Patton's code comments below from the 48 math library for those interested (he wrote this originally). I am not 100% certain if the issue is the linearization and a completely different method is needed, or if just the L estimator routine needs replacing/improvement.

* Name: fitlogist
* Algorithm: The basic model is:
* y=L/(1+A*e^(B*x))
* which is equivalent to:
* Ae^(B*x)=(L-y)/y
* so that
* ln(A)+B*x=ln(L/y-1).
*
* Fit a linear model to the transformed data
* ( x(i), ln(L/y(i)-1)) to obtain y=a+bx
* then A=e^a and B=b
*
* Note B normally would be negative
*
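
For readers following along, the linearized fit described in the comments above could be sketched in Python roughly as follows. This is only an illustration and not the 48 library code: the function name fit_logistic_linearized is made up here, and L is assumed to be known already (for example from an estimator such as Lestimate). It also assumes 0 < y < L for every sample, otherwise the logarithm is undefined.

```python
import math

def fit_logistic_linearized(xs, ys, L):
    """Fit y = L / (1 + A*exp(B*x)) for a given L (hypothetical helper).

    Uses ordinary linear regression on the transformed points
    (x, ln(L/y - 1)), as described in the fitlogist comments.
    """
    # Transformed response; requires 0 < y < L for every sample.
    ts = [math.log(L / y - 1.0) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    b = sxt / sxx            # slope of the fitted line
    a = mean_t - b * mean_x  # intercept of the fitted line
    return math.exp(a), b    # A = e^a, B = b
```

Note that this minimizes the squared error of ln(L/y - 1), not of y itself, so samples with y close to 0 or close to L can dominate the result.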

**Name: Lestimate
**
**Category: Logistic Fit Utility
**
**Entry:
**
** Stack: [XY] (sorted)
**
**
** Temp. Env.
**
**
**Exit:
**
** Stack: L% (or %0 if there's a problem with zero divisors)
**
**
** Temp. Env.
**
**Errors:
**
**
**Description/Algorithm:
**
** This utility attempts to estimate the saturation value for a logistic
** equation from sorted statistical samples.
**
** It is assuming that the data Y(X) (stored in pair form X[i],Y[i]) corresponds
** to samples from a differential equation dY(X)/dX = Y(X)*k*(L-Y(X))
**
** Note that this is an autonomous ODE with nodes at Y=0 and Y=L.
** If we plot (1/Y)*(dY(X)/dX) as a function of Y (note that it doesn't really depend
** on X) we will get a straight line with a zero at Y=L. It is this fact we will use
** to approximate L from the data. Namely, replacing dY(X)/dX by its sampled version
** dY(X)/dX ~ Y'[i]=(Y[i+1]-Y[i])/(X[i+1]-X[i])
** we do linear regression on the pair Y[i],Z[i] with Z[i]=Y'[i]/Y[i] and
** find the zero of the corresponding line.
**

**Author: C.M.Patton
**Date Written: April 10, 1995
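
The Lestimate description above could be sketched in Python along these lines. Again this is a guess at the idea from the comments, not the original code: the name estimate_L is made up, forward differences are assumed for Y'[i], and returning 0.0 on any zero divisor mirrors the "%0" exit condition mentioned in the comments.

```python
def estimate_L(xs, ys):
    """Estimate the saturation value L from samples sorted by x.

    Regresses Z[i] = Y'[i] / Y[i] on Y[i], where Y'[i] is a forward
    difference, and returns the zero of the fitted line.
    Returns 0.0 if any zero divisor is encountered.
    """
    pts = []
    for i in range(len(xs) - 1):
        dx = xs[i + 1] - xs[i]
        if dx == 0 or ys[i] == 0:
            return 0.0
        z = (ys[i + 1] - ys[i]) / dx / ys[i]
        pts.append((ys[i], z))

    n = len(pts)
    if n < 2:
        return 0.0
    mean_y = sum(y for y, _ in pts) / n
    mean_z = sum(z for _, z in pts) / n
    syy = sum((y - mean_y) ** 2 for y, _ in pts)
    syz = sum((y - mean_y) * (z - mean_z) for y, z in pts)
    if syy == 0 or syz == 0:
        return 0.0               # no usable slope
    slope = syz / syy
    intercept = mean_z - slope * mean_y
    return -intercept / slope    # Y value where the line crosses Z = 0
```

Because it differentiates the data numerically, an estimate like this is likely to be sensitive to noise and to coarse sampling, which may be one reason the overall fit suffers.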

TW

--

Although I work for the HP calculator department, the comments and opinions I express here are my own.

Edited: 11 Nov 2011, 12:18 p.m.


#3

In my experience, fitting either the logistic function or the tanh function tends to get poor results. I suspect that this is due to how rapidly they go asymptotic, but that's really only a guess on my part. Hopefully someone knowledgeable about numerical analysis can explain how to do it properly.

#4

Quote:
The fitting method in the math library right now linearizes the equation and it doesn't give a very good fit at all so I am trying to replace it

Linearizing the equation, followed by a simple linear regression, is a classic method that usually gives decent results. However, it does not minimize the sum of the residuals' squares. How did you determine the quality of the fit here?
Quote:
I have posted Charlie Patton's code comments below from the 48 math library for those interested (he wrote this originally). I am not 100% certain if the issue is the linearization and a completely different method is needed, or if just the L estimator routine needs replacing/improvement.

As far as I can see the comments simply refer to the common linearization, which here leads to the transformation ln(A)+B*x = ln(L/y-1). The goal of the algorithm however seems to be a different one: implement a method to estimate the saturation parameter L: "This utility attempts to estimate the saturation value for a logistic equation from sorted statistical samples".

A true least-squares regression, i.e. one that exactly minimizes the sum of the residuals' squares, is not trivial. I came across the following document and think it's an interesting read on this subject:
http://home2.fvcc.edu/~dhicketh/DiffEqns/Activities/logistic.pdf

Dieter

#5

Tim,
Unfortunately, if you linearize an equation to fit it to your data, you do not minimize the SSE. And, not all equations can be linearized. What is needed is a non-linear fitting routine. But this requires a calculator with a CAS system so you can compute the derivatives of the equation with respect to the parameters. I have done this for the TI-89 and the NSpire CX CAS. If you want, I can send the routines to you. My email is don.phillips@gmail.com.

Don


#6

I can't believe I'm saying this (being a big fan of CAS calculators), but you don't NEED to use a CAS. I use Excel's Solver routine all the time to do least squares fitting to arbitrary function forms.

#7

Tim,

Quote:
Does anyone have any helpful pointers to any algorithms for this type of problem?

How critical is speed?

Perhaps you've already been down this road, but using a brute-force approach I took the equivalent equation:

y = L/(1+a*exp(-k*x))

and applied least-squares principles:

Let E = sum i=1 to n of (L/(1+A*EXP(-K*X_i)) - Y_i)^2, where A and K stand for a and k above,

then minimized E by taking the partial derivatives of E with respect to L, A, and K and setting them to zero.

E 'L' DERIV 0 =
E 'A' DERIV 0 =
E 'K' DERIV 0 =

This gives three non-linear equations which can then be solved numerically for L, a, k.

I tried this with a few sample data points on the 50g (using the SOLVESYS lib to solve) and in Maxima (using MNEWTON to solve) and got matching results which also matched the FitLogistic command in the computer software GeoGebra. I don't know if you're allowed to use GPL code for your project, but GeoGebra is a GPL program with source code available from http://www.geogebra.org/source/program/. Perhaps you could look and see how they handle it.

You don't need the CAS since the derivatives can be hard coded, but numerically solving the equations is of course the bottleneck. It might even be faster to use the linearized results as the initial values in the iterative solving process.
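
To illustrate the point about hard-coded derivatives, here is a rough Python sketch. It does not solve the three gradient equations with a general root finder as SOLVESYS or MNEWTON would; instead it uses a damped Gauss-Newton iteration, which is a different but closely related way to minimize the same E. The names solve3, sse and fit_logistic_gn are made up for this example, and the starting values are assumed to come from somewhere else, e.g. the linearized fit (with a = A and k = -B in the notation of the library comments).

```python
import math

def solve3(M, v):
    """Solve the 3x3 system M*d = v (Gaussian elimination, partial pivoting)."""
    A = [row[:] + [rhs] for row, rhs in zip(M, v)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, 3):
            f = A[r][c] / A[c][c]
            for j in range(c, 4):
                A[r][j] -= f * A[c][j]
    d = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = A[r][3] - sum(A[r][j] * d[j] for j in range(r + 1, 3))
        d[r] = s / A[r][r]
    return d

def sse(xs, ys, L, a, k):
    """E = sum of squared residuals for y = L/(1 + a*exp(-k*x))."""
    return sum((L / (1.0 + a * math.exp(-k * x)) - y) ** 2
               for x, y in zip(xs, ys))

def fit_logistic_gn(xs, ys, L, a, k, iters=50, tol=1e-12):
    """Damped Gauss-Newton fit starting from the initial guess (L, a, k)."""
    for _ in range(iters):
        JTJ = [[0.0] * 3 for _ in range(3)]
        JTr = [0.0] * 3
        for x, y in zip(xs, ys):
            ex = math.exp(-k * x)
            d = 1.0 + a * ex
            r = y - L / d
            # Hard-coded partial derivatives of the model w.r.t. L, a, k.
            g = (1.0 / d, -L * ex / d ** 2, L * a * x * ex / d ** 2)
            for i in range(3):
                JTr[i] += g[i] * r
                for j in range(3):
                    JTJ[i][j] += g[i] * g[j]
        dL, da, dk = solve3(JTJ, JTr)
        # Halve the step until E no longer increases (simple damping).
        e0 = sse(xs, ys, L, a, k)
        step = 1.0
        while step > 1e-8 and sse(xs, ys, L + step * dL,
                                  a + step * da, k + step * dk) > e0:
            step *= 0.5
        L, a, k = L + step * dL, a + step * da, k + step * dk
        if step * max(abs(dL), abs(da), abs(dk)) < tol:
            break
    return L, a, k
```

Each iteration only needs one pass over the data and one 3x3 linear solve, so with a reasonable starting point it may need just a handful of iterations. With a poor starting point it can still fail or land in a local minimum, so treat it as a starting point for experiments rather than a finished routine.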

~wes

