Algorithm for fitting a logistic curve?

Tim Wessman, 11-11-2011, 12:05 PM

Hello,

I'm working on the math library here, and I have had an immensely difficult time finding how to efficiently implement a logistic curve fit. Note that this isn't full-fledged binary logistic regression (which I can find lots of information on), but rather the fitting of a curve of the form L/(1+a*e^(-b*x)) to a set of data. The fitting method in the math library right now linearizes the equation, and it doesn't give a very good fit at all, so I am trying to replace it. Does anyone have any helpful pointers to any algorithms for this type of problem?

I have posted Charlie Patton's code comments from the 48 math library below for those interested (he wrote this originally). I am not 100% certain whether the issue is the linearization and a completely different method is needed, or whether just the L estimator routine needs replacing/improvement.

```
* Name: fitlogist
* Algorithm: The basic model is:
*     y=L/(1+A*e^(B*x))
* which is equivalent to:
*     Ae^(B*x)=(L-y)/y
* so that
*     ln(A)+B*x=ln(L/y-1).
*
* Fit a linear model to the transformed data
*     ( x(i), ln(L/y(i)-1)) to obtain y=a+bx
* then A=e^a and B=b
*
* Note B normally would be negative
*
**Name: Lestimate
**Category: Logistic Fit Utility
**Entry:
**  Stack: [XY] (sorted)
**
**  Temp. Env.
**
**Exit:
**  Stack: L% (or %0 if there's a problem with zero divisors)
**
**  Temp. Env.
**Errors:
**
**Description/Algorithm:
**
**  This utility attempts to estimate the saturation value for a logistic
**  equation from sorted statistical samples.
**
**  It is assuming that the data Y(X) (stored in pair form X[i],Y[i]) corresponds
**  to samples from a differential equation dY(X)/dX = Y(X)*k*(L-Y(X))
**
**  Note that this is an autonomous ODE with nodes at Y=0 and Y=L.
**  If we plot (1/Y)*(dY(X)/dX) as a function of Y (note that it doesn't really depend
**  on X) we will get a straight line with a zero at Y=L. It is this fact we will use
**  to approximate L from the data. Namely, replacing dY(X)/dX by its sampled version
**      dY(X)/dX ~ Y'[i]=(Y[i+1]-Y[i])/(X[i+1]-X[i])
**  we do linear regression on the pair Y[i],Z[i] with Z[i]=Y'[i]/Y[i] and
**  find the zero of the corresponding line.
**
**Author: C.M.Patton
**Date Written: April 10, 1995
```

TW

-- Although I work for the HP calculator department, the comments and opinions I express here are my own.

Edited: 11 Nov 2011, 12:18 p.m.

Eric Smith, 11-11-2011, 01:47 PM

In my experience, fitting either the logistic function or the tanh function tends to give poor results. I suspect that this is due to how rapidly they approach their asymptotes, but that's really only a guess on my part. Hopefully someone knowledgeable about numerical analysis can explain how to do it properly.

Dieter, 11-11-2011, 04:15 PM

Quote:
The fitting method in the math library right now linearizes the equation, and it doesn't give a very good fit at all, so I am trying to replace it.

Linearizing the equation, followed by a simple linear regression, is a classic method that usually gives decent results. However, it does not minimize the sum of the squared residuals. How did you determine the quality of the fit here?

Quote:
I have posted Charlie Patton's code comments from the 48 math library below for those interested (he wrote this originally). I am not 100% certain whether the issue is the linearization and a completely different method is needed, or whether just the L estimator routine needs replacing/improvement.

As far as I can see, the comments simply refer to the common linearization, which here leads to the transformation ln(A)+B*x = ln(L/y-1).
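To make the two steps in Patton's comments concrete, here is a minimal Python sketch, assuming the samples are sorted by x: the Lestimate step regresses Z[i] = Y'[i]/Y[i] on Y[i] (forward differences) and takes the zero of that line as L, then the fitlogist step fits a straight line to (x(i), ln(L/y(i)-1)). The function names are hypothetical, not the 48 library's; a zero divisor raises ZeroDivisionError here instead of returning %0. Since ln(L/y-1) is only defined for 0 < y < L, the sketch raises ValueError when the L estimate falls at or below a sample.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least-squares line ys ~ a + b*xs; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def l_estimate(xs, ys):
    """Estimate the saturation value L as described for Lestimate:
    regress Z[i] = Y'[i]/Y[i] on Y[i] and return the zero of that line."""
    yi, zi = [], []
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        slope = (y1 - y0) / (x1 - x0)      # forward difference for dY/dX
        yi.append(y0)
        zi.append(slope / y0)
    a, b = linear_fit(yi, zi)
    return -a / b                          # Z = a + b*Y is zero at Y = -a/b

def fit_logistic_linearized(xs, ys):
    """Fit y = L/(1 + A*exp(B*x)) via ln(L/y - 1) = ln(A) + B*x."""
    L = l_estimate(xs, ys)
    if any(y <= 0 or y >= L for y in ys):
        raise ValueError("ln(L/y - 1) needs 0 < y < L for every sample")
    a, b = linear_fit(xs, [math.log(L / y - 1) for y in ys])
    return L, math.exp(a), b               # B normally comes out negative
```

Because the forward difference paired with Y[i] is only a first-order approximation of dY/dX, the L estimate is not exact even for noise-free logistic data, and coarser sampling tends to make it worse.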
The goal of the algorithm, however, seems to be a different one: to implement a method that estimates the saturation parameter L ("This utility attempts to estimate the saturation value for a logistic equation from sorted statistical samples"). A true least-squares regression, i.e. one that exactly minimizes the sum of the squared residuals, is not trivial. I came across the following document and think it's an interesting read on this subject: http://home2.fvcc.edu/~dhicketh/DiffEqns/Activities/logistic.pdf

Dieter

MacDonald Phillips, 11-11-2011, 05:42 PM

Tim,

Unfortunately, if you linearize an equation to fit it to your data, you do not minimize the SSE. And not all equations can be linearized. What is needed is a non-linear fitting routine. But this requires a calculator with a CAS so you can compute the derivatives of the equation with respect to the parameters. I have done this for the TI-89 and the NSpire CX CAS. If you want, I can send the routines to you. My email is don.phillips@gmail.com.

Don

Crawl, 11-11-2011, 07:27 PM

I can't believe I'm saying this (being a big fan of CAS calculators), but you don't NEED a CAS. I use Excel's Solver routine all the time to do least-squares fitting to arbitrary function forms.

Wes Loewer, 11-13-2011, 01:22 AM

Tim,

Quote:
Does anyone have any helpful pointers to any algorithms for this type of problem?

How critical is speed? Perhaps you've already been down this road, but using a brute-force approach I took the equivalent equation:

```
y = L/(1+a*exp(-k*x))
```

and applied least-squares principles:

```
Let E = sum i=1 to n of (L/(1+A*EXP(-K*X_i)) - Y_i)^2
```

then minimized E by taking the partial derivatives of E with respect to L, A, and K and setting them to zero.
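Written out by hand, the three conditions are short. Using the notation $D_i = 1 + A e^{-K x_i}$ and $r_i = L/D_i - y_i$ (introduced here for brevity), they read:

$$\frac{\partial E}{\partial L} = 2\sum_{i=1}^{n}\frac{r_i}{D_i} = 0$$
$$\frac{\partial E}{\partial A} = -2L\sum_{i=1}^{n}\frac{r_i\,e^{-K x_i}}{D_i^{2}} = 0$$
$$\frac{\partial E}{\partial K} = 2LA\sum_{i=1}^{n}\frac{r_i\,x_i\,e^{-K x_i}}{D_i^{2}} = 0$$

The first condition is linear in L, so for fixed A and K it gives $L = \sum_i (y_i/D_i) \big/ \sum_i (1/D_i^{2})$ directly, leaving two genuinely non-linear equations. These closed forms can be hard-coded rather than produced by a CAS.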
```
E 'L' DERIV 0 =
E 'A' DERIV 0 =
E 'K' DERIV 0 =
```

This gives three non-linear equations, which can then be solved numerically for L, A, and K. I tried this with a few sample data points on the 50g (using the SOLVESYS lib to solve) and in Maxima (using MNEWTON to solve) and got matching results, which also matched the FitLogistic command in the computer software GeoGebra. I don't know if you're allowed to use GPL code for your project, but GeoGebra is a GPL program with source code available from http://www.geogebra.org/source/program/. Perhaps you could look and see how they handle it.

You don't need a CAS, since the derivatives can be hard-coded, but numerically solving the equations is of course the bottleneck. It might even be faster to use the linearized results as the initial values in the iterative solving process.

~wes
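Following Wes's suggestion, here is a minimal sketch of a true least-squares fit with hard-coded derivatives and no CAS. It uses Levenberg-Marquardt, a standard damped Gauss-Newton method swapped in here for illustration; it is not what SOLVESYS, Maxima or GeoGebra necessarily do internally. The model is Wes's y = L/(1+A*exp(-K*x)), so Patton's B corresponds to -K. The function names, the starting damping of 1e-3 and the stopping thresholds are assumptions. The starting guess `start` could come from the linearized fit.

```python
import math

def model_and_jacobian(params, x):
    """f = L/(1 + A*exp(-K*x)) and its hard-coded partials (df/dL, df/dA, df/dK)."""
    L, A, K = params
    u = math.exp(-K * x)
    d = 1.0 + A * u
    return L / d, (1.0 / d, -L * u / (d * d), L * A * x * u / (d * d))

def sse(params, xs, ys):
    """Sum of squared residuals E; infinite if the model blows up."""
    try:
        return sum((model_and_jacobian(params, x)[0] - y) ** 2 for x, y in zip(xs, ys))
    except (OverflowError, ZeroDivisionError):
        return math.inf

def solve3(m, v):
    """Solve the 3x3 system m p = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [b] for row, b in zip(m, v)]          # augmented matrix
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, 3):
            factor = a[r][c] / a[c][c]
            for k in range(c, 4):
                a[r][k] -= factor * a[c][k]
    p = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        p[r] = (a[r][3] - sum(a[r][k] * p[k] for k in range(r + 1, 3))) / a[r][r]
    return p

def fit_logistic_lm(xs, ys, start, max_iter=200):
    """Levenberg-Marquardt fit of (L, A, K), starting from start = (L, A, K)."""
    params = list(start)
    lam = 1e-3                                           # damping factor
    err = sse(params, xs, ys)
    for _ in range(max_iter):
        # Accumulate J^T J and J^T r from the hard-coded partial derivatives.
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for x, y in zip(xs, ys):
            f, g = model_and_jacobian(params, x)
            r = f - y
            for i in range(3):
                jtr[i] += g[i] * r
                for j in range(3):
                    jtj[i][j] += g[i] * g[j]
        # Marquardt damping: inflate the diagonal by a factor (1 + lam).
        m = [[jtj[i][j] * (1.0 + lam) if i == j else jtj[i][j] for j in range(3)]
             for i in range(3)]
        step = solve3(m, [-v for v in jtr])
        trial = [p + s for p, s in zip(params, step)]
        trial_err = sse(trial, xs, ys)
        if trial_err < err:                              # accept, lean toward Gauss-Newton
            gain = err - trial_err
            params, err, lam = trial, trial_err, lam / 10.0
            if gain <= 1e-15 * (1.0 + err):
                break
        else:                                            # reject, lean toward gradient descent
            lam *= 10.0
            if lam > 1e10:
                break
    return params
```

The damping makes a poor starting guess degrade into small, safe steps instead of divergence, which is why seeding it with the cheap linearized result is attractive.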

