3.4 Non-Linear Least Squares Techniques
3.4.1 Iterative Techniques (Like a Rolling Stone )
What happens when the model equations are non-linear in the coefficients? You do the same thing: MINIMIZE CHI SQUARED. The difference is that you have to do it iteratively. There is no simple way of writing down and solving a set of linear analytic equations.
All non-linear fitting routines use some means of doing this, and all non-linear fitting routines do it by systematically searching coefficient space for a chi squared minimum. They do it by repeatedly calculating the chi squared for your model equation and data, and nudging the coefficients in differing directions until they reach a minimum (or as close to a minimum as you stipulate). The only difference between the techniques is the precise search mechanism, which can range from very crude to quite elaborate. The Bevington book (chapter 8) describes three basic approaches: grid search, gradient search and a combination. The grid search works well when you are near the chi squared minimum, but are not very efficient at moving large distances in coefficient space. The gradient search moves large distances well, since it travels down the path of steepest descent, adjusting all the coefficients at once, but tends to sometimes get trapped in long valleys and doesn't converge well near the global chi squared minimum.
The more sophisticated routines tend to use gradient search routines initially, which adjust the size of the nudge that they give to the coefficients to the size of the change in the chi squared. Near the minimum they perform a grid search, and the last effort usually involves some kind of polynomial interpolation of the chi squared surface. The price is that you need to supply them with the means to not only evaluate your model function for your data (xi's), but also the gradient in your data. A popular method is the "Levenberg-Marquardt Method". We won't go into the details right now.
Think of the algorithm like a marble rolling around on a surface of hills and valleys (a little like the arcade game "marble madness"). The ball will continue to roll down hill until it reaches the lowest point (sometimes oscillating around the minimum as it goes, depending on the character of the algorithm used).
All of these routines require you to provide:
Keep one thing in mind. Nothing works better than an initial first guess. You might get a little sloppy and tend to let the computer do the work, but the chi squared surface in coefficient hyperspace may have more than one minimum, and your algorithm may get trapped in a local minimum which is not nearly as good as a global minimum which just over the next "hill" in hyperspace. As a corollary you should always plot your results: the eyeball is not often fooled. You may not be able to distinguish precise fits, but you can tell if the fit is far from optimal. Sometimes it is worthwhile to do a grid search of initial starting values, or to approach the optimum from extremely different directions (e.g., from positive, then negative values).
For large data sets and/or complicated non-linear functions with many coefficients, the iterative techniques are computationally expensive. You may find that you have no choice. Keep in mind that a sensible choice of model equations, based on a realistic physical or chemical model of the system you are fitting will take you much further than any sophisticated mathematics or powerful computers. Common sense is a powerful ally.
3.4.2 Uncertainties in Coefficients
Computing the uncertainties in coefficients for non-linear regressions is a challenging problem. It involves examining the shape of the chi squared surface in coefficient hyperspace and estimating the boundaries of confidence intervals associated with "elevation" changes in chi squared. The user is often given the covariance matrix (for the coefficients) on convergence, and calculate associated uncertainties from that. In addition to the excellent discussion by Bevington in chapter 8, the reader is referred to section 15.6 of Recipes for a discussion.
3.4.3 Example Calculation: Gaussian on a Constant Background
To give you an idea of how this might work, consider a data set which you suspect to be a gaussian shaped peak embedded in a constant background, with some measurement noise. The measurements are represented by the dots below.

The first thing you do is create a MATLAB m-file whose sole purpose is to compute your model function. You can call it anything you want, but here we will call it "modfunc.m". It is a function which takes as its arguments the x data and vector containing the coefficients, and returns the model fitted data. The m-file in this example contains a whole two lines:
function out=modfunc(x,p) out = p(1) + p(2)*exp(-((x-p(3))/p(4)).^2);
Note the following about this m-file. First the function statement makes a function call rather than just a simple script. You need to do this if you want it to return any values to MATLAB. Second, note the "dot" before the "caret". This way you don't end up getting a matrix from squaring "x". This statement calculates all of the y's from the x's at once. The above is equivalent to the following mathematical equation:
Next, you download from our Web Site a non-linear least squares fitting routine called nlleasqr.m and a helper routine called dfdp.m which you need to do this. Both files ultimately came from the MathWorks Web Site, but have been fixed up a little by wjj to remove some bugs. You need both to do this. The first is the main engine. The last is a program which calculates the chi squared gradient in coefficient space. It does this numerically rather than analytically. You can make your own analytic function (and call it some distinct name), which may do a better job, but it is not really necessary. Finally, there is the data x.dat and y.dat (how original!).
OK, now you have the m-files, you start up MATLAB, and load in the x and y data you had. You then plot it up, since you need to have some idea of your initial starting guesses for parameters. You eyeball the plot and guess what the baseline value should be (this is a1) which we'll guess to be 1. You then guess the height of the gaussian (a2) to be 8. The center of the gaussian (a3) we'll guess to be at 5, and its width (a4) to be 2. So we set our initial parameter guess to be
load x.dat load y.dat pin=[1 8 5 2];
Note that the order is important, and should be the same as you use in the m-file "modfunc.m". Next, you type "help nlleasqr" to find out how to feed it information. Note that most of the input parameters are optional, so don't be intimidated, and don't wear out your fingers. We'll just try
[f,p,kvg,iter,corp,covp,covr,stdresid,Z,r2] = nlleasqr(x,y,pin,'modfunc');
The stuff on the left is all the goodies you get back, but we'll only look at a few of them. Your "answer" or coefficients is stored in "p". First, let's look at what kind of job this routine did for us. Let's plot it:
xf=[0:.1:10]; % dummy array to plot yf=modfunc(xf,p); % compute fit for the dummy array plot(x,y,'*',xf,yf) % plot both data and fit on same graph
Hmmm, not bad. About the numbers: "p" is the output coefficients, "kvg" is a flag to say if convergence was achieved before the routine gave up, "iter" is the number of iterations, "covp" is the covariance matrix for the coefficients (the square roots of the diagonal elements are the uncertainties in the parameters, and r2 is the overall correlation coefficient. For this example, we get
coefficients = [0.6595 9.9624 4.4780 0.9561] uncertainties = [0.1220 0.3760 0.0249 0.0466] r2 = 0.9160
Oh yes, the "real" values were [0.5 10.0 4.50 1.0].
The answer is roughly within errors of the real values, but will
differ because of the noise that was added to the data. Your mileage
may differ slightly when you do this experiment due to roundoff
error.
GoTo the Next Lecture
GoTo Lecture TOC
GoTo 12.747 TOC