12.747 Lecture 5: Section 1:

Objective Mapping and Kriging

File last modified 27 September 2000


5.1 Contouring and gridding concepts

This lecture covers the question: "what do you do when your data is not on a regular grid?" This question comes up frequently because computers can only draw, for example, contour lines when they know where to draw them. Often a contouring package will grid your data for you using a "default" method of generating a grid and this will be acceptable. But there's more to it than making it easy for computers to draw contour lines. Nevertheless, different gridding methods will produce different looking maps. How, then, can we objectively decide between these different results? The answer to this question is, of course, dependent upon the question you are asking.

5.1.1 Data on a regular grid

There is a straightforward way to contour irregularly spaced data: triangularization. The individual data points are connected in a network of "triangles". The location of the contour line along these triangularization lines is then computed by a simple linear interpolation (see Fig. 5.1.1). This approach is "OK" for producing contour maps, but is difficult to use for derived products (gradients, etc.) And is compute intensive. Furthermore, if you were to sample at different locations you would get a different contour map.

Better then to put your data onto a regular, rectangular grid. A regular grid is easier for the computer to use, but is more difficult for the user to generate. The benefits to be gain for this extra trouble are large.

5.1.2 Methods: nearest neighbor, bilinear, inverse distance, etc.

Nearest Neighbor: is a method that works in a way you might expect from the name. The grid value is estimated from the value of the nearest neighbor data point. The distance from a grid point to the actual data points is given by:

where the index k will be reserved for a number indicating grid number (in a sequential sense) and iin this case refers to sequential numbers identifying the actual data points.

Is the bare bones nearest neighbor formulation. Sometimes this method is augmented by the "N-nearest neighbors":

This method of generating grids is of particular use for filling in gaps in data already on a regular grid or nearly so.

Bilinear Interpolation: is a method that is frequently referred to as a "close enough for government work" method. The value at the grid point is an interpolation product (y) from the following formulas:

and:

where y1-4 are the actual data points surrounding the grid point (or sometimes called node). This method can be augmented and there are the logical extensions such as bicubic interpolation which yields higher order accuracy.

Inverse distance: is actually a class of methods that weight the data points contribution to the grid point by the inverse of the distance between the grid point and data point (sometimes this weight is raised to a power, 2, 3, or even higher if there is a reason). This is basically Eqn 5.1.3 with the Di,k raised to the power mentioned. This method is fast, but has a tendency to generate "bull's-eyes" around the actual data points.

Kriging: is a method to determined the best linear unbiased estimate of the grid points. We will discuss this in greater detail in section 5.4. This method is very flexible, but requires the user to bring a priori information about the data to the problem. This information takes the form of a variogram of the semivariances and there are several models of variograms that can be used. Typically, real data is best dealt with a linear variogram unless there is considerable amount of data to derive a robust variogram (more in sections 5.3 and 5.4).

5.1.3 Weighted averaging

Several of the above methods can also have weighting added to improve the fidelity of the grid to the actual data. Consider Eqn 5.1.3 (N-nearest neighbors), this equation can have a weighting factor added to the numerator to increase the influence of some data points over others on the value of the grid points. Typically the weighting is done with some idea of the uncertainty in the individual data points themselves.

5.1.4 Splines

We don't plan on covering splines per se. Like many of the topics covered in this course, splines are a course onto themselves. But we would be remiss if we did not mention them, here. Splines got their start as long flexible pieces of wood or metal. They were used to fit curvilinearly smooth shapes when the mathematics and/or the tools were not available to machine the shapes directly (i.e. hull shapes and the curviture of airplane wings).

Since then a mathematical equivalent has grown up around their use and they are extremely useful in fitting a smooth line or surface to irregularly spaced data points. Essentially then, splines are piecewise functions for connecting points in 2 or 3 dimensions. They are not analytical functions nor are they statistical models, they are purely empirical and devoid of any theoretical basis. They are also useful for interpolating between data points. They exist as piecewise polynomials constrained to have continuous derivatives at the joints between segments.

The most common spline (there are many of the them) is the cubic spline. A cubic polynomial can pass through any four points at once. To make sure that it is continuously smooth, a cubic spline is fit to only two of the data points at a time. This allows for the use of the "other" information to maintain this smoothness.

If you consider Fig. 5.1.3 there are four data points (P1, P2, P3, and P4). Cubic polynomials are fit to only two data points at a time (P1 to P2, P2 to P3, etc.). By requiring the tangent of P1-P2 at P2 to be equal to the tangent of P2-P3 at P2 to be equal, we can write a series of simultaneous equations and solve for the unknown coefficients.

There are a number of known problems with splines. Extrapolating beyond the edges of the data domain quite often yields wildly erratic results. This is be cause there is no information beyond the data domain to constrain the extrapolation and splines are essentially higher order polynomials which will grow to large values (positive or negative). Closely spaced data points can develop aneurysms. In an attempt to squeeze a higher order polynomial into a tight space large over- and under-shoots of the true function can occur. These problems also occur in 3-D applications of splines. However, if a smooth surface is what you are looking for, frequently splines (spline relaxations) will give you a good, usable smooth fit to your data.


GoTo Next Section
GoTo Lecture TOC
GoTo 12.747 TOC


The text, graphics, and other materials contained in this homepage and attached documents are intended solely for scholarly use by the scientific and academic community. No reproduction, re-transmission or linking of this page to any other page without the author's expressed written permission is permitted.
© 1998, 2000 -- David M. Glover, WHOI --