12.747 Lecture 5: Section 3:

Objective Mapping and Kriging

File last modified 2 October 2000


5.3 Variograms

At the heart of kriging is the semivariogram, or structure function, of the regionalized variable that you are trying to estimate. It is the a priori information that you must supply to the software in order to make a regular grid out of your irregularly spaced data. The basic idea is to estimate how far apart two data points must be before they are uncorrelated. This information is usually presented in the form of the variogram, which is a plot of semivariance against lag distance.

5.3.1 Semivariance

First remember the definition of variance:
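For a sample of N values x_i with mean x-bar, the usual unbiased form (assumed here to be the one intended, with an N - 1 divisor) is:

$$
s^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2
$$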

in most cases the variance of a data set is a number (scalar). The semivariance is a curve (vector) derived from the data according to:
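In its standard (Matheron) form, assumed here to be the equation the text later cites as (5.3.2), with z(x_i) the datum at location x_i and N(h) the number of data pairs separated by the lag h, the experimental semivariance is:

$$
\gamma^*(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ z(x_i) - z(x_i + h) \right]^2 \qquad (5.3.2)
$$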

where the star indicates an experimental variogram computed from the data and h is the lag distance between data point pairs. There also are theoretical semivariograms which model the structure of the underlying correlation between data points, such as the exponential model:
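A common way of writing the exponential model is assumed here. Some texts use exp(-3h/a) instead, so that a is the distance at which the curve reaches about 95% of its plateau. With the nugget added as a constant offset, the curve levels off at c0 + c as h grows large:

$$
\gamma(h) = c_0 + c \left[ 1 - \exp\left( -\frac{h}{a} \right) \right]
$$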

where c0 equals the nugget, c equals the sill and a equals the range of the semivariogram model.
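To make the experimental semivariogram concrete, here is a minimal Python sketch for data along a line. It takes half the mean squared difference over all pairs that fall nearest to each lag. The function name, the nearest-lag binning rule and the arguments `lag_width` and `n_lags` are illustrative assumptions, not part of the course software.

```python
def experimental_semivariogram(x, z, lag_width, n_lags):
    """Classical experimental semivariogram for 1-D positions x and values z.

    Each pair of points is assigned to the nearest multiple of lag_width
    (an assumed binning rule). Returns (lags, gamma, counts); gamma is NaN
    for any lag that has no pairs.
    """
    sq_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(x) - 1):
        for j in range(i + 1, len(x)):
            h = abs(x[j] - x[i])
            k = int(h / lag_width + 0.5)  # index of the nearest lag
            if 1 <= k <= n_lags:
                sq_sums[k - 1] += (z[i] - z[j]) ** 2
                counts[k - 1] += 1
    lags = [k * lag_width for k in range(1, n_lags + 1)]
    # Half the mean squared difference of the pairs at each lag.
    gamma = [s / (2.0 * c) if c else float("nan")
             for s, c in zip(sq_sums, counts)]
    return lags, gamma, counts


# Four evenly spaced samples: 3 pairs at lag 1, 2 at lag 2, 1 at lag 3.
lags, gamma, counts = experimental_semivariogram(
    [0, 1, 2, 3], [1, 2, 4, 7], lag_width=1.0, n_lags=3)
```

Note how quickly the number of pairs drops at the longer lags. Semivariances estimated from only a handful of pairs should be treated with caution.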

5.3.2 The nugget, range and sill

There are three parameters that define the semivariogram:

Nugget (c0):
Represents unresolved, sub-grid scale variation or measurement error; it appears on the variogram as the intercept at zero lag.
Range (a):
The scalar that controls how quickly the correlation between data points falls off with separation, usually expressed as a distance.
Sill (c):
The value of the semivariance as the lag (h) goes to infinity; it is equal to the total variance of the data set.

Given the range and sill (and the nugget, when one is present) and an appropriate semivariogram model, the semivariance can be calculated for any h. These quantities are best visualized in Fig. 5.3.1, a simple exponential model of semivariance.
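As a rough illustration of calculating the semivariance for any h, the sketch below evaluates an exponential model from its parameters. It assumes the form nugget + c * (1 - exp(-h/a)), so the curve levels off at nugget + c. The function and argument names are hypothetical.

```python
import math


def exponential_semivariogram(h, nugget, sill, a):
    """Exponential semivariogram model (assumed form, see lead-in).

    h      : lag distance (non-negative)
    nugget : constant offset c0
    sill   : the parameter c that scales the rising part of the curve
    a      : range parameter, in the same units as h
    """
    if h < 0:
        raise ValueError("lag distance must be non-negative")
    return nugget + sill * (1.0 - math.exp(-h / a))


# Tabulate the model at a few lags for c0 = 0.1, c = 1.0, a = 2.0.
curve = [(h, exponential_semivariogram(h, 0.1, 1.0, 2.0))
         for h in (0.0, 1.0, 2.0, 6.0, 20.0)]
```

With this form the model reaches about 63% of c at h = a and about 95% at h = 3a, which is worth keeping in mind when reading a range off a plot.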

The constant offset (c0) added to the theoretical semivariance models is known as the "nugget effect". This constant accounts for the influence of high concentration centers in the data that prevent the experimental semivariogram from passing through the origin. The term originated with mining geologists who were looking for "nuggets" of gold.

There are several models of semivariance to pick from; the trick is to pick the one that best fits your data. We will discover later on in our discussions of kriging and cokriging that if you are estimating the semivariogram experimentally (i.e. from actual data) then the linear model seems to give the best results. You have already seen the exponential model; it is not the only one.
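Other forms commonly found in the geostatistics literature include the spherical, Gaussian and linear models. Written with the same symbols as above (an assumption; b is a hypothetical symbol for the slope of the linear model, which has no sill), they are:

$$
\begin{aligned}
\text{spherical:} \quad & \gamma(h) =
\begin{cases}
c_0 + c \left[ \dfrac{3h}{2a} - \dfrac{h^3}{2a^3} \right], & 0 < h \le a \\
c_0 + c, & h > a
\end{cases} \\
\text{Gaussian:} \quad & \gamma(h) = c_0 + c \left[ 1 - \exp\left( -\frac{h^2}{a^2} \right) \right] \\
\text{linear:} \quad & \gamma(h) = c_0 + b\,h
\end{aligned}
$$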

5.3.3 Isotropic and anisotropic data

The simplest semivariance model to envision is one in which the sill and range are the same regardless of the direction being considered (isotropic). That is not always the case, and data often display anisotropic behavior in their range and sill values. Consider again an exponential model, but now look at the difference revealed when the semivariances are calculated only in the north-south direction compared with only in the east-west direction.

Knowledge of these anisotropies is necessary when designing an appropriate semivariogram model of your data prior to kriging.
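One way to look for such anisotropy is to compute the experimental semivariogram using only those pairs whose separation points roughly along a chosen direction. The sketch below is a minimal version for 2-D data. The function name, the azimuth convention (degrees clockwise from north) and the angular tolerance argument are all assumptions made for illustration.

```python
import math


def directional_semivariogram(points, values, azimuth_deg, tol_deg,
                              lag_width, n_lags):
    """Experimental semivariogram restricted to one direction.

    points      : list of (east, north) coordinates
    azimuth_deg : direction of interest, degrees clockwise from north
    tol_deg     : pairs within this many degrees of the direction are kept
    Returns (lags, gamma, counts); gamma is NaN for lags with no pairs.
    """
    sq_sums = [0.0] * n_lags
    counts = [0] * n_lags
    target = azimuth_deg % 180.0
    for i in range(len(points) - 1):
        for j in range(i + 1, len(points)):
            dx = points[j][0] - points[i][0]
            dy = points[j][1] - points[i][1]
            h = math.hypot(dx, dy)
            if h == 0.0:
                continue  # coincident points carry no direction
            # Fold the pair's azimuth into [0, 180): a pair has no sense.
            az = math.degrees(math.atan2(dx, dy)) % 180.0
            diff = abs(az - target)
            diff = min(diff, 180.0 - diff)
            if diff > tol_deg:
                continue
            k = int(h / lag_width + 0.5)  # nearest lag index
            if 1 <= k <= n_lags:
                sq_sums[k - 1] += (values[i] - values[j]) ** 2
                counts[k - 1] += 1
    lags = [k * lag_width for k in range(1, n_lags + 1)]
    gamma = [s / (2.0 * c) if c else float("nan")
             for s, c in zip(sq_sums, counts)]
    return lags, gamma, counts


# A 3 x 3 grid whose values change only in the east-west direction.
pts = [(e, n) for e in range(3) for n in range(3)]
vals = [float(e) for e, n in pts]
east_west = directional_semivariogram(pts, vals, 90.0, 10.0, 1.0, 2)
north_south = directional_semivariogram(pts, vals, 0.0, 10.0, 1.0, 2)
```

For this toy grid the north-south semivariances are zero at every lag while the east-west ones grow with lag, which is an extreme case of the directional difference described above.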

5.3.4 Robust semivariogram

There will be times when you will hear references to a robust semivariance estimator. This idea was championed by Noel Cressie and is dealt with in some detail in his book (Statistics for Spatial Data). Basically it is a variant on equation (5.3.2) that accounts for the effects of outliers in your data. Outliers (data in the tails of your data distribution that fall outside Gaussian expectations) have a tendency to distort the results of equation (5.3.2). Cressie has put forward the following equation to make the experimentally determined semivariogram less sensitive to these outliers (hence, robust).
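In the form usually quoted from Cressie's book (the Cressie-Hawkins estimator), and assuming the same notation as the classical estimator with z(x_i) the datum at x_i and N(h) the number of pairs at lag h, the robust semivariance is:

$$
\bar{\gamma}(h) = \frac{\left\{ \dfrac{1}{N(h)} \displaystyle\sum_{i=1}^{N(h)} \left| z(x_i) - z(x_i + h) \right|^{1/2} \right\}^{4}}{2 \left[ 0.457 + \dfrac{0.494}{N(h)} \right]}
$$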

While it looks somewhat overwhelming, on inspection we see that this is just equation (5.3.2) modified. By taking the absolute value of the difference between two data points separated by a distance h, taking its square root, averaging over the data pairs separated by the distance h, and then raising the result to the fourth power, we diminish the effects of these outliers. The denominator is nothing more than a normalization to make gamma unbiased. This form of the experimental semivariogram is very useful when we have a lot of data from which to estimate the semivariogram and outliers become an irksome problem, although it also works at lower data densities.
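The recipe in the paragraph above can be sketched in Python for data along a line. The bias-correction constants 0.457 and 0.494 are those of the Cressie-Hawkins estimator as given in Cressie's book. The function name and the nearest-lag binning are illustrative assumptions.

```python
import math


def robust_semivariogram(x, z, lag_width, n_lags):
    """Cressie-Hawkins robust semivariogram for 1-D positions x, values z.

    For each lag: average the square roots of the absolute differences,
    raise that mean to the fourth power, then divide by the bias
    correction 2 * (0.457 + 0.494 / N(h)).
    Returns (lags, gamma, counts); gamma is NaN for lags with no pairs.
    """
    root_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(x) - 1):
        for j in range(i + 1, len(x)):
            h = abs(x[j] - x[i])
            k = int(h / lag_width + 0.5)  # nearest lag index (assumed rule)
            if 1 <= k <= n_lags:
                root_sums[k - 1] += math.sqrt(abs(z[i] - z[j]))
                counts[k - 1] += 1
    lags = [k * lag_width for k in range(1, n_lags + 1)]
    gamma = []
    for s, c in zip(root_sums, counts):
        if c == 0:
            gamma.append(float("nan"))
        else:
            mean_root = s / c
            gamma.append(0.5 * mean_root ** 4 / (0.457 + 0.494 / c))
    return lags, gamma, counts


lags, gamma, counts = robust_semivariogram(
    [0, 1, 2, 3], [1, 2, 4, 7], lag_width=1.0, n_lags=3)
```

Because the differences enter through their square roots rather than their squares, a single wild value contributes far less to each lag than it would in the classical estimator.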




The text, graphics, and other materials contained in this homepage and attached documents are intended solely for scholarly use by the scientific and academic community. No reproduction, re-transmission or linking of this page to any other page without the author's expressed written permission is permitted.
© 1998, 2000 -- David M. Glover, WHOI --