5.3 Variograms
At the heart of kriging is the semivariogram, or structure function, of the regionalized variable that you are trying to estimate. It is the a priori information that you must supply to the software in order to make a regular grid out of your irregularly spaced data. The idea is to estimate the separation distance beyond which data points are no longer correlated. This information is usually presented in the form of the variogram, which is a plot of the semivariance against the lag distance.
First remember the definition of variance:
In most cases the variance of a data set is a single number (a scalar). The semivariance, by contrast, is a curve (a vector of values, one per lag) derived from the data according to:
where the star indicates an experimental variogram computed from the data and h is the lag distance between data point pairs. There are also theoretical semivariograms, which model the structure of the underlying correlation between data points, such as the exponential model:
where c0 equals the nugget, c equals the sill and a equals the range of the semivariogram model.
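The equations referred to above are not reproduced in this copy of the text. As a sketch, the standard textbook forms that fit the surrounding definitions are written out below. The symbols z_i, \bar{z}, n and N(h) are assumed notation, and the original may have used an equivalent but different form (for example, some texts divide the variance by n rather than n - 1, or write the exponent as -3h/a so that a is the practical range).

```latex
% Variance of n data values z_i with mean \bar{z}
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( z_i - \bar{z} \right)^2

% Experimental semivariogram at lag h, summed over the N(h) pairs separated by h
\gamma^*(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ z(x_i) - z(x_i + h) \right]^2

% Exponential model with nugget c_0, sill c and range a
\gamma(h) = c_0 + c \left[ 1 - \exp\left( -\frac{h}{a} \right) \right]
```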
Three parameters define the semivariogram: the nugget (c0), the sill (c) and the range (a).
Given the two parameters range and sill and the appropriate model of semivariogram, the semivariances can be calculated for any h. These quantities can be best visualized in Fig. 5.3.1, a simple exponential model of semivariance.
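To make "the semivariances can be calculated for any h" concrete, here is a minimal Python sketch. It assumes the common exponential form gamma(h) = nugget + sill * (1 - exp(-h / range)); the function name and the numerical values are illustrative only and are not taken from the text or from Fig. 5.3.1.

```python
import math


def exponential_semivariance(h, sill, rng, nugget=0.0):
    """Semivariance of an exponential model at lag distance h (h >= 0).

    Assumed form: gamma(h) = nugget + sill * (1 - exp(-h / rng)).
    Some texts use exp(-3 * h / rng) instead; adjust if yours does.
    """
    if h < 0:
        raise ValueError("lag distance h must be non-negative")
    return nugget + sill * (1.0 - math.exp(-h / rng))


# Illustrative values only (hypothetical sill and range).
lags = [0.0, 5.0, 10.0, 20.0, 50.0]
curve = [exponential_semivariance(h, sill=2.0, rng=10.0) for h in lags]
for h, g in zip(lags, curve):
    print(f"h = {h:5.1f}  gamma = {g:.3f}")
```

With this form the curve rises from the nugget and approaches nugget + sill asymptotically; at h equal to the range it has covered about 63% of the sill.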
The constant offset (c0) added to the theoretical semivariance models is known as the "nugget effect". This constant accounts for the influence of high concentration centers in the data that prevent the experimental semivariogram from passing through the origin. The term has its beginnings with mining geologists, who were looking for "nuggets" of gold.
There are several models of semivariance to pick from; the trick is to pick the one that best fits your data. We will discover later on, in our discussions of kriging and cokriging, that if you are estimating the semivariogram experimentally (i.e. from actual data) then the linear model seems to give the best results. You have already seen the exponential model; there are also the:
and, for the linear model, the slope (b) is nothing more than the ratio of the sill (c) to the range (a).
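The relation between slope, sill and range can be sketched in a few lines of Python. The linear model equation itself is not shown in this copy of the text, so the form used here, gamma(h) = nugget + b * h with b = sill / range, is an assumption, and the function name is hypothetical.

```python
def linear_semivariance(h, sill, rng, nugget=0.0):
    """Semivariance of an (assumed) linear model at lag distance h.

    The slope b is the ratio of the sill to the range, so the line
    reaches nugget + sill exactly at h = rng.
    """
    slope = sill / rng  # b = c / a
    return nugget + slope * h


# Illustrative values only: sill = 3, range = 12, nugget = 0.5.
print(linear_semivariance(12.0, sill=3.0, rng=12.0, nugget=0.5))
```

Under this assumption the model is unbounded; the sill and range serve only to fix the slope.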
The easiest semivariance model to envision for your data is one in which the sill and range values are the same regardless of the direction being considered. That is not always the case, however, and data often display anisotropic behavior in their range and sill values. Consider again an exponential model, but now look at the difference revealed when the semivariances are calculated only in the north-south direction compared with only in the east-west direction.
Knowledge of these anisotropies is necessary when designing an appropriate semivariogram model of your data prior to kriging.
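One way to look for such anisotropy is to compute the experimental semivariogram separately for pairs aligned north-south and pairs aligned east-west. The sketch below is a minimal illustration, assuming the classical estimator (half the mean squared difference over the pairs at a given lag) and simple lag and angle tolerances. The function name, its parameters and the synthetic grid are all hypothetical.

```python
import math


def directional_semivariogram(points, azimuth_deg, lag, lag_tol, angle_tol_deg):
    """Classical semivariance for pairs separated by roughly `lag`
    along the direction `azimuth_deg` (degrees clockwise from north).

    points: list of (x, y, z) with x = east, y = north.
    Returns (semivariance, number_of_pairs); semivariance is None if no pairs.
    """
    total = 0.0
    count = 0
    target = azimuth_deg % 180.0
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            dx, dy = xj - xi, yj - yi
            if abs(math.hypot(dx, dy) - lag) > lag_tol:
                continue
            # Azimuth of the separation vector, folded into [0, 180)
            # because a pair has no preferred orientation.
            az = math.degrees(math.atan2(dx, dy)) % 180.0
            diff = abs(az - target)
            diff = min(diff, 180.0 - diff)
            if diff <= angle_tol_deg:
                total += (zi - zj) ** 2
                count += 1
    if count == 0:
        return None, 0
    return total / (2.0 * count), count


# Synthetic 5 x 5 grid whose values change only in the east-west direction.
points = [(float(x), float(y), 2.0 * x) for x in range(5) for y in range(5)]
ns, n_ns = directional_semivariogram(points, 0.0, 1.0, 0.1, 10.0)
ew, n_ew = directional_semivariogram(points, 90.0, 1.0, 0.1, 10.0)
print("north-south:", ns, "pairs:", n_ns)  # 0.0 from 20 pairs
print("east-west:  ", ew, "pairs:", n_ew)  # 2.0 from 20 pairs
```

Repeating the calculation over a range of lags in each direction gives two experimental curves whose ranges and sills can then be compared.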
Although it looks somewhat overwhelming, on inspection we see that this is just equation (5.3.2) modified. By taking the absolute value of the difference between two data points separated by a distance h, taking its square root, summing over all such pairs and dividing by the number of data pairs separated by the distance h, and then raising the result to the fourth power, we diminish the effects of outliers. The denominator is nothing more than a normalization to make gamma unbiased. This form of the experimental semivariogram is very useful in cases where we have a lot of data from which to estimate the semivariogram and outliers can become an irksome problem, although the equation also works at lower data densities.
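The steps described above can be sketched in Python and compared with the classical estimator. The robust equation is not shown in this copy of the text, so the sketch assumes the form usually attributed to Cressie and Hawkins, in which the fourth power of the mean square-root difference is divided by 0.457 + 0.494 / N(h) and halved to give the semivariance. The constants, function names and synthetic data are assumptions for illustration, using evenly spaced 1-D data with the lag given as an index offset.

```python
import math


def lag_differences(z, lag):
    """Differences z[i + lag] - z[i] for evenly spaced 1-D data."""
    return [z[i + lag] - z[i] for i in range(len(z) - lag)]


def classical_semivariance(z, lag):
    """Half the mean squared difference over all pairs at this lag."""
    d = lag_differences(z, lag)
    return sum(v * v for v in d) / (2.0 * len(d))


def robust_semivariance(z, lag):
    """Assumed Cressie-Hawkins form of the robust estimator."""
    d = lag_differences(z, lag)
    n = len(d)
    # Mean of the square roots of the absolute differences.
    mean_root = sum(math.sqrt(abs(v)) for v in d) / n
    # Fourth power, bias normalization, and the factor 1/2 for semivariance.
    return 0.5 * mean_root ** 4 / (0.457 + 0.494 / n)


# Smooth synthetic series, and a copy with one large outlier added.
clean = [math.sin(0.3 * i) for i in range(40)]
spiked = list(clean)
spiked[20] += 10.0

classical_ratio = classical_semivariance(spiked, 1) / classical_semivariance(clean, 1)
robust_ratio = robust_semivariance(spiked, 1) / robust_semivariance(clean, 1)
print("inflation of classical estimate:", classical_ratio)
print("inflation of robust estimate:   ", robust_ratio)
```

A single outlier inflates the classical estimate far more than the robust one, because squaring magnifies large differences while the square root damps them before averaging.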