12.747 Lecture 5: Section 5:
Objective Mapping and Kriging
File last modified 27 September 2000
We have been very fortunate to obtain from Denis Marcotte (via e-mail) copies of the m-files
published in Marcotte (1991). This section of the lecture notes covers material on how to use this
program. Although not covered in lecture, this very powerful program will extremely useful to
any of you that must make objective grids of their data during your careers.
The concept of cokriging is nothing more than a multivariate extension of the kriging technique
we went over in class and is covered in lecture notes section 5.4. Instead of going through all of
the machinations necessary for kriging one property at a time, we do all of the properties we wish
to grid in one calculation. In addition, covariance information about the way properties related to
each other is used to improve the grid estimation and reduce the error associated with the grid
estimates.
Along with the types of variograms estimated in lecture notes section 5.3, cross-variograms are
also necessary. These are logical extensions of the variograms we have already dealt with.
Remember the semivariance is provided by:
The cross-semivariance is given by:
where N refers to the number of data pairs that are separated by the same distance h, when j=k
you have the definition of the semivariogram. One interesting thing about the cross-semivariance
is that it can take on negative values. The semivariance must, by definition always be positive,
the cross-semivariance can be negative because the value of one property may be increasing
while the other in the pair is decreasing.
As we discussed earlier, a regionalized variable is a variable that is distributed in "space", where
the meaning of space can be extended to include phenomena that are generally thought of as
occurring in time. A regionalized phenomena can be represented by several inter-correlated
variables, for example, lead-zinc deposits or nutrients in the ocean. Then there may be some
advantage to study them simultaneously, this is an extension of the regionalized variable theory
to mulitvariate space and is what amounts to a coregionalized model. We can see from Eqn 5.5.2
that the cross-variogram is symmetric in (j,k) and (h,-h), which is not always the case in the
covariance matrix formed from the data.
In this section I will try to give you my best understanding of the program cokri.m. In this way I
hope to make the simplest and most straightforward application of this program available to you
while opening the possibility of future, more complicated uses, to you as well.
Sometimes the easiest way to understand a program is to understand, as best as possible, what the
input and output variables are. In the case of cokri.m they are as follows:
Input:
- x this is the n by (p+d) matrix of data points. In this program n refers to the total number of
sample locations (stations, well locations, etc.), p refers to the number of properties you are
estimating, d the dimensionality of the problem being studied (1-D, 2-D, 3-D, etc.).
- x0 is the m by d matrix of points on your grid that you will be kriging (cokriging) onto. In this
program m in the number of grid points, e.g. if you are working a 2-D problem and decide to
put your estimates onto a 21 by 57 point grid, m will be equal to 1197, x0 is however
represented as a 1197 by 2 matrix of coordinate doublets.
- model is perhaps one of the trickiest variables in the program. It represents half of the
coregionalization model to be used. If you are using only one model in 3-D, then model is a 1
by 4 "matrix" wherein the first column is the code of the model (1=nugget, 2=exponential,
3=gaussian, 4=spherical, 5=linear). The remaining columns of the model variable represents
the range of your model in the x, y, and z directions. A special note should be made here about
the use of a linear model. As stated in the help information of cokri.m, the ranges in a linear
model are arbitrary so that when they are divided into the also arbitrary sill values in c they
produce the slope of the linear semivariogram model being used for that property in that
direction. It may not be possible to find two sets of three numbers that will produce the nine
slopes estimated from three properties in three-dimensions, hence a sort of least square best
estimate should be used instead.
- c is the variable containing the rp by p sills of the coregionalization model in use. In this
program r is used to represent the number of variogram models being applied to the problem.
For example, one might wish to combine the effects of a nugget model with a linear model for
three properties in 3-dimensions, then r=2, model is a 2 by 4 matrix, and c is a 6 by 3 matrix
of numbers. A nugget model is indicated when the intercept of a linear semivariogram model
is not zero and that intercept value is put in the first p by p sub-matrix of c to correspond to the
first model row of the model variable.
- itype is a scale variable indicating the type of cokriging to be done. In five different values just
about everything is covered, from simple cokriging to universal cokriging with a trend surface
of order 2. In general, simple cokriging should be used when the mean of the data is known
and the data field is globally stationary in its mean as well as locally stationary in it variance.
- avg, in his paper (Marcotte, 1991) states that this variable is not used, but later in one of his
examples he uses it. I cannot get the program to run unless I provide a 1 by p matrix of the
averages of the individual properties being cokriged when doing simple cokriging.
- block is a 1 by d vector of the size of the "block" to be cokriged. If we were certain of the
volume of our individual samples we could use something other than point kriging, i.e. any
positive values will work in that case..
- nd is a 1 by d vector of the discretization grid for cokriging, if using point cokriging make
them all ones.
- ival is a scalar describing whether or not cross-validation should be done and how. I find it
easier and quicker to run the program with ival set to zero for no cross-validation.
- nk is a scalar indicating the number of nearest neighbors of the x input matrix to use in
estimating the cokriged grid point. This is a difficult parameter to give hard and fast rules for
deciding how large to make this. You may wish cokri.m to use all of the data points and set
this scalar to a very large number, on the other hand you may wish for only local effects to
factor into the weighted estimates for the grid point. If you don't get satisfactory results the
first time around, increase or decrease this number.
- rad is a scalar that describes the radius of search for the nearest neighbors in nk, clearly they
are interrelated and one helps constrain the other. Additionally, it is clear here that the units of
the coordinates need to be in the same units, if not, standardization helps.
- ntok a scalar descibing how many groups of grid points in x0 will be cokriged as one. When
ntokis greater than one, the nk points inside the rad search radius will be found from the
block's centroid location.
Output:
- x0s is, of course, your answer. It is a m by (d+p) matrix of the grid point estimates. The
d columns correspond to the grid point coordinates given in x0 and the p columns correspond
to the estimates of the p properties at those grid point coordinates.
- s is a m by (d+p) matrix of the error estimates of the grid points. This is the big benefit to
kriging in that it provides you with not only an estimate of a property's value at a grid point,
but also an estimate of the uncertainty in that estimate.
- sv is a 1 by p vector of variances of points in the universe.
- id is a (nk times p) by 2 matrix of the identifiers of the lambda (or in Davis w) weights for the
last cokriging system solved (i.e. the last grid point system of equations).
- l is a ((nk times p) minus nc) by (ntok times p) matrix with the lambda (w) weights and
Lagrange multipliers of the last cokriging system solved. In this program nc refers to the
number of constraints applied to the simple cokriging system.
A word of warning, for some reason, Marcotte has set up cokri.m to turn off case sensitivity.
When the program is finished running variables Axb and axb are considered the same and
making reference to a variable such as Axb will generate a "variable or function not found" error.
Simply issue the command "casesen" and case sensitivity will be restored.
GoTo Next Lecture
GoTo Lecture TOC
GoTo 12.747 TOC
The text, graphics, and other materials contained in this homepage and
attached documents are intended solely for scholarly use by the scientific
and academic community. No reproduction, re-transmission or linking of this
page to any other page without the author's expressed written permission is
permitted.
© 1998, 2000 -- David M. Glover, WHOI --