12.747 Lecture 6: Section 1:
Sequence Analysis I: Uniform Series, Cross- and
Auto-Correlation, and Fourier Transforms
File last modified 4 October 1996
6.1 Goals and Examples of Sequence Analysis
Sequences of data, either in space or in time, appear all the
time in geochemical research. You may have a time series of measurements
at a location (e.g., sediment trap data, or ocean surface
temperature), a series of stations along a hydrographic section,
or isotope measurements on a long sediment core. For the sake
of simplicity (initially) we shall discuss only regularly sampled
data, that is, samples taken at identical intervals in space
or time. The analysis becomes more complicated
when the samples are irregularly spaced, but the principles
are best understood by starting with the simplest case. What
do we hope to achieve in the analysis of data sequences? There
are as many approaches as there are reasons, and there are as
many reasons (or perhaps more) as there are data sequences. The
next subsections are a heterogeneous list of some of the major
conceptual motivations.
6.1.1 Searching or testing for structure or periodicities
You might be looking or testing for changes
in a system due to periodic forcing, for example the effect of
seasonal changes on biological production, or the effect of lunar
tides on shellfish contamination. This may be extended to spatial
regularity as well, in that you may be looking for evidence of
the effect of large-scale Kelvin waves on dissolved nutrients near convergence
zones in the ocean. Structure is not restricted to periodicities.
You may be interested in the decorrelation timescale or distance.
For example, to what extent does the weekend weather depend on
the previous Wednesday's? Similarly, how far apart do you need to space
hydrographic stations? (Putting them too close together, i.e.,
within the decorrelation distance, results in redundant data.)
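To make the idea of a decorrelation timescale concrete, here is a minimal sketch, not a definitive method. It assumes one common convention: the decorrelation lag is the first lag at which the sample autocorrelation drops below 1/e. The function names, the synthetic "weather" series, and its 80% day-to-day memory are all made up for illustration.

```python
import math
import random

def autocorrelation(x, max_lag):
    """Sample autocorrelation r(k), k = 0..max_lag, of an evenly spaced series."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    # Lagged products are normalized by the lag-0 sum, so r(0) is exactly 1.
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / var
            for k in range(max_lag + 1)]

def decorrelation_lag(r, threshold=1.0 / math.e):
    """First lag at which r(k) falls below the threshold, or None if it never does."""
    for k, rk in enumerate(r):
        if rk < threshold:
            return k
    return None

# Synthetic "weather" (assumption): each day keeps 80% of yesterday's anomaly.
rng = random.Random(0)
series = [0.0]
for _ in range(4999):
    series.append(0.8 * series[-1] + rng.gauss(0.0, 1.0))

r = autocorrelation(series, max_lag=20)
print("decorrelation lag (samples):", decorrelation_lag(r))
```

The lag is returned in units of the sampling interval, so multiply by that interval to get a time or a distance. Samples spaced more closely than this carry largely redundant information.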
6.1.2 Correlation or correspondence between phenomena (and lags)
Different records, for
example sea surface temperature and sediment trap yields at some
great depth, need to be compared to establish relationships
between the two phenomena. Other examples might include the sunspot
cycle and large-scale weather patterns, or oxygen isotopes in sediment
cores and ice-core CO2 concentrations. Lags between
records may imply causality, or at least possible delay mechanisms
associated with the interrelationships. Thinking, for example,
about the sediment trap data, any lag between the oxygen isotope
data in the sediment trap and sea surface temperature may yield
the particle transit time from the surface to the trap. Bear in
mind, however, that statistically significant correlation between
data sequences does not prove a causal relationship: it may, in
fact, point to co-causal relations (both sequences being
driven by another, unexpected process).
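As a rough illustration of how a lag between two records might be estimated, the sketch below correlates one series against shifted copies of the other and picks the shift with the highest correlation. It assumes both records are evenly sampled, equally long, and that the second lags the first. The names `lagged_correlation` and `best_lag` and the synthetic "surface" and "trap" records are hypothetical, not taken from any particular data set.

```python
import math
import random

def lagged_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag] (lag >= 0) over their overlap."""
    n = len(x) - lag
    xs, ys = x[:n], y[lag:lag + n]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def best_lag(x, y, max_lag):
    """Lag in 0..max_lag with the largest (positive) correlation between x and y."""
    corr = [lagged_correlation(x, y, k) for k in range(max_lag + 1)]
    k_best = max(range(max_lag + 1), key=lambda k: corr[k])
    return k_best, corr[k_best]

# Synthetic records (assumption): the "trap" series is the "surface" series
# delayed by 3 sampling intervals and halved in amplitude.
rng = random.Random(1)
surface = [rng.gauss(0.0, 1.0) for _ in range(200)]
trap = [0.0] * 3 + [0.5 * v for v in surface[:-3]]

lag, corr = best_lag(surface, trap, max_lag=10)
print("estimated lag:", lag, "correlation:", round(corr, 3))
```

The estimated lag is in sampling intervals. A clear peak in the lagged correlation is still only a correlation, so the caution above about co-causal relations applies to it as well.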
6.1.3 Predictions (interpolation and extrapolation)
Of course, one of your goals might be to predict a certain value
at a cherished location or time, which becomes methodologically
a regression exercise. You may choose some method of regressing
the series with a least squares algorithm, but you may also wish
to map your data into frequency space to make your prediction.
What this means is that you believe that there are predictable
periodicities in your data which make your predictions more accurate
in the frequency domain, or that it is more computationally efficient
to perform the regression in this different domain. The reasons
for doing this may include regularizing your data when
your record is incomplete due to missing data points, or extending
your data over longer periods.
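A small sketch of prediction using a periodic component: fit a mean plus one sinusoid of known period by least squares to a record with gaps, then evaluate the fit at the missing times or beyond the end of the record. This is only the simplest case, and it rests on assumptions that are not in the text above: the period is known in advance, a single harmonic is adequate, and the function names are invented for the example.

```python
import math

def solve(A, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        p[r] = (M[r][n] - sum(M[r][c] * p[c] for c in range(r + 1, n))) / M[r][r]
    return p

def fit_harmonic(t, y, period):
    """Least-squares coefficients (a, b, c) of y ~ a + b cos(wt) + c sin(wt)."""
    w = 2 * math.pi / period
    basis = [[1.0, math.cos(w * ti), math.sin(w * ti)] for ti in t]
    # Normal equations: (B^T B) p = B^T y
    A = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    rhs = [sum(row[i] * yi for row, yi in zip(basis, y)) for i in range(3)]
    return solve(A, rhs)

def predict(coef, t, period):
    """Evaluate the fitted harmonic at time t."""
    a, b, c = coef
    w = 2 * math.pi / period
    return a + b * math.cos(w * t) + c * math.sin(w * t)

# Synthetic monthly record (assumption) with months 5, 6, 7 and 20 missing.
period = 12.0
times = [t for t in range(36) if t not in (5, 6, 7, 20)]
values = [10 + 3 * math.cos(2 * math.pi * t / period)
          + 2 * math.sin(2 * math.pi * t / period) for t in times]

coef = fit_harmonic(times, values, period)
print("filled-in value at month 6:", predict(coef, 6, period))
print("extrapolated value at month 40:", predict(coef, 40, period))
```

Because the fit only needs the times at which data exist, the gaps cause no special difficulty. The price is the assumption that the periodicity really is there and really is stable.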
6.1.4 Filtering and Signal Extraction
Another situation occurs when your signal is buried in a background
of noise which you wish to filter out. There are techniques to
digitally filter your data to remove unwanted noise, separate
interfering signals, or in general improve the signal to noise
ratio. For example, if you were measuring temperature and oxygen
on a mooring, and there appeared to be a significant tidal signal
which caused temperature and corresponding oxygen variations on
a regular basis, you could use a "notch filter" (one which
does not pass a specific frequency). If there were further
high-frequency variations associated with wave action (assuming you
were interested in longer-term trends and variations), you could
apply a "low-pass filter".
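The two filters just described can be sketched in their crudest form: transform the record, set the unwanted Fourier coefficients to zero, and transform back. This is an illustration under stated assumptions, not a recommended filter design. The "tide" here is an idealized 12-hour sinusoid, the record holds a whole number of cycles of every component, and the direct transform below uses only the standard library (a library FFT would normally take its place). All names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform (O(n^2); fine for short records)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(coef):
    """Inverse transform, returning the real part of each sample."""
    n = len(coef)
    return [(sum(coef[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]

def frequency_filter(x, dt, keep):
    """Zero every Fourier coefficient whose frequency f fails keep(f)."""
    n = len(x)
    coef = dft(x)
    for k in range(n):
        f = min(k, n - k) / (n * dt)  # |frequency| of coefficient k
        if not keep(f):
            coef[k] = 0.0
    return inverse_dft(coef)

# Synthetic hourly record: slow 120 h variation, 12 h "tide", 3 h "wave" signal.
n, dt = 240, 1.0
slow = [math.sin(2 * math.pi * t / 120) for t in range(n)]
tide = [0.8 * math.sin(2 * math.pi * t / 12) for t in range(n)]
wave = [0.3 * math.sin(2 * math.pi * t / 3) for t in range(n)]
record = [a + b + c for a, b, c in zip(slow, tide, wave)]

# Notch filter: reject a narrow band around 1 cycle per 12 hours.
notched = frequency_filter(record, dt, lambda f: abs(f - 1 / 12) > 0.01)
# Low-pass filter: keep only periods longer than 24 hours.
low_passed = frequency_filter(record, dt, lambda f: f < 1 / 24)
```

With real data the tidal period will not fit the record length exactly, and an abrupt cutoff like this one tends to produce ringing, so gentler filter shapes are usually preferred in practice.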
6.1.5 Power Spectral Analysis
In many of the above applications, and indeed as a technique and end unto
itself, you may wish to do power spectral analysis. This simply
means that you want to see how the energy of the system (we use
energy in a very broad sense, since it can mean anything ranging
from variability to power) is distributed as a function of frequency
or wavenumber. This kind of analysis yields clues as to the behavior
of the system and the underlying physical laws operating. The
shape of the spectrum also tells you something about your measurement
system and your experiment design. The study of sequential
analysis reveals the scope and fundamental limitations of the measurement
process.
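A minimal sketch of what "energy as a function of frequency" means in practice is the raw periodogram: the squared magnitude of each Fourier coefficient of the mean-removed series. The normalization chosen here, under which the spectrum sums to the variance of the series, is one of several conventions and is an assumption of this example, as are the function name and the synthetic monthly data.

```python
import cmath
import math

def periodogram(x, dt):
    """One-sided power spectrum of an evenly spaced series (mean removed)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        c = sum(dev[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # Double every bin except the Nyquist bin so the spectrum sums to the variance.
        weight = 1.0 if 2 * k == n else 2.0
        freqs.append(k / (n * dt))
        power.append(weight * abs(c) ** 2 / n ** 2)
    return freqs, power

# Ten years of synthetic monthly data: annual cycle plus a weaker 4-month cycle.
n, dt = 120, 1.0
x = [2.0 * math.sin(2 * math.pi * t / 12) + 0.5 * math.sin(2 * math.pi * t / 4)
     for t in range(n)]

freqs, power = periodogram(x, dt)
peak = max(range(len(power)), key=lambda i: power[i])
print("dominant period (months):", 1.0 / freqs[peak])
print("total power:", sum(power))
```

The lowest frequency resolved is one cycle per record length and the highest is one cycle per two sampling intervals, which is one way the spectrum reflects the limits of the measurement design.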
6.1.6 The ground rules: stationary processes, etc.
We will restrict our discussion initially to regularly (evenly)
spaced data. A lot of data satisfies this requirement, although
experience dictates that there is an equal amount of data out
there that does not. The assumption of regularity makes the analysis
much more straightforward mathematically, although we will get
into unevenly spaced data at the end of the next lecture. An additional
assumption, which you will often come across in sequential analysis,
is that the series is stationary. This simply means that
if you look at a stretch of the record, it looks pretty much the
same as some other stretch. That is, there are no long-term trends
in the data. This isn't always the case, but is usually easily
fixed. All you do is plot (or statistically test) the data
for the existence of a long-term trend, fit the trend by regression, and
subtract it from your data prior to sequence analysis. This
makes sense, because if your mindset is to examine the data
for periodicities (sine and cosine waves, for example), then having
an upward trend through your data means you might be sitting on
the rising part of a very long-term sine wave. This would be a
sine wave which you have grossly undersampled (you sampled only
a small percentage of its period), so you don't want to include
it in your analysis. How you subtract it off really doesn't matter,
as long as it (a) makes physical sense, and (b) does a statistically
good job. You already have some tools to do this. Now, we realize
that the world is not made of sines and cosines (well, maybe),
but that is just the way in which we will begin to look at things
later on in these two lectures.
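The detrending step described above can be sketched as follows, assuming the trend is well described by a straight line (one choice among many; the text only requires that the choice make physical sense and do a statistically good job). The function name and the synthetic record are hypothetical.

```python
import math

def linear_detrend(t, y):
    """Fit y ~ intercept + slope * t by least squares and return the residuals."""
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    slope = (sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y))
             / sum((a - t_mean) ** 2 for a in t))
    intercept = y_mean - slope * t_mean
    residual = [b - (intercept + slope * a) for a, b in zip(t, y)]
    return slope, intercept, residual

# Synthetic record: a 10-sample oscillation riding on a slow upward trend.
t = list(range(100))
y = [3.0 + 0.05 * ti + math.sin(2 * math.pi * ti / 10) for ti in t]

slope, intercept, residual = linear_detrend(t, y)
print("fitted slope per sample:", round(slope, 4))
```

The residual series is what you would carry into the sequence analysis. The fitted slope is not exactly the true one, because the oscillation leaks slightly into the regression, and that is worth remembering when the record spans only a few cycles.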
The concept of stationarity is actually much deeper than we described,
and you can look for a more complete treatment in books like Jenkins
and Watts if you need to. One of the underlying implications
of stationarity is that the statistics of the data variation
do not change with time. This may or may not be a particularly
damning restriction. It pays you to think carefully about what
is changing during the sampling period, not only from the viewpoint
of the measurement process (e.g., are there calibration
shifts?) but also in the fundamental physical nature of the phenomena
you are measuring.
In the remainder of this (and the next) lecture, we will talk
in terms of time series analysis. Keep in mind that this applies
equally well to space series analysis. Instead of time, you'd
use distance, and instead of frequency, you'd use wavenumber
(frequency is to period as wavenumber is to wavelength: each is the
inverse of the other). From time to time we will remind you
of the correspondence, but not always.
The text, graphics, and other materials contained in this webpage and
attached documents are intended solely for scholarly use by the scientific
and academic community. No reproduction, re-transmission or linking of this
page to any other page without the author's expressed written permission is
permitted.
© 1998, 2000 -- David M. Glover, WHOI --