12.747 Lecture 6: Section 1:

Sequence Analysis I: Uniform Series, Cross- and Auto-Correlation, and Fourier Transforms

File last modified 4 October 1996


6.1 Goals and Examples of Sequence Analysis

Sequences of data, either in space or in time, appear all the time in geochemical research. You may have a time series of measurements at a location (e.g., sediment trap data, or ocean surface temperature), a series of stations along a hydrographic section, or isotope measurements on a long sediment core. For the sake of simplicity (initially) we shall discuss only regularly sampled data, that is, samples taken at identical intervals in space or time. The analysis becomes more difficult and more complicated when we discuss irregularly spaced samples, but the principles are best understood in terms of the simplest case first. What do we hope to achieve in the analysis of data sequences? There are as many approaches as there are reasons, and there are as many reasons (or perhaps more) as there are data sequences. The next subsections are a heterogeneous list of some of the major conceptual motivations.

6.1.1 Searching or testing for structure or periodicities

You might be looking or testing for changes in a system due to periodic forcing: for example, the effect of seasonal changes on biological production, or the effect of lunar tides on shellfish contamination. This may be extended to spatial regularity as well, in that you may be looking for evidence of the effect of large-scale Kelvin waves on dissolved nutrients near convergence zones in the ocean. Structure is not restricted to periodicities. You may be interested in the decorrelation timescale or distance. For example, to what extent does the weekend weather depend on the previous Wednesday's? Likewise, how far apart do you need to space hydrographic stations? Putting them too close together (i.e., within the decorrelation distance) results in redundant data.
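As a rough illustration of the decorrelation idea, the sketch below estimates the sample autocorrelation of an evenly spaced record and reports the first lag at which it drops below 1/e. The synthetic record, the function names, and the 1/e threshold are all assumptions made for this example; other conventions (such as the first zero crossing) are equally common.

```python
import math
import random

def autocorrelation(x, max_lag):
    """Sample autocorrelation r[k], k = 0..max_lag, of an evenly spaced series."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / var
            for k in range(max_lag + 1)]

def decorrelation_lag(r, threshold=1.0 / math.e):
    """First lag where r falls below the threshold (1/e is an assumed convention)."""
    for k, value in enumerate(r):
        if value < threshold:
            return k
    return None  # never decorrelated within the lags examined

# Synthetic "red noise" record (an assumption for illustration only):
# each sample remembers 80% of the previous one, plus fresh random noise.
rng = random.Random(0)
series = [0.0]
for _ in range(1999):
    series.append(0.8 * series[-1] + rng.gauss(0.0, 1.0))

r = autocorrelation(series, 30)
lag = decorrelation_lag(r)
print("decorrelation lag (in sampling intervals):", lag)
```

Samples spaced more closely than this lag carry largely redundant information, which is the point made above about station spacing.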

6.1.2 Correlation or correspondence between phenomena (and lags)

Different records, for example sea surface temperature and sediment trap yields at some great depth, need to be compared to establish relationships between the two phenomena. Other examples might include the sunspot cycle and large-scale weather patterns, or oxygen isotopes in sediment cores and ice-core CO2 concentrations. Lags between records may imply causality, or at least possible delay mechanisms associated with the interrelationships. Thinking, for example, about the sediment trap data, any lag between the oxygen isotope data in the sediment trap and sea surface temperature may yield the particle transit time from the surface to the trap. Bear in mind, however, that a statistically significant correlation between data sequences does not prove a causal relationship: it may, in fact, point to co-causal relations (both sequences being driven by another, unexpected process).
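The lag idea can be sketched as follows: correlate one record against a shifted copy of the other and look for the shift that gives the highest correlation. Everything here is hypothetical: the `surface` and `trap` series are synthetic, the true delay of 7 sampling intervals is built in by construction, and only non-negative lags (the trap trailing the surface) are searched.

```python
import math
import random

def lagged_correlation(x, y, lag):
    """Correlation between x[i] and y[i + lag] over the overlapping samples (lag >= 0)."""
    a = x[:len(x) - lag]
    b = y[lag:]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    norm_a = math.sqrt(sum((u - mean_a) ** 2 for u in a))
    norm_b = math.sqrt(sum((v - mean_b) ** 2 for v in b))
    return cov / (norm_a * norm_b)

def best_lag(x, y, max_lag):
    """Lag in 0..max_lag with the highest correlation, plus all the correlations."""
    corrs = [lagged_correlation(x, y, k) for k in range(max_lag + 1)]
    k_best = max(range(max_lag + 1), key=lambda k: corrs[k])
    return k_best, corrs

# Synthetic records (assumed for illustration): the "trap" record is the
# "surface" record delayed by true_lag samples, plus a little measurement noise.
rng = random.Random(1)
true_lag = 7
surface = [rng.gauss(0.0, 1.0) for _ in range(300)]
trap = [(surface[i - true_lag] if i >= true_lag else 0.0) + rng.gauss(0.0, 0.1)
        for i in range(300)]

lag, corrs = best_lag(surface, trap, 20)
print("estimated lag:", lag)
```

A high correlation at the recovered lag says only that the two records move together with that delay; as noted above, it does not by itself establish which process drives which.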

6.1.3 Predictions (interpolation and extrapolation)

Of course, one of your goals might be to predict a certain value at a cherished location or time, which becomes methodologically a regression exercise. You may choose some method of regressing the series with a least squares algorithm, but you may also wish to map your data into frequency space to make your prediction. What this means is that you believe that there are predictable periodicities in your data which make your predictions more accurate in the frequency domain, or that it is more computationally efficient to perform the regression in this different domain. The reasons for doing this may include regularizing your data when your record is incomplete due to missing data points, or extending your data over longer periods.

6.1.4 Filtering and Signal Extraction

Another situation occurs when your signal is buried in a background of noise which you wish to filter out. There are techniques to digitally filter your data to remove unwanted noise, separate interfering signals, or in general improve the signal-to-noise ratio. For example, if you were measuring temperature and oxygen on a mooring, and there appeared to be a significant tidal signal which caused temperature and corresponding oxygen variations on a regular basis, you could use a "notch filter" (one which does not pass a specific frequency). If there were further high-frequency variations associated with wave action (assuming you were interested in longer-term trends and variations), you could apply a "low-pass filter".
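To give a feel for what filtering does, here is a minimal sketch using a centered running mean, which is about the crudest low-pass filter there is. It is not a proper notch filter, but it does happen to remove completely any oscillation whose period equals the window length. The record, the window length, and the function name are assumptions made for this example.

```python
import math

def running_mean(x, window):
    """Running mean over `window` samples; keeps only fully covered points."""
    return [sum(x[i:i + window]) / window
            for i in range(len(x) - window + 1)]

# Synthetic record (assumed): a slow linear trend plus an unwanted
# oscillation with a period of exactly 5 sampling intervals.
window = 5
record = [0.01 * i + math.sin(2 * math.pi * i / 5) for i in range(100)]

smoothed = running_mean(record, window)
# smoothed[j] is centered on original sample j + window // 2.
# Averaging over one full period cancels the oscillation and leaves the trend.
print(smoothed[:3])
```

Note that the filtered record is shorter than the original by `window - 1` points, a small example of the price paid at the ends of a record by any filter of finite length.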

6.1.5 Power Spectral Analysis

In many of the above applications, and indeed as a technique and end unto itself, you may wish to do power spectral analysis. This simply means that you want to see how the energy of the system (we use energy in a very broad sense, since it can mean anything ranging from variability to power) is distributed as a function of frequency or wavenumber. This kind of analysis yields clues as to the behavior of the system and the underlying physical laws operating. The shape of the spectrum tells you something about your measurement system and your experiment design as well. The study of sequence analysis reveals the scope and fundamental limitations of the measurement process.
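A minimal sketch of the idea, assuming an evenly spaced record: compute the discrete Fourier transform directly, square its magnitude, and see which frequency carries the most variance. The direct sum is slow but transparent; in practice a fast Fourier transform routine would be used. The normalization (division by the number of points) is one of several conventions, and the synthetic record and names are assumptions for this example.

```python
import cmath
import math

def periodogram(x, dt=1.0):
    """Raw periodogram of an evenly sampled series, frequencies 0..Nyquist."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]  # remove the mean so frequency 0 does not dominate
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(d * cmath.exp(-2j * math.pi * k * i / n)
                    for i, d in enumerate(dev))
        freqs.append(k / (n * dt))          # cycles per unit time
        power.append(abs(coeff) ** 2 / n)   # one common normalization (assumed)
    return freqs, power

# Synthetic record (assumed): a strong oscillation with a period of
# 16 samples plus a weaker one with a period of 4 samples.
record = [math.sin(2 * math.pi * i / 16) + 0.3 * math.sin(2 * math.pi * i / 4)
          for i in range(128)]

freqs, power = periodogram(record)
peak = power.index(max(power))
print("dominant period (in sampling intervals):", 1.0 / freqs[peak])
```

For a space series, `dt` would be the station spacing and `freqs` would be wavenumbers.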

6.1.6 The ground rules: stationary processes, etc.

We will restrict our discussion initially to regularly (evenly) spaced data. A lot of data satisfies this requirement, although experience dictates that there is an equal amount of data out there that does not. The assumption of regularity makes the analysis much more straightforward mathematically, although we will get into unevenly spaced data at the end of the next lecture. An additional assumption, which you will often come across in sequence analysis, is that the series is stationary. This simply means that if you look at a stretch of the record, it looks pretty much the same as some other stretch. That is, there are no long-term trends in the data. This isn't always the case, but it is usually easily fixed. All you do is plot (or statistically test) the data for the existence of a long-term trend, fit the trend by regression, and subtract it from your data prior to sequence analysis. This makes sense, because if your mindset is to examine the data for periodicities (sine and cosine waves, for example), then having an upward trend through your data means you might be sitting on the rising part of a very long-term sine wave. This would be a sine wave which you have grossly undersampled (you sampled only a small percentage of its period), so you don't want to include it in your analysis. How you subtract it off really doesn't matter, as long as it (a) makes physical sense, and (b) does a statistically good job. You already have some tools to do this. Now we realize that the world is not made of sines and cosines (well, maybe…) but it is just the way in which we will begin to look at things later on in these two lectures.
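The detrending step described above can be sketched as an ordinary least-squares straight-line fit followed by subtraction. A straight line is only one possible choice of trend, and the synthetic record and function name below are assumptions made for this example.

```python
import math

def linear_detrend(t, y):
    """Fit y = intercept + slope * t by least squares and return the residuals."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = y_mean - slope * t_mean
    residual = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
    return slope, intercept, residual

# Synthetic record (assumed): an upward trend of 0.5 per time step
# with a periodic signal of period 10 riding on top of it.
t = list(range(100))
y = [2.0 + 0.5 * ti + math.sin(2 * math.pi * ti / 10) for ti in t]

slope, intercept, residual = linear_detrend(t, y)
print("fitted slope:", slope)
# `residual` is the detrended series that would go on to sequence analysis.
```

The fitted slope comes out close to, but not exactly, the built-in 0.5, because the periodic signal is not perfectly uncorrelated with time over a finite record.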

The concept of stationarity is actually much deeper than we have described, and you can look for a more complete treatment in books like Jenkins and Watts if you need to. One of the underlying implications of stationarity is that the statistics of the data variation do not change with time. This may or may not be a particularly damning restriction. It pays to think carefully about what is changing during the sampling period, not only from the viewpoint of the measurement process (e.g., are there calibration shifts?) but also in the fundamental physical nature of the phenomena you are measuring.

In the remainder of this (and the next) lecture, we will talk in terms of time series analysis. Keep in mind that this applies equally well to space series analysis. Instead of time, you'd use distance, and instead of frequency, you'd use wavenumber. (Frequency is to period as wavenumber is to wavelength: each is the inverse of the other.) From time to time we will remind you of the correspondence, but not always.


The text, graphics, and other materials contained in this webpage and attached documents are intended solely for scholarly use by the scientific and academic community. No reproduction, re-transmission or linking of this page to any other page without the author's expressed written permission is permitted.
© 1998, 2000 -- David M. Glover, WHOI --