Nullspace shuttles
Geophys. J . Int. (1996) 124,372-380
TREITEL SECTION
Nullspace shuttles
Michael M. Deal and Guust Nolet
Department of Geological and Geophysical Sciences, Princeton University, Princeton, NJ 08544, USA
Accepted 1994 December 5. Received 1994 November 8; in original form 1994 April 14
computational effort.
Key words: inversion, seismic tomography.
INTRODUCTION
While the images obtained with seismic tomography give us a
new and extraordinary view of the Earth, we must keep in
mind the limitations and weaknesses of the solutions. A
solution to an inverse problem is not a unique solution, but
instead is one that attempts to explain or fit the data subject
to a criterion that selects an ‘optimal’ model. For example, in
a standard least-squares problem there are many solutions
that minimize the sum of the squared errors, and the solution
with the smallest Euclidean norm is often chosen as the
‘optimal’ model. Solving for the minimum-norm solution is
computationally easy to implement using back projections
(Van der Sluis & Van der Vorst 1987), but the complexity of
a given problem may be better suited by using a different
definition of optimality.
Many applications in global tomography suffer from an
uneven distribution of ray paths. In this case, minimum-norm
solutions tend to distribute the velocity anomalies over model
cells that are visited by many rays while leaving the other cells
untouclsd. This introduces unwarranted lateral heterogeneity.
Su, Woodhouse & Dziewonski (1994) avoid such effects by
372
projecting the model onto a basis of low-order spherical
harmonics. However, this forces an extreme smoothness on
the model which does not allow us to see small-scale structures
such as subduction zones, even when the degree is raised to
angular order 12 (Su et al. 1994). In shallow seismic investigations, additional information is often available from well
logging or surface geology indicating sharp transitions or a
limited choice of seismic velocities. None of such a priori
information is addressed by simple minimum-norm or minimum-gradient solutions.
The criteria used to solve the tomographic problem can be
adjusted and modified until all a priori information is incorporated into the inversion, but this will undoubtedly lead to an
extremely complicated penalty function. Solving a difficult
penalty function may not be computationally feasible or justifiable. Stork & Clayton (1991) solve the tomographic problem
by including filters in a modified Dines & Lytle (1979) iteration
formula. To prevent instability in the inversion, the filters used
are limited to those that have a bandpass or averaging
structure. Another approach is to solve the problem using a
straightforward method such as the minimum-norm, leastsquares algorithm and then to modify the image a posteriori.
0 1996 RAS
SUMMARY
In seismic tomography the problem is generally underdetermined. The solution to the
tomographic problem depends on the specific optimization condition used a n d is
inherently distorted d u e t o noise in the d a t a and approximations in the theory.
Smoothing is often applied to reduce inversion artefacts with short correlation lengths.
However, a posteriori smoothing generally affects the data fit. For more sophisticated,
non-linear filters this effect can be severe. We present a technique t o conserve data fit
for filters of arbitrary complexity. The difference between the ‘optimal’ solution and a
filtered version is projected onto the nullspace of the model space in order to preserve
the data fit. Thus, we only allow changes t o the image that do not conflict with the
data. We demonstrate the benefits of such conservative filters using several different
non-linear filters to reduce noise, smooth the image, and highlight edges.
T h e method is exact in small-scale experiments, where we can use the method of
singular value decomposition: eigenvectors with large eigenvalues are used to project
the difference between the original model a n d the filtered version o n t o the nullspace.
With large-scale tomographic problems, calculation of all of the large eigenvectors
is unrealistic. We show how to use the iterative method of conjugate gradients to
apply conservative filters to large-scale tomographic problems with minimum
Nullspace shuttles
THEORY
The linearized inverse problem is commonly expressed in
matrix notation as
AX= d ,
(1)
where Aij is the distance the ith ray travels in cell j , x j is the
magnitude of the slowness deviations from the reference model
in cell j , and di is the traveltime delay for ray i. We assume N
data and M unknown model parameters. The familiar leastsquares solution is derived from the Gauss normal equation
A~AX
=~
~ d .
(2)
When ATA is singular or nearly singular, an inverse can be
constructed using singular value decomposition (SVD), and
the slowness model x can be represented as a linear combination of the first k eigenvectors of ATA belonging to the k
largest eigenvalues (Wiggins 1972; Jackson 1972):
k
(3)
where V, is a matrix with the eigenvectors of ATAas columns.
Assuming that R is the rank of matrix A, the last R - k
eigenvalues are smaller than a given tolerance value and are
ignored. By increasing the tolerance value we decrease the
number of eigenvectors and effectively damp the solution. We
define the 'generalized' nullspace as the space of all vectors
that satisfy
(4)
This leads to a relationship between the eigenvalue cut-off and
a tolerance value E'. If x is an eigenvector with eigenvalue 1,
then
lAxl= I l l 1x1,
(5)
and thus the generalized nullspace is spanned by all
eigenvectors with
Ill < E' .
(6)
Increasing E' allows us to expand the nullspace at the expense
0 1996 RAS, GJI 124, 372-380
of the data fit. The effect of the tolerance E' on the geometry
of the generalized nullspace may depend strongly on the type
of experiment. In a teleseismic inversion such as that by Nolet
(1989, the eigenvalues tend to decrease exponentially (Van
der Sluis & Van der Vorst 1987). In a more controlled set-up,
the eigenvalues may decrease much more slowly and the
dimension of the nullspace is a much weaker function of the
data misfit. As an example, the synthetic experiment presented
later is based on a borehole-type problem. The eigenvalues
associated with the matrix A are shown in Fig. 1. The first 100
eigenvalues decrease quickly, but the eigenvalues decrease very
gradually in the last three-quarters of the eigenvalue spectrum.
A plot of the root-mean-squared (RMS) misfit associated with
our synthetic experiment, Fig. 2, is similar in shape to the
eigenvalue spectrum. Use of a tolerance value of E' = 2, E , where
as a threshold in
is the largest eigenvalue and E = 1 x
deciding which eigenvalues to ignore produces a solution
which is a linear combination of the first 785 out of 900
eigenvectors. The solution has a variance reduction of 99.0 per
cent. By incorporating only the first 250 eigenvectors into the
solution, we obtain a variance (...truncated)