Kalman Filters for Time Delay of Arrival-Based Source Localization
Hindawi Publishing Corporation
EURASIP Journal on Applied Signal Processing
Volume 2006
Ulrich Klee 0
Tobias Gehrig 0
John McDonough 0
0 Institut für Theoretische Informatik, Universität Karlsruhe, Am Fasanengarten 5, 76131 Karlsruhe, Germany
In this work, we propose an algorithm for acoustic source localization based on time delay of arrival (TDOA) estimation. In earlier work by other authors, an initial closed-form approximation was first used to estimate the true position of the speaker followed by a Kalman filtering stage to smooth the time series of estimates. In the proposed algorithm, this closed-form approximation is eliminated by employing a Kalman filter to directly update the speaker's position estimate based on the observed TDOAs. In particular, the TDOAs comprise the observation associated with an extended Kalman filter whose state corresponds to the speaker's position. We tested our algorithm on a data set consisting of seminars held by actual speakers. Our experiments revealed that the proposed algorithm provides source localization accuracy superior to the standard spherical and linear intersection techniques. Moreover, the proposed algorithm, although relying on an iterative optimization scheme, proved efficient enough for real-time operation.
1. INTRODUCTION
Most practical acoustic source localization schemes are based
on time delay of arrival (TDOA) estimation for the following
reasons: such systems are conceptually simple, they are
reasonably effective in moderately reverberant environments,
and their low computational complexity makes them
well suited to real-time implementation with several sensors.
Time delay of arrival-based source localization follows
a two-step procedure.
(1) The TDOA between all pairs of microphones is
estimated, typically by finding the peak in a
cross-correlation or generalized cross-correlation function [1].
(2) For a given source location, the squared error is
calculated between the estimated TDOAs and those
determined from the source location. The estimated source
location then corresponds to that position which
minimizes this squared error.
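Step (2) can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the microphone layout, the nominal speed of sound, the function names, and the brute-force grid search are all assumptions chosen for illustration, and the TDOAs are simulated without noise rather than estimated from a cross-correlation peak.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed nominal value


def predicted_tdoa(source, mic_a, mic_b, speed=SPEED_OF_SOUND):
    """TDOA (seconds) that a source at `source` would produce at one mic pair."""
    return (math.dist(source, mic_a) - math.dist(source, mic_b)) / speed


def squared_error(source, mic_pairs, observed_tdoas, speed=SPEED_OF_SOUND):
    """Sum of squared differences between observed and predicted TDOAs."""
    return sum(
        (tau - predicted_tdoa(source, a, b, speed)) ** 2
        for (a, b), tau in zip(mic_pairs, observed_tdoas)
    )


def grid_search(mic_pairs, observed_tdoas, axis_values):
    """Return the grid point with the smallest squared error (brute force)."""
    candidates = itertools.product(axis_values, repeat=3)
    return min(candidates, key=lambda p: squared_error(p, mic_pairs, observed_tdoas))


# Hypothetical layout: four microphones, all six pairs, noise-free TDOAs.
mics = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 2.0)]
pairs = list(itertools.combinations(mics, 2))
true_source = (1.0, 2.5, 1.5)
tdoas = [predicted_tdoa(true_source, a, b) for a, b in pairs]

grid = [0.5 * k for k in range(9)]  # 0.0, 0.5, ..., 4.0 metres on each axis
estimate = grid_search(pairs, tdoas, grid)
```

An exhaustive grid is used here only because it makes the criterion easy to see. Its cost grows with the cube of the grid resolution, which is one reason iterative or closed-form alternatives are of interest.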
If the TDOA estimates are assumed to have a
Gaussian-distributed error term, it can be shown that the least-squares
metric used in Step (2) provides the maximum likelihood
(ML) estimate of the speaker location [2]. Unfortunately,
this least-squares criterion results in a nonlinear
optimization problem that can have several local minima. Several
authors have proposed solving this optimization problem with
standard gradient-based iterative techniques. While such
techniques usually yield accurate location estimates, they
are computationally intensive and thus ill-suited for
real-time implementation [3, 4].
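The link between the Gaussian error assumption and the least-squares criterion can be sketched as follows. The notation is an assumption made for this illustration: $\hat{\tau}_i$ is the TDOA observed at the $i$th microphone pair, $\mathbf{m}_{i1}$ and $\mathbf{m}_{i2}$ are the positions of its two microphones, $s$ is the speed of sound, $\sigma_i^2$ is the error variance, and the errors are further assumed to be zero-mean and independent across pairs.

```latex
% TDOA predicted for a source at position x
T_i(\mathbf{x}) = \frac{\|\mathbf{x} - \mathbf{m}_{i1}\| - \|\mathbf{x} - \mathbf{m}_{i2}\|}{s}

% Likelihood of the N observed TDOAs under independent Gaussian errors
p(\hat{\tau}_1, \ldots, \hat{\tau}_N \mid \mathbf{x})
  = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma_i^2}}
    \exp\!\left( -\frac{[\hat{\tau}_i - T_i(\mathbf{x})]^2}{2\sigma_i^2} \right)

% Log-likelihood: a constant minus a weighted squared error
\log p(\hat{\tau}_1, \ldots, \hat{\tau}_N \mid \mathbf{x})
  = -\frac{1}{2} \sum_{i=1}^{N} \log\!\left(2\pi\sigma_i^2\right)
    - \frac{1}{2} \sum_{i=1}^{N} \frac{[\hat{\tau}_i - T_i(\mathbf{x})]^2}{\sigma_i^2}

% Maximizing the log-likelihood is therefore a least-squares problem
\hat{\mathbf{x}}_{\mathrm{ML}}
  = \operatorname*{argmin}_{\mathbf{x}}
    \sum_{i=1}^{N} \frac{[\hat{\tau}_i - T_i(\mathbf{x})]^2}{\sigma_i^2}
```

Because $T_i(\mathbf{x})$ depends on $\mathbf{x}$ through Euclidean norms, the criterion is nonlinear in the source position.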
For any pair of microphones, the surface on which the
TDOA is constant is a hyperboloid of two sheets. A
second class of algorithms seeks to exploit this fact by
grouping all microphones into pairs, estimating the TDOA of each
pair, then finding the point where all associated hyperboloids
most nearly intersect. Several closed-form position estimates
based on this approach have appeared in the literature; see
Chan and Ho [5] and the literature review found there.
Unfortunately, the point of intersection of two hyperboloids can
change significantly based on a slight change in the
eccentricity of one of the hyperboloids. Hence, a third class of
algorithms was developed wherein the position estimate is
obtained from the intersection of several spheres. The first
algorithm in this class was proposed by Schau and Robinson
[6], and later came to be known as spherical intersection.
Perhaps the best-known algorithm from this class is the spherical
interpolation method of Smith and Abel [7]. Both methods
provide closed-form estimates suitable for real-time
implementation.
Brandstein et al. [4] proposed yet another closed-form
approximation known as linear intersection. Their algorithm
proceeds by first calculating a bearing line to the source for
each pair of sensors. Thereafter, the point of nearest approach
is calculated for each pair of bearing lines, yielding a potential
source location. The final position estimate is obtained from
a weighted average of these potential source locations.
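The geometric core of this idea, the point of nearest approach between two bearing lines, can be sketched as below. This is a simplified illustration and not the algorithm of [4]: the function names are hypothetical, the midpoint of the two closest points is taken as the potential source location, and the weights are left to the caller, whereas [4] defines its own choices for both.

```python
def _dot(a, b):
    """Dot product of two 3D tuples."""
    return sum(x * y for x, y in zip(a, b))


def closest_points(p1, d1, p2, d2, eps=1e-12):
    """Closest points on the lines p1 + t*d1 and p2 + u*d2.

    Returns (q1, q2), or None when the lines are (nearly) parallel.
    """
    w = tuple(x - y for x, y in zip(p1, p2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    # Solve the 2x2 system that sets both derivatives of the squared distance to zero.
    t = (b * e - c * d) / denom
    u = (a * e - b * d) / denom
    q1 = tuple(p + t * k for p, k in zip(p1, d1))
    q2 = tuple(p + u * k for p, k in zip(p2, d2))
    return q1, q2


def midpoint(q1, q2):
    """Assumed simplification: midpoint of the two closest points."""
    return tuple(0.5 * (x + y) for x, y in zip(q1, q2))


def weighted_average(points, weights):
    """Weighted mean of candidate source locations."""
    total = sum(weights)
    return tuple(
        sum(w * p[i] for p, w in zip(points, weights)) / total for i in range(3)
    )


# Two skew bearing lines (hypothetical values): they never meet,
# so the closest points differ and the midpoint lies between them.
q1, q2 = closest_points((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                        (0.0, 0.0, 1.0), (0.0, 1.0, 0.0))
candidate = midpoint(q1, q2)
```

With noisy bearing estimates the lines rarely intersect exactly, which is why a nearest-approach point is used for each pair and the candidates are then averaged.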
In the algorithm proposed here, the closed-form
approximation used in prior approaches is eliminated by
employing an extended Kalman filter to directly update the speaker’s
position estimate based on the observed TDOAs. In
particular, the TDOAs comprise the observation associated with
an extended Kalman filter whose state corresponds to the
speaker’s position. Hence, the new position estimate comes
directly from the update formulae of the Kalman filter. It is
worth noting that similar approaches have been proposed by
Dvorkind and Gannot [8] for an acoustic source localizer, as
well as by Duraiswami et al. [9] for a combined audio-vi (...truncated)