On Randomized Trace Estimates for Indefinite Matrices with an Application to Determinants
Foundations of Computational Mathematics (2022) 22:875–903
https://doi.org/10.1007/s10208-021-09525-9
On Randomized Trace Estimates for Indefinite Matrices
with an Application to Determinants
Alice Cortinovis1 · Daniel Kressner1
Received: 20 May 2020 / Revised: 16 February 2021 / Accepted: 11 May 2021 /
Published online: 9 July 2021
© The Author(s) 2021
Abstract
Randomized trace estimation is a popular and well-studied technique that approximates the trace of a large-scale matrix B by computing the average of x T Bx for
many samples of a random vector X . Often, B is symmetric positive definite (SPD)
but a number of applications give rise to indefinite B. Most notably, this is the case
for log-determinant estimation, a task that features prominently in statistical learning,
for instance in maximum likelihood estimation for Gaussian process regression. The
analysis of randomized trace estimates, including tail bounds, has mostly focused on
the SPD case. In this work, we derive new tail bounds for randomized trace estimates
applied to indefinite B with Rademacher or Gaussian random vectors. These bounds
significantly improve existing results for indefinite B, reducing the number of required
samples by a factor n or even more, where n is the size of B. Even for an SPD matrix,
our work improves an existing result by Roosta-Khorasani and Ascher (Found Comput
Math, 15(5):1187–1212, 2015) for Rademacher vectors. This work also analyzes the
combination of randomized trace estimates with the Lanczos method for approximating the trace of f (B). Particular attention is paid to the matrix logarithm, which is
needed for log-determinant estimation. We improve and extend an existing result, to
not only cover Rademacher but also Gaussian random vectors.
Keywords Trace estimation · Determinant · Tail bounds · Entropy method · Lanczos
method
Communicated by Nicholas J. Higham.
The work of Alice Cortinovis has been supported by the SNSF Research Project Fast algorithms from
low-rank updates, Grant Number: 200020_178806.
B
Alice Cortinovis
Daniel Kressner
1
Institute of Mathematics, EPF Lausanne, 1015 Lausanne, Switzerland
123
876
Foundations of Computational Mathematics (2022) 22:875–903
Mathematics Subject Classification 65C05 · 65F40 · 65F60 · 68W20 · 60E15
1 Introduction
This paper is concerned with approximating the trace of a symmetric matrix B ∈
Rn×n that is accessible only implicitly via matrix-vector products or, more precisely,
(approximate) quadratic forms. If X is a random vector of length n such that E[X ] = 0
and E[X X T ] = I , then E[X T B X ] = tr(B). Based on this result, a stochastic trace
estimator [27] is obtained from sampling an average of N quadratic forms:
tr N (B) :=
N
1 (i) T
(X ) B X (i) ,
N
(1)
i=1
where X (i) , i = 1, . . . , N , are independent copies of X . The most common choices for
X are standard Gaussian and Rademacher random vectors. The latter are defined by
having i.i.d. entries that take values ±1 with equal probability. We will consider both
choices in this paper and denote the resulting trace estimates by tr GN (B) and tr RN (B),
respectively.
Hutchinson [27] used tr RN (B) to approximate the trace of the influence matrix
of Laplacian smoothing splines. In this setting, B = A−1 for a symmetric positive
definite (SPD) matrix A and, in turn, A is SPD as well. Other applications, such as
spectral density estimation [31], triangle counting in graphs [3,17], and determinant
computation [5], may feature a symmetric but indefinite matrix B. For approximating
the determinant, one exploits the relation
log(det(A)) = tr(log(A)),
(2)
where log(A) denotes the matrix logarithm of A. The need for estimating determinants
arises, for instance, in statistical learning [2,18,20], lattice quantum chromodynamics
[39], and Markov random fields models [43]. Certain quantities associated with graphs
can be formulated as determinants, such as the number of spanning trees, and various
negative approximation results exist in this context; see, e.g., [16,35]. Relying on the
Cholesky factorization, the exact computation of the determinant is often infeasible for
a large matrix A. In contrast, the Hutchinson estimator combined with (2) bypasses the
need for factorizing A and instead requires to (approximately) evaluate the quadratic
form x T log(A)x for several vectors x ∈ Rn . Compared to the task of estimating
the trace of A−1 , the determinant computation via (2) is complicated by two issues:
(a) Even when A is SPD, the matrix B = log(A) may be indefinite; and (b) the
quadratic forms x T log(A)x themselves are expensive to compute exactly, so they
need to be approximated. We mention in passing that there are other methods to
approximate traces and determinants, including randomized subspace iteration [37]
and block Krylov methods [30], but they only work well in specific cases, e.g., when
A = σ I + C for a matrix C of low numerical rank. The Hutch++ trace estimator,
recently proposed and analyzed for the SPD case in [32], overcomes this limitation via
123
Foundations of Computational Mathematics (2022) 22:875–903
877
a combination with stochastic trace estimation. Although it is not difficult to imagine
that the results presented in this work are useful in extending the analysis from [32] to
the indefinite case, a thorough discussion of this extension is beyond the scope of this
work. Another direction of work on large-scale determinant estimation has explored
the use of spectral sparsifiers for symmetric diagonally dominant matrices [16,26].
Trace Estimation of Indefinite Matrices. By the central limit theorem, estimate (1)
can be expected to become more reliable as N increases; see, e.g., [13, Corollaries
3.3 and 4.3] for such an asymptotic result as N → ∞. Most existing non-asymptotic
results for trace estimation are specific to an SPD matrix B; see [4,22,36] for examples.
They provide a bound on the estimated number N of probe vectors to ensure a small
relative error with high probability:
tr(B) − tr G,R (B)
N
P
≥ ε ≤ δ;
tr(B)
(3)
see Remark 2 for a specific example. As already mentioned, the assumption that B
is SPD is usually not met when computing the determinant of an SPD matrix A via
tr(log(A)) because this would require all eigenvalues of A to be larger than one. For
general indefinite B, it is unrealistic to aim at a bound of form (3) for the relative error,
because tr(B) = 0 does not imply zero error. Ubaru, Chen, and Saad [40] derive a
bound for the absolute error via rescaling, that is, the results from [36] are applied to
the matrix C := − log(λA) for a value of λ > 0 that ensures C to be SPD. Specifically,
for Rademacher vectors it is shown in [40, Corollary 4.5] that
P | tr RN (log(A)) − log det(A)| ≥ ε ≤ δ
(4)
is satisfied with fixed failure probability δ if the number of samples N grows proportionally to ε−2 n 2 log(1 + κ(A))2 log 2δ where κ(A) denotes the condition number
of A. Unfortunately, t (...truncated)