On Randomized Trace Estimates for Indefinite Matrices with an Application to Determinants (pdf)

Article PDF cannot be displayed. You can download it here:

https://link.springer.com/content/pdf/10.1007/s10208-021-09525-9.pdf

On Randomized Trace Estimates for Indefinite Matrices with an Application to Determinants

Foundations of Computational Mathematics (2022) 22:875–903 https://doi.org/10.1007/s10208-021-09525-9 On Randomized Trace Estimates for Indefinite Matrices with an Application to Determinants Alice Cortinovis1 · Daniel Kressner1 Received: 20 May 2020 / Revised: 16 February 2021 / Accepted: 11 May 2021 / Published online: 9 July 2021 © The Author(s) 2021 Abstract Randomized trace estimation is a popular and well-studied technique that approximates the trace of a large-scale matrix B by computing the average of x T Bx for many samples of a random vector X . Often, B is symmetric positive definite (SPD) but a number of applications give rise to indefinite B. Most notably, this is the case for log-determinant estimation, a task that features prominently in statistical learning, for instance in maximum likelihood estimation for Gaussian process regression. The analysis of randomized trace estimates, including tail bounds, has mostly focused on the SPD case. In this work, we derive new tail bounds for randomized trace estimates applied to indefinite B with Rademacher or Gaussian random vectors. These bounds significantly improve existing results for indefinite B, reducing the number of required samples by a factor n or even more, where n is the size of B. Even for an SPD matrix, our work improves an existing result by Roosta-Khorasani and Ascher (Found Comput Math, 15(5):1187–1212, 2015) for Rademacher vectors. This work also analyzes the combination of randomized trace estimates with the Lanczos method for approximating the trace of f (B). Particular attention is paid to the matrix logarithm, which is needed for log-determinant estimation. We improve and extend an existing result, to not only cover Rademacher but also Gaussian random vectors. Keywords Trace estimation · Determinant · Tail bounds · Entropy method · Lanczos method Communicated by Nicholas J. Higham. The work of Alice Cortinovis has been supported by the SNSF Research Project Fast algorithms from low-rank updates, Grant Number: 200020_178806. B Alice Cortinovis Daniel Kressner 1 Institute of Mathematics, EPF Lausanne, 1015 Lausanne, Switzerland 123 876 Foundations of Computational Mathematics (2022) 22:875–903 Mathematics Subject Classification 65C05 · 65F40 · 65F60 · 68W20 · 60E15 1 Introduction This paper is concerned with approximating the trace of a symmetric matrix B ∈ Rn×n that is accessible only implicitly via matrix-vector products or, more precisely, (approximate) quadratic forms. If X is a random vector of length n such that E[X ] = 0 and E[X X T ] = I , then E[X T B X ] = tr(B). Based on this result, a stochastic trace estimator [27] is obtained from sampling an average of N quadratic forms: tr N (B) := N 1 (i) T (X ) B X (i) , N (1) i=1 where X (i) , i = 1, . . . , N , are independent copies of X . The most common choices for X are standard Gaussian and Rademacher random vectors. The latter are defined by having i.i.d. entries that take values ±1 with equal probability. We will consider both choices in this paper and denote the resulting trace estimates by tr GN (B) and tr RN (B), respectively. Hutchinson [27] used tr RN (B) to approximate the trace of the influence matrix of Laplacian smoothing splines. In this setting, B = A−1 for a symmetric positive definite (SPD) matrix A and, in turn, A is SPD as well. Other applications, such as spectral density estimation [31], triangle counting in graphs [3,17], and determinant computation [5], may feature a symmetric but indefinite matrix B. For approximating the determinant, one exploits the relation log(det(A)) = tr(log(A)), (2) where log(A) denotes the matrix logarithm of A. The need for estimating determinants arises, for instance, in statistical learning [2,18,20], lattice quantum chromodynamics [39], and Markov random fields models [43]. Certain quantities associated with graphs can be formulated as determinants, such as the number of spanning trees, and various negative approximation results exist in this context; see, e.g., [16,35]. Relying on the Cholesky factorization, the exact computation of the determinant is often infeasible for a large matrix A. In contrast, the Hutchinson estimator combined with (2) bypasses the need for factorizing A and instead requires to (approximately) evaluate the quadratic form x T log(A)x for several vectors x ∈ Rn . Compared to the task of estimating the trace of A−1 , the determinant computation via (2) is complicated by two issues: (a) Even when A is SPD, the matrix B = log(A) may be indefinite; and (b) the quadratic forms x T log(A)x themselves are expensive to compute exactly, so they need to be approximated. We mention in passing that there are other methods to approximate traces and determinants, including randomized subspace iteration [37] and block Krylov methods [30], but they only work well in specific cases, e.g., when A = σ I + C for a matrix C of low numerical rank. The Hutch++ trace estimator, recently proposed and analyzed for the SPD case in [32], overcomes this limitation via 123 Foundations of Computational Mathematics (2022) 22:875–903 877 a combination with stochastic trace estimation. Although it is not difficult to imagine that the results presented in this work are useful in extending the analysis from [32] to the indefinite case, a thorough discussion of this extension is beyond the scope of this work. Another direction of work on large-scale determinant estimation has explored the use of spectral sparsifiers for symmetric diagonally dominant matrices [16,26]. Trace Estimation of Indefinite Matrices. By the central limit theorem, estimate (1) can be expected to become more reliable as N increases; see, e.g., [13, Corollaries 3.3 and 4.3] for such an asymptotic result as N → ∞. Most existing non-asymptotic results for trace estimation are specific to an SPD matrix B; see [4,22,36] for examples. They provide a bound on the estimated number N of probe vectors to ensure a small relative error with high probability: tr(B) − tr G,R (B) N P ≥ ε ≤ δ; tr(B) (3) see Remark 2 for a specific example. As already mentioned, the assumption that B is SPD is usually not met when computing the determinant of an SPD matrix A via tr(log(A)) because this would require all eigenvalues of A to be larger than one. For general indefinite B, it is unrealistic to aim at a bound of form (3) for the relative error, because tr(B) = 0 does not imply zero error. Ubaru, Chen, and Saad [40] derive a bound for the absolute error via rescaling, that is, the results from [36] are applied to the matrix C := − log(λA) for a value of λ > 0 that ensures C to be SPD. Specifically, for Rademacher vectors it is shown in [40, Corollary 4.5] that P | tr RN (log(A)) − log det(A)| ≥ ε ≤ δ (4) is satisfied with fixed failure probability δ if the number of samples N grows proportionally to ε−2 n 2 log(1 + κ(A))2 log 2δ where κ(A) denotes the condition number of A. Unfortunately, t (...truncated)