Estimating cosmological parameter covariance
MNRAS 442, 2728–2738 (2014)
doi:10.1093/mnras/stu996
Andy Taylor1★ and Benjamin Joachimi1,2
1 Scottish Universities Physics Alliance, Institute for Astronomy, School of Physics and Astronomy, University of Edinburgh, Royal Observatory, Blackford Hill, Edinburgh EH9 3HJ, UK
2 Department of Physics and Astronomy, University College London, Gower Street, London WC1E 6BT, UK
Accepted 2014 May 16. Received 2014 May 16; in original form 2014 February 25
ABSTRACT
We investigate the bias and error in estimates of the cosmological parameter covariance matrix due to sampling or modelling the data covariance matrix, for likelihood width and peak
scatter estimators. We show that these estimators do not coincide unless the data covariance is
exactly known. For sampled data covariances with Gaussian-distributed data and parameters,
the parameter covariance matrix estimated from the width of the likelihood has a Wishart distribution, from which we derive the mean and covariance. This mean is biased and we propose
an unbiased estimator of the parameter covariance matrix. Comparing our analytic results to
a numerical Wishart sampler of the data covariance matrix we find excellent agreement. An
accurate ansatz for the mean parameter covariance for the peak scatter estimator is found, and
we fit its covariance to our numerical analysis. The mean is again biased and we propose an
unbiased estimator for the peak parameter covariance. For sampled data covariances, the width
estimator is more accurate than the peak scatter estimator. We investigate modelling the data
covariance, or equivalently data compression, and show that the peak scatter estimator is less
sensitive to biases in the model data covariance matrix than the width estimator, but requires
independent realizations of the data to reduce the statistical error. If the model bias on the
peak estimator is sufficiently low, this is promising, otherwise the sampled width estimator is
preferable.
Key words: methods: data analysis – methods: statistical – cosmological parameters – large-scale structure of Universe.
1 INTRODUCTION
The high precision required to probe the nature of dark energy, dark
matter and modifications to gravity (e.g. Amendola et al. 2013)
is driving cosmology to an era where the accuracy of parameter
estimation will have to reach sub-per cent levels. To meet this challenge, large-scale ground and space-based cosmological surveys are
being planned and carried out which are optimized to deliver high
statistical accuracy (e.g. VST-KiDS, DES, HSC, LSST, Euclid). For
these surveys to be successful, systematic biases will also have to
be controlled to an unprecedented level, within the bounds set by
the statistical uncertainty. The introduction of statistical uncertainty
and systematic biases will have to be tracked at every step of the
data analysis, from observation to parameter estimation.
An aspect which has recently been receiving more attention in
this process is the final parameter estimation step when data, compressed into the form of power spectra or correlation functions, are
compared with cosmological models and further compressed into
estimates of the model parameters along with an estimate of their
accuracy. In particular, we want to have reliable, unbiased estimates
of the parameter covariance matrix, which is needed to demonstrate
how accurately the parameters have been measured, as well as to delineate the volumes of parameter space where acceptable models
reside. Beyond this, if we want to apply some form of model selection, for example investigating the Bayesian evidence, we need to
have an accurate representation of the posterior distribution of the
parameters. If the parameter covariance matrix is biased by a poor
estimator, it will either over- or underestimate the actual errors and
covariances of the measured parameters. In addition, a sub-optimal
covariance estimator will itself have significant uncertainties which
should be folded into the overall error budget.
There are two common approaches to estimate the uncertainty
on parameters derived from cosmological data. One is to estimate
the variance, or width, of the likelihood surface in parameter space.
This can be done by mapping out the likelihood surface and numerically integrating on a grid, or using a Monte Carlo Markov
Chain (e.g. Lewis & Bridle 2002) to sample the likelihood distribution and Monte Carlo integrating. A second approach is to generate
many independent realizations of the survey, either by simulating
© 2014 The Authors
Published by Oxford University Press on behalf of the Royal Astronomical Society
ance matrix, with no sampling variance, where uncorrected biases
will propagate into the parameter covariance matrix. In Section 3,
we discuss how to numerically generate random realizations of the
data covariance matrix to compare with our results. In Section 4,
we derive the exact distribution for the parameter covariance matrix
estimated from the width of the likelihood, and its bias and error. We
also use our numerical results to propose an ansatz for the bias in the
peak scatter estimator and a fit to its error. We study modelling of
the data covariance and data compression in Section 5, and present
our summary and conclusions in Section 6. We begin by reviewing
estimators for the parameter covariance and how a bias in the data
covariance matrix propagates through.
2 PARAMETER COVARIANCE
For a given set of data, $D$, cosmological parameters, $\theta$, can
be estimated by sampling the posterior parameter distribution,
$P(\theta|D) \propto P(D|\theta)\,P(\theta)$, where the likelihood distribution of the
data is $P(D|\theta)$, and the parameter prior is $P(\theta)$. We will focus on
the case where the data follow a Gaussian distribution and the mean
of the likelihood depends on the cosmological parameters, $\mu(\theta)$,
while the data covariance matrix, $M = \langle \Delta D\, \Delta D^t \rangle$, is independent
of the parameters, and $\Delta D = D - \langle D \rangle$ is the fluctuation of the
data around estimates of the mean. The log-likelihood is given by
$L = -2 \ln P(D|\theta) = \Delta D^t\, \Psi\, \Delta D$, where $\Psi = M^{-1}$ is the inverse
data covariance matrix, the precision matrix.
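As an illustration of the quadratic form above, the following minimal sketch evaluates the log-likelihood (up to an additive constant) for a toy two-point data vector, using only the Python standard library. The helper names `invert` and `log_like`, and all numerical values, are assumptions made for this example and do not come from the paper. The last lines additionally assume a mean that is linear in a single amplitude parameter, μ(θ) = θ t, in which case the width of the likelihood in θ is exactly the inverse of the curvature tᵗ Ψ t.

```python
# Sketch only: toy numbers and helper names are illustrative assumptions.

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment each row of m with the matching row of the identity matrix.
    a = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def log_like(data, mean, precision):
    """Return the quadratic form dD^t Psi dD, i.e. -2 ln P(D|theta) up to a constant."""
    delta = [d - m for d, m in zip(data, mean)]
    n = len(delta)
    return sum(delta[i] * precision[i][j] * delta[j]
               for i in range(n) for j in range(n))


M = [[2.0, 0.5], [0.5, 1.0]]   # toy data covariance, assumed exactly known
Psi = invert(M)                # precision matrix, Psi = M^{-1}
D = [1.2, 0.7]                 # toy data vector
mu = [1.0, 1.0]                # model mean mu(theta) at a trial parameter value

L_value = log_like(D, mu, Psi)  # approximately 0.16

# Assumed linear model mu(theta) = theta * t with template t = (1, 1):
# the curvature of L/2 in theta is t^t Psi t, and the width (variance) is its inverse.
t = [1.0, 1.0]
curvature = log_like(t, [0.0, 0.0], Psi)
var_theta = 1.0 / curvature     # approximately 0.875
```

With an exactly known covariance the width obtained this way is fixed. Replacing `M` by a noisy estimate makes both `Psi` and `var_theta` random quantities, which is the situation considered in the remainder of this section.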
If the data covariance matrix is estimated with some uncertainty,
we can treat it, and the precision matrix, as random variables and
marginalize over the uncertainty in the likelihood function with a
prior on the precision matrix,
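$$P(\theta|D, \hat{\Psi}) = \int \mathrm{d}\Psi\, P(\theta|D, \Psi)\, P(\Psi|\hat{\Psi}), \qquad (1)$$
where $\Psi$ is the true precision matrix, $\hat{\Psi}$ is its estimated value and
$P(\Psi|\hat{\Psi})$ is the prior. In the case that the precision matrix is known,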
the prior will be a delta-function. But if the mean of the data and
precision matrix is estimated from the inverse of the sampled data
covariance matrix,
$$\hat{\Psi} = \hat{M}^{-1} = \left[ \frac{1}{N_S - 1} \sum_{i=1}^{N_S} \Delta D_i\, \Delta D_i^t \right]^{-1}, \qquad (2)$$
where $\Delta D_i$ is the ith realization from $N_S$ random Gaussian samples, the prior is Inverse-Wishart distributed.1 If the (...truncated)