Estimating cosmological parameter covariance (pdf)


https://academic.oup.com/mnras/article-pdf/442/3/2728/3602076/stu996.pdf


MNRAS 442, 2728–2738 (2014)
doi:10.1093/mnras/stu996

Estimating cosmological parameter covariance

Andy Taylor¹ and Benjamin Joachimi¹,²

¹ Scottish Universities Physics Alliance, Institute for Astronomy, School of Physics and Astronomy, University of Edinburgh, Royal Observatory, Blackford Hill, Edinburgh EH9 3HJ, UK
² Department of Physics and Astronomy, University College London, Gower Street, London WC1E 6BT, UK

Accepted 2014 May 16. Received 2014 May 16; in original form 2014 February 25

ABSTRACT

We investigate the bias and error in estimates of the cosmological parameter covariance matrix due to sampling or modelling the data covariance matrix, for likelihood width and peak scatter estimators. We show that these estimators do not coincide unless the data covariance is exactly known. For sampled data covariances with Gaussian-distributed data and parameters, the parameter covariance matrix estimated from the width of the likelihood has a Wishart distribution, from which we derive the mean and covariance. This mean is biased and we propose an unbiased estimator of the parameter covariance matrix. Comparing our analytic results to a numerical Wishart sampler of the data covariance matrix, we find excellent agreement. An accurate ansatz for the mean parameter covariance for the peak scatter estimator is found, and we fit its covariance to our numerical analysis. The mean is again biased and we propose an unbiased estimator for the peak parameter covariance. For sampled data covariances, the width estimator is more accurate than the peak scatter estimator. We investigate modelling the data covariance, or equivalently data compression, and show that the peak scatter estimator is less sensitive to biases in the model data covariance matrix than the width estimator, but requires independent realizations of the data to reduce the statistical error. If the model bias on the peak estimator is sufficiently low, this is promising, otherwise the sampled width estimator is preferable.
Key words: methods: data analysis – methods: statistical – cosmological parameters – large-scale structure of Universe.

1 INTRODUCTION

The high precision required to probe the nature of dark energy, dark matter and modifications to gravity (e.g. Amendola et al. 2013) is driving cosmology to an era where the accuracy of parameter estimation will have to reach sub-per cent levels. To meet this challenge, large-scale ground and space-based cosmological surveys are being planned and carried out which are optimized to deliver high statistical accuracy (e.g. VST-KiDS, DES, HSC, LSST, Euclid). For these surveys to be successful, systematic biases will also have to be controlled to an unprecedented level, within the bounds set by the statistical uncertainty. The introduction of statistical uncertainty and systematic biases will have to be tracked at every step of the data analysis, from observation to parameter estimation.

An aspect which has recently been receiving more attention in this process is the final parameter estimation step, when data, compressed into the form of power spectra or correlation functions, are compared with cosmological models and further compressed into estimates of the model parameters along with an estimate of their accuracy. In particular, we want to have reliable, unbiased estimates of the parameter covariance matrix, which is needed to demonstrate how accurately the parameters have been measured, as well as to delineate the volumes of parameter space where acceptable models reside. Beyond this, if we want to apply some form of model selection, for example investigating the Bayesian evidence, we need to have an accurate representation of the posterior distribution of the parameters. If the parameter covariance matrix is biased by a poor estimator, it will either over- or underestimate the actual errors and covariances of the measured parameters.
In addition, a sub-optimal covariance estimator will itself have significant uncertainties which should be folded into the overall error budget.

There are two common approaches to estimating the uncertainty on parameters derived from cosmological data. One is to estimate the variance, or width, of the likelihood surface in parameter space. This can be done by mapping out the likelihood surface and numerically integrating on a grid, or by using a Monte Carlo Markov Chain (e.g. Lewis & Bridle 2002) to sample the likelihood distribution and Monte Carlo integrating. A second approach is to generate many independent realizations of the survey, either by simulating [...]

© 2014 The Authors. Published by Oxford University Press on behalf of the Royal Astronomical Society

[...] covariance matrix, with no sampling variance, where uncorrected biases will propagate into the parameter covariance matrix. In Section 3, we discuss how to numerically generate random realizations of the data covariance matrix to compare with our results. In Section 4, we derive the exact distribution for the parameter covariance matrix estimated from the width of the likelihood, and its bias and error. We also use our numerical results to propose an ansatz for the bias in the peak scatter estimator and a fit to its error. We study modelling of the data covariance and data compression in Section 5, and present our summary and conclusions in Section 6. We begin by reviewing estimators for the parameter covariance and how a bias in the data covariance matrix propagates through.

2 PARAMETER COVARIANCE

For a given set of data, $D$, cosmological parameters, $\theta$, can be estimated by sampling the posterior parameter distribution, $P(\theta|D) \propto P(D|\theta)\,P(\theta)$, where the likelihood distribution of the data is $P(D|\theta)$, and the parameter prior is $P(\theta)$.
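As a rough illustration of the two approaches described in the Introduction (the width of the likelihood and the scatter of the likelihood peak), the following Python sketch compares them for a toy problem. Everything in it is an assumption made for illustration and is not taken from the paper: a hypothetical linear model $\mu(\theta) = A\theta$, a diagonal data covariance `M` that is taken to be exactly known, Gaussian data, and a flat prior. The names `A`, `M`, `theta_true`, `C_width` and `C_peak` are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (an assumption, not the paper's configuration).
n_data, n_par, n_real = 20, 2, 20000
A = rng.normal(size=(n_data, n_par))             # assumed linear model: mu(theta) = A @ theta
M = np.diag(rng.uniform(0.5, 2.0, size=n_data))  # assumed, exactly known data covariance
Psi = np.linalg.inv(M)                           # precision matrix
theta_true = np.array([1.0, -0.5])

# Width estimator: for Gaussian data with a linear mean and flat prior,
# the likelihood in theta is Gaussian with covariance (A^t Psi A)^-1.
C_width = np.linalg.inv(A.T @ Psi @ A)

# Peak scatter estimator: draw independent realizations of the data,
# locate the maximum-likelihood peak of each, and measure the scatter.
D = rng.multivariate_normal(A @ theta_true, M, size=n_real)  # shape (n_real, n_data)
theta_peak = D @ Psi @ A @ C_width   # row i holds C_width A^t Psi D_i
C_peak = np.cov(theta_peak, rowvar=False)

print("width estimator:\n", C_width)
print("peak scatter estimator:\n", C_peak)
```

In this sketch the two matrices should agree up to the Monte Carlo noise of the `n_real` realizations, consistent with the abstract's statement that the estimators coincide when the data covariance is exactly known. It does not attempt to reproduce the sampled-covariance case studied in the paper.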
We will focus on the case where the data follow a Gaussian distribution and the mean of the likelihood depends on the cosmological parameters, $\mu(\theta)$, while the data covariance matrix, $M = \langle \Delta D \, \Delta D^t \rangle$, is independent of the parameters, and $\Delta D = D - \langle D \rangle$ is the fluctuation of the data around estimates of the mean. The log-likelihood is given by $L = -2 \ln P(D|\theta) = \Delta D^t \, \Psi \, \Delta D$, where $\Psi = M^{-1}$ is the inverse data covariance matrix, the precision matrix. If the data covariance matrix is estimated with some uncertainty, we can treat it, and the precision matrix, as random variables and marginalize over the uncertainty in the likelihood function with a prior on the precision matrix,

$$P(\theta|D, \hat{\Psi}) = \int d\Psi \, P(\theta|D, \Psi) \, P(\Psi|\hat{\Psi}), \tag{1}$$

where $\Psi$ is the true precision matrix, $\hat{\Psi}$ is its estimated value and $P(\Psi|\hat{\Psi})$ is the prior. In the case that the precision matrix is known, the prior will be a delta-function. But if the mean of the data and precision matrix is estimated from the inverse of the sampled data covariance matrix,

$$\hat{\Psi} = \hat{M}^{-1} = \left[ \frac{1}{N_S - 1} \sum_{i=1}^{N_S} \Delta D_i \, \Delta D_i^t \right]^{-1}, \tag{2}$$

where $\Delta D_i$ is the $i$th realization from $N_S$ random Gaussian samples, the prior is Inverse-Wishart distributed.$^1$ If the (...truncated)