State-dependent swap strategies and automatic reduction of number of temperatures in adaptive parallel tempering algorithm
Mateusz Krzysztof Łącki 1 2
Błażej Miasojedow 0 2
0 Institute of Applied Mathematics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland
1 Institute of Informatics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland
2 Mateusz Krzysztof Łącki
In this paper we present extensions to the original adaptive Parallel Tempering algorithm. Two different approaches are presented. In the first, we introduce state-dependent strategies that use current information to perform a swap step. This approach encompasses a wide family of potential moves, including the standard one and Equi-Energy-type moves, without any loss in tractability. In the second, we introduce online trimming of the number of temperatures. Numerical experiments demonstrate the effectiveness of the proposed methods.
Parallel tempering; Adaptive MCMC; Swapping strategies; Equi-Energy sampler
Electronic supplementary material: The online version of this article (doi:10.1007/s11222-015-9579-0) contains supplementary material, which is available to authorized users.
1 Introduction
Markov chain Monte Carlo (MCMC) is a generic method to
approximate an integral of the form
$$I := \int_{\mathbb{R}^d} f(x)\,\pi(x)\,\mathrm{d}x,$$
where π is a probability density function, which can be
evaluated point-wise up to a normalising constant. Such an integral
occurs frequently when computing Bayesian posterior
expectations (Robert and Casella 1999; Gilks et al. 1998).
The random walk Metropolis algorithm (Metropolis et al.
1953) often works well, provided the target density π is,
roughly speaking, sufficiently close to unimodal. The
efficiency of the Metropolis algorithm can be optimised by a
suitable choice of proposal distribution. These, in turn, can
be chosen automatically by several adaptive MCMC
algorithms; see Haario et al. (2001), Atchadé and Rosenthal
(2005), Roberts and Rosenthal (2009), Andrieu and Thoms
(2008) and references therein.
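As a minimal sketch of the random walk Metropolis idea described above, the following Python fragment estimates an expectation under a density known only up to a constant. The target (a standard normal), the step size 2.4 and the function names are illustrative assumptions, not taken from the paper.

```python
import math
import random


def random_walk_metropolis(log_density, x0, n_steps, step_size, rng):
    """Sample a 1-D unnormalised density using Gaussian random-walk proposals."""
    x, log_p = x0, log_density(x0)
    samples = []
    for _ in range(n_steps):
        y = x + step_size * rng.gauss(0.0, 1.0)
        log_q = log_density(y)
        # Accept with probability min(1, pi(y) / pi(x)); the unknown
        # normalising constant cancels in the ratio.
        if rng.random() < math.exp(min(0.0, log_q - log_p)):
            x, log_p = y, log_q
        samples.append(x)
    return samples


def log_std_normal(x):
    """Illustrative target (assumption): standard normal, up to a constant."""
    return -0.5 * x * x


rng = random.Random(0)
samples = random_walk_metropolis(log_std_normal, 0.0, 50_000, 2.4, rng)
# Ergodic average approximating the integral of x^2 * pi(x), which equals 1 here.
estimate = sum(s * s for s in samples) / len(samples)
```

The adaptive algorithms cited above tune quantities such as `step_size` automatically; here it is fixed by hand.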
When π has multiple well-separated modes, random
walk-based methods tend to get stuck in a single mode for long
periods of time. This can lead to false convergence and severely
erroneous results. Using a tailored Metropolis-Hastings
algorithm can help, but, in many cases, finding a good proposal
distribution is not easy. Tempering of π, that is,
considering auxiliary distributions with density proportional to $\pi^{\beta}$
with $\beta \in (0, 1)$, often provides better mixing between modes
(Swendsen and Wang 1986; Marinari and Parisi 1992;
Hansmann 1997; Woodard et al. 2009; Neal 1996).
We focus here particularly on the parallel tempering
algorithm, which is also known as the replica exchange Monte
Carlo and the Metropolis-coupled Markov chain Monte
Carlo.
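The following sketch illustrates parallel tempering with the standard swap move on a one-dimensional bimodal target. The mixture target, the inverse temperatures in `betas`, the proposal scaling and all function names are assumptions chosen for illustration; they are not the settings used in the paper.

```python
import math
import random


def log_target(x):
    """Unnormalised log-density of an equal mixture of N(-4, 0.5^2) and N(4, 0.5^2)."""
    a = -0.5 * ((x + 4.0) / 0.5) ** 2
    b = -0.5 * ((x - 4.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def parallel_tempering(log_pi, betas, n_steps, rng):
    """Return the trajectory of the chain targeting pi itself (betas[0] == 1)."""
    xs = [4.0] * len(betas)  # every chain starts in the right-hand mode
    log_ps = [log_pi(x) for x in xs]
    cold = []
    for _ in range(n_steps):
        # Random-walk Metropolis update at each level, targeting pi^beta.
        for k, beta in enumerate(betas):
            y = xs[k] + rng.gauss(0.0, 1.0) / math.sqrt(beta)
            log_q = log_pi(y)
            if rng.random() < math.exp(min(0.0, beta * (log_q - log_ps[k]))):
                xs[k], log_ps[k] = y, log_q
        # Standard swap: pick an adjacent pair of levels uniformly at random.
        i = rng.randrange(len(betas) - 1)
        j = i + 1
        log_ratio = (betas[i] - betas[j]) * (log_ps[j] - log_ps[i])
        if rng.random() < math.exp(min(0.0, log_ratio)):
            xs[i], xs[j] = xs[j], xs[i]
            log_ps[i], log_ps[j] = log_ps[j], log_ps[i]
        cold.append(xs[0])
    return cold


rng = random.Random(0)
betas = [1.0, 0.3, 0.1, 0.03, 0.01]  # hand-picked schedule (assumption)
cold = parallel_tempering(log_target, betas, 40_000, rng)
left_fraction = sum(1 for x in cold if x < 0.0) / len(cold)
```

The swap is accepted with probability min(1, exp((β_i − β_j)(log π(x_j) − log π(x_i)))), which is the Metropolis ratio for exchanging two coordinates of the product of tempered densities. In this toy setting `left_fraction` should move away from 0, indicating that the chain at β = 1 visits both modes.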
The tempering approach is particularly tempting in
settings where π admits a physical interpretation and there
is good intuition about how to choose the temperature schedule for
the algorithm.
In general, choosing the temperature schedule is a
nontrivial task, but there are generic guidelines for temperature
selection based on both empirical findings and theoretical
analysis (Kofke 2002; Kone and Kofke 2005; Atchadé et al.
2011; Roberts and Rosenthal 2012). These theoretical
findings were used to derive an adaptive version of the Parallel
Tempering algorithm (Miasojedow et al. 2013a). Another approach to
temperature tuning can be found in Behrens et al. (2012). That
approach offers a different criterion for choosing the
temperature schedule and was developed for the Tempered Transitions
algorithm (Neal 1996).
In the present paper we consider the adaptive version of the
Parallel Tempering algorithm. The adaptation consists in
introducing state-dependent swaps between differently tempered
random walks. We study the impact of different probability
distributions over potential swap moves and call them Strategies. Our choice
of strategies is driven by solutions already known in the
literature (Kou et al. 2006) and used within the Parallel
Tempering algorithm by Baragatti et al. (2013). The novelty of
our approach stems from an alternative implementation of
Equi-Energy moves that renders the algorithm
parameter-free, i.e. the user no longer needs to specify the Energy
Rings. We also investigate different modifications
of this new approach.
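To make the notion of a state-dependent swap concrete, the sketch below proposes a pair of temperature levels with probability decreasing in the difference of their current log-densities, so that states of similar energy are preferred, loosely in the spirit of Equi-Energy moves. The exponential weight, the use of all pairs of levels and the function names are hypothetical choices made for illustration; the paper's own strategies are defined in later sections and may differ.

```python
import math
import random
from itertools import combinations


def pair_probabilities(log_ps):
    """Probability of proposing each pair (i, j), favouring similar energies."""
    pairs = list(combinations(range(len(log_ps)), 2))
    diffs = [abs(log_ps[i] - log_ps[j]) for i, j in pairs]
    smallest = min(diffs)  # shift so that at least one weight equals 1
    weights = [math.exp(-(d - smallest)) for d in diffs]
    total = sum(weights)
    return {pair: w / total for pair, w in zip(pairs, weights)}


def state_dependent_swap(xs, log_ps, betas, rng):
    """Attempt one swap in place; return True if it was accepted."""
    probs = pair_probabilities(log_ps)
    pairs = list(probs)
    i, j = rng.choices(pairs, weights=[probs[p] for p in pairs])[0]
    # Probability of proposing the same pair from the swapped state.
    swapped = list(log_ps)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    p_reverse = pair_probabilities(swapped)[(i, j)]
    # Metropolis-Hastings ratio: tempered densities times proposal correction.
    log_ratio = (betas[i] - betas[j]) * (log_ps[j] - log_ps[i])
    log_ratio += math.log(p_reverse) - math.log(probs[(i, j)])
    if rng.random() < math.exp(min(0.0, log_ratio)):
        xs[i], xs[j] = xs[j], xs[i]
        log_ps[i], log_ps[j] = log_ps[j], log_ps[i]
        return True
    return False


rng = random.Random(1)
betas = [1.0, 0.5, 0.25]
xs = [0.0, 1.0, 2.0]
log_ps = [-0.5 * x * x for x in xs]  # illustrative log-densities (assumption)
probs = pair_probabilities(log_ps)
accepted = state_dependent_swap(xs, log_ps, betas, rng)
```

With symmetric weights over all pairs, exchanging two states only permutes the pairwise differences, so the forward and reverse selection probabilities coincide and the correction term is zero up to rounding. It is kept explicit because it would matter if, for example, only adjacent levels were eligible.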
We also propose an automated method for reducing the
number of temperatures actually in use, in the spirit
of Miasojedow et al. (2013a). The temperature adaptation
scheme depends on the parameters of the adaptive random
walks applied in the parallelised Metropolis-Hastings stage
of the algorithm, in the case where the state space is
the usual $\mathbb{R}^d$.
We also show that the proposed algorithm satisfies
the Law of Large Numbers, in the same setting as in
Miasojedow et al. (2013a).
2 Definition and notations
Our basic object of interest is the density $\pi : \Omega \to \mathbb{R}_+$,
where $\Omega = \mathbb{R}^d$. We assume we can evaluate point-wise a
function that is proportional to π. The
Parallel Tempering approa (...truncated)