State-dependent swap strategies and automatic reduction of number of temperatures in adaptive parallel tempering algorithm
Stat Comput
DOI 10.1007/s11222-015-9579-0
Mateusz Krzysztof Łącki¹ · Błażej Miasojedow²
Received: 12 June 2014 / Accepted: 15 May 2015
© The Author(s) 2015. This article is published with open access at Springerlink.com
Abstract In this paper we present extensions to the original adaptive Parallel Tempering algorithm. Two different
approaches are presented. In the first one we introduce state-dependent strategies that use current information to perform a
swap step. This approach encompasses a wide family of potential moves,
including the standard one and Equi-Energy type moves, without any loss in tractability. In the second one, we introduce
online trimming of the number of temperatures. Numerical
experiments demonstrate the effectiveness of the proposed
method.
Keywords Parallel tempering · Adaptive MCMC ·
Swapping strategies · Equi-Energy sampler
Electronic supplementary material: The online version of this
article (doi:10.1007/s11222-015-9579-0) contains supplementary
material, which is available to authorized users.
Błażej Miasojedow · Mateusz Krzysztof Łącki
1 Institute of Informatics, University of Warsaw, Banacha 2,
02-097 Warsaw, Poland
2 Institute of Applied Mathematics, University of Warsaw,
Banacha 2, 02-097 Warsaw, Poland
1 Introduction
Markov chain Monte Carlo (MCMC) is a generic method to
approximate an integral of the form
\[ I := \int_{\mathbb{R}^d} f(y)\,\pi(y)\,\mathrm{d}y, \]
where π is a probability density function, which can be evaluated point-wise up to a normalising constant. Such an integral
occurs frequently when computing Bayesian posterior expectations (Robert and Casella 1999; Gilks et al. 1998).
The random walk Metropolis algorithm (Metropolis et al.
1953) often works well, provided the target density π is,
roughly speaking, sufficiently close to unimodal. The efficiency of the Metropolis algorithm can be optimised by a
suitable choice of proposal distribution, which can in turn
be selected automatically by several adaptive MCMC algorithms; see Haario et al. (2001), Atchadé and Rosenthal
(2005), Roberts and Rosenthal (2009), Andrieu and Thoms
(2008) and references therein.
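As a minimal sketch of the plain (non-adaptive) random walk Metropolis algorithm referred to above, the following assumes a one-dimensional target given through an unnormalised log-density. The function name `random_walk_metropolis`, the Gaussian proposal and the fixed step size `step` are illustrative assumptions, not taken from the paper; an adaptive algorithm would tune `step` during the run.

```python
import math
import random


def random_walk_metropolis(log_pi, x0, n_steps, step=1.0, seed=0):
    """Sample from a 1-D density known up to a constant via log_pi.

    `step` is a hypothetical fixed proposal standard deviation.
    """
    rng = random.Random(seed)
    x, log_px = x0, log_pi(x0)
    samples = []
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)      # symmetric Gaussian proposal
        log_py = log_pi(y)
        # Accept with probability min(1, pi(y) / pi(x)).
        if log_py >= log_px or rng.random() < math.exp(log_py - log_px):
            x, log_px = y, log_py
        samples.append(x)
    return samples


# Usage: estimate E[f(Y)] for f(y) = y**2 under a standard normal target.
draws = random_walk_metropolis(lambda y: -0.5 * y * y, 0.0, 20000, step=2.4)
estimate = sum(d * d for d in draws) / len(draws)
print(estimate)
```

The average of f over the draws is the MCMC approximation of the integral I; only ratios of π are used, so the normalising constant never needs to be known.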
When π has multiple well-separated modes, random
walk-based methods tend to get stuck in a single mode for long
periods of time. This can lead to false convergence and severely
erroneous results. Using a tailored Metropolis-Hastings algorithm can help but, in many cases, finding a good proposal
distribution is not easy. Tempering of π, that is, considering auxiliary distributions with density proportional to π^β
with β ∈ (0, 1), often provides better mixing between modes
(Swendsen and Wang 1986; Marinari and Parisi 1992; Hansmann 1997; Woodard et al. 2009; Neal 1996).
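To illustrate why tempering helps, the sketch below uses a hypothetical bimodal target (an equal mixture of two unit-variance Gaussians centred at -5 and 5, chosen only for illustration) and compares the density in the valley between the modes with the density at a mode, for π and for π^β. The function names are assumptions made for this example.

```python
import math


def log_pi(x):
    """Unnormalised log-density of an equal mixture of N(-5, 1) and N(5, 1)."""
    a = -0.5 * (x + 5.0) ** 2
    b = -0.5 * (x - 5.0) ** 2
    m = max(a, b)
    # Log-sum-exp for numerical stability.
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_tempered(x, beta):
    """Unnormalised log-density of the tempered target pi**beta."""
    return beta * log_pi(x)


def valley_to_mode_ratio(beta):
    """Ratio of the tempered density at the valley (x = 0) to a mode (x = 5)."""
    return math.exp(log_tempered(0.0, beta) - log_tempered(5.0, beta))


for beta in (1.0, 0.5, 0.1):
    print(beta, valley_to_mode_ratio(beta))
```

The ratio grows as β decreases, so a random walk targeting π^β with small β crosses the low-density region between the modes far more readily than one targeting π itself.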
We focus here particularly on the parallel tempering algorithm, which is also known as replica exchange Monte
Carlo and as Metropolis-coupled Markov chain Monte
Carlo.
The tempering approach is particularly tempting in
settings where π admits a physical interpretation and there
is good intuition about how to choose the temperature schedule for
the algorithm.
In general, choosing the temperature schedule is a nontrivial task, but there are generic guidelines for temperature
selection based on both empirical findings and theoretical
analysis (Kofke 2002; Kone and Kofke 2005; Atchadé et al.
2011; Roberts and Rosenthal 2012). These theoretical findings were used to derive an adaptive version of
Parallel Tempering (Miasojedow et al. 2013a). Another approach to
temperature tuning can be found in Behrens et al. (2012). It
offers a different criterion for choosing the temperature schedule and was developed for the Tempered Transitions
algorithm (Neal 1996).
In the present paper we consider the adaptive version of the
Parallel Tempering algorithm. The adaptation consists in introducing state-dependent swaps between differently tempered
random walks. We study the impact of different distributions over potential swap steps and call them strategies. Our choice
of strategies is driven by solutions already known in the
literature (Kou et al. 2006) and used within the Parallel Tempering algorithm by Baragatti et al. (2013). The novelty of
our approach stems from an alternative implementation of
Equi-Energy moves that renders the algorithm parameter-free,
i.e. the user no longer needs to specify precise Energy
Rings. We also investigate different modifications
of this new approach.
We also propose an automated method for reducing the
actual number of considered temperatures, in the spirit
of Miasojedow et al. (2013a). The temperature adaptation
scheme depends on the parameters of the adaptive random
walks applied in the parallelised Metropolis-Hastings stage
of the algorithm in the case where the state space is
the usual R^d.
We also show that the proposed algorithm satisfies
the law of large numbers, in the same setting as in Miasojedow et al. (2013a).
2 Definition and notations
Our basic object of interest is the density π : Ω → R_+,
where Ω = R^d. We assume that we can evaluate point-wise a
function that is proportional to π. The Parallel Tempering approach suggests constructing a Markov
chain on the product space Ω^L, where L is the number of
temperature levels. On that space a new density π_β is constructed by setting, for x ∈ Ω^L,
\[ \pi_\beta(x) = \pi_\beta(x_1, \dots, x_L) \propto \pi^{\beta_1}(x_1) \times \cdots \times \pi^{\beta_L}(x_L), \]
where x_i ∈ Ω and β = (β_1, β_2, ..., β_L) are the inverse temperatures, subject to 1 = β_1 > β_2 > ··· > β_L.
The vector T = (T_1, ..., T_L), where T_l = β_l^{-1}, contains
numbers known as temperatures and is itself referred to as
the temperature scheme.
The density π_β is known up to a proportionality factor and,
since β_1 = 1, its marginal in the first coordinate is the
original distribution π.
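The product-space construction above can be sketched as follows. The helper names `log_pi_beta` and `temperatures` are hypothetical, and the target is assumed to be supplied as an unnormalised log-density `log_pi`; this is an illustration of the definition, not the authors' implementation.

```python
def check_inverse_temperatures(betas):
    """Enforce the constraint 1 = beta_1 > beta_2 > ... > beta_L > 0."""
    if not betas or betas[0] != 1.0:
        raise ValueError("beta_1 must equal 1")
    if any(b_next >= b for b, b_next in zip(betas, betas[1:])):
        raise ValueError("inverse temperatures must be strictly decreasing")
    if betas[-1] <= 0.0:
        raise ValueError("inverse temperatures must be positive")


def log_pi_beta(xs, betas, log_pi):
    """Unnormalised log of pi_beta(x_1, ..., x_L) = prod_l pi(x_l)**beta_l."""
    if len(xs) != len(betas):
        raise ValueError("need one state per temperature level")
    check_inverse_temperatures(betas)
    return sum(beta * log_pi(x) for x, beta in zip(xs, betas))


def temperatures(betas):
    """Temperature scheme T_l = 1 / beta_l."""
    check_inverse_temperatures(betas)
    return [1.0 / beta for beta in betas]


# Usage with a standard normal base density and L = 2 levels.
value = log_pi_beta([1.0, 2.0], [1.0, 0.5], lambda x: -0.5 * x * x)
print(value, temperatures([1.0, 0.5]))
```

Because the joint density factorises over coordinates, the first coordinate (with β_1 = 1) is distributed according to π regardless of the other levels.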
The Markov chain X = {X^[k]}_{k≥0} targets π_β and can be
thought of as L coordinate chains, X^[k] = (X_1^[k], ..., X_L^[k]).
The first coordinate chain will be referred to as the base (temperature) chain.
The main idea behind Parallel Tempering is to interweave
random walk steps with random swaps between chains. Each
random swap exchanges the results of a random walk step between
two coordinate chains. Chains corresponding to higher temperatures¹ should, in principle, be more volatile and travel
between different modes more easily than chains linked to
lower temperatures. Indeed, if x is the last state visited by the
l-th chain, with l > 1, and y is a proposal drawn from a region where the
density assumes smaller values, π(y) < π(x), then the probability of accepting such a proposal, which we call η_l, is higher
on the more tempered chain²
\[ \eta_l(x, y) = 1 \wedge \left( \frac{\pi(y)}{\pi(x)} \right)^{\beta_l} > 1 \wedge \frac{\pi(y)}{\pi(x)} = \eta_1(x, y). \]
Therefore, the more exchanges between the higher-tempered chains
and the base chain, the greater the chance of escaping
a local probability cluster in which a simple Markov chain
would get stuck.
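The inequality between η_l and η_1 can be checked numerically. The sketch below assumes the acceptance probability is evaluated on the log scale; the function name `accept_prob` and the chosen density values are illustrative only.

```python
import math


def accept_prob(log_pi_x, log_pi_y, beta):
    """eta_l(x, y) = min(1, (pi(y) / pi(x))**beta_l), computed via logs."""
    return math.exp(min(0.0, beta * (log_pi_y - log_pi_x)))


# A downhill proposal: pi(y) / pi(x) = exp(-4) < 1.
log_pi_x, log_pi_y = 0.0, -4.0
for beta in (1.0, 0.5, 0.1):
    print(beta, accept_prob(log_pi_x, log_pi_y, beta))
```

For a downhill proposal the acceptance probability increases as β decreases, while uphill proposals are accepted with probability 1 at every temperature level.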
The generation of Markov (...truncated)