Theoretical guarantees for permutation-equivariant quantum neural networks
www.nature.com/npjqi
Louis Schatzki1,2,6 ✉, Martín Larocca3,4,6, Quynh T. Nguyen3,5, Frédéric Sauvage3 and M. Cerezo1 ✉
Despite the great promise of quantum machine learning models, there are several challenges one must overcome before unlocking
their full potential. For instance, models based on quantum neural networks (QNNs) can suffer from excessive local minima and
barren plateaus in their training landscapes. Recently, the nascent field of geometric quantum machine learning (GQML) has
emerged as a potential solution to some of those issues. The key insight of GQML is that one should design architectures, such as
equivariant QNNs, encoding the symmetries of the problem at hand. Here, we focus on problems with permutation symmetry (i.e.,
symmetry group Sn), and show how to build Sn-equivariant QNNs. We provide an analytical study of their performance, proving that
they do not suffer from barren plateaus, quickly reach overparametrization, and generalize well from small amounts of data. To
verify our results, we perform numerical simulations for a graph state classification task. Our work provides theoretical guarantees
for equivariant QNNs, thus indicating the power and potential of GQML.
npj Quantum Information (2024)10:12 ; https://doi.org/10.1038/s41534-024-00804-1
INTRODUCTION
Symmetry studies and formalizes the invariance of objects under
some set of operations. A wealth of theory has gone into
describing symmetries as mathematical entities through the
concept of groups and representations. While the analysis of
symmetries in nature has greatly improved our understanding of
the laws of physics, the study of symmetries in data has just
recently gained momentum within the framework of learning
theory. In the past few years, classical machine learning
practitioners realized that models tend to perform better when
constrained to respect the underlying symmetries of the data. This
has led to the blossoming field of geometric deep learning1–5,
where symmetries are incorporated as geometric priors into the
learning architectures, improving trainability and generalization
performance6–13.
The tremendous success of geometric deep learning has
recently inspired researchers to import these ideas to the realm
of quantum machine learning (QML)14–16. QML is a new and
exciting field at the intersection of classical machine learning and
quantum computing. By running routines on quantum hardware,
and thus exploiting the exponentially large dimension of the
Hilbert space, the hope is that QML algorithms can outperform
their classical counterparts when learning from data17.
The infusion of ideas from geometric deep learning to QML has
been termed ‘geometric quantum machine learning’ (GQML)18–24.
GQML leverages the machinery of group and representation
theory25 to build quantum architectures that encode symmetry
information about the problem at hand. For instance, when the
model is parametrized through a quantum neural network
(QNN)16,26–28, GQML indicates that the layers of the QNN should
be equivariant under the action of the symmetry group associated
to the dataset. That is, applying a symmetry transformation on the
input to the QNN layers should be the same as applying it to its
output.
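The equivariance condition described above can be checked numerically on a small register. The following is a minimal NumPy sketch, not the authors' construction: the 3-qubit size, the generators `H_sym` (a sum of Pauli X over all qubits) and `H_loc` (Pauli X on one qubit), the angle 0.37, and all helper names are assumptions chosen for illustration. A layer e^{-iθH} is treated as equivariant if it commutes with every qubit-permutation unitary.

```python
import numpy as np
from itertools import permutations

n = 3  # illustrative register size; small enough to enumerate all n! permutations
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

def op_on(single, j, n):
    """Embed a single-qubit operator on qubit j of an n-qubit register."""
    out = np.array([[1.0 + 0j]])
    for k in range(n):
        out = np.kron(out, single if k == j else I2)
    return out

def perm_matrix(perm, n):
    """Unitary that moves the state of qubit k to position perm[k]."""
    dim = 2 ** n
    P = np.zeros((dim, dim))
    for idx in range(dim):
        bits = [(idx >> (n - 1 - k)) & 1 for k in range(n)]
        new_bits = [0] * n
        for k in range(n):
            new_bits[perm[k]] = bits[k]
        new_idx = sum(b << (n - 1 - k) for k, b in enumerate(new_bits))
        P[new_idx, idx] = 1.0
    return P

def layer_unitary(H, theta):
    """exp(-i * theta * H) for Hermitian H, via its eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * theta * w)) @ v.conj().T

def is_equivariant(H, theta=0.37):
    """True if the layer commutes with every qubit permutation."""
    U = layer_unitary(H, theta)
    return all(
        np.allclose(U @ perm_matrix(p, n), perm_matrix(p, n) @ U)
        for p in permutations(range(n))
    )

H_sym = sum(op_on(X, j, n) for j in range(n))  # permutation-symmetric generator
H_loc = op_on(X, 0, n)                         # acts on one qubit only

sym_ok = is_equivariant(H_sym)
loc_ok = is_equivariant(H_loc)
print(sym_ok, loc_ok)
```

In this sketch the symmetric generator passes the check, while the single-qubit generator fails because permuting the qubits moves it to a different site.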
One of the main goals of GQML is to create architectures that
solve, or at least significantly mitigate, some of the known issues
of standard symmetry non-preserving QML models16. For instance,
it has been shown that the optimization landscapes of generic
QNNs can exhibit a large number of local minima29–32, or be prone
to the barren plateau phenomenon33–45 whereby the loss function
gradients vanish exponentially with the problem size. Crucially, it
is known that barren plateaus and excessive local minima are
connected to the expressibility30,32,37,43,46 of the QNN, so that
problem-agnostic architectures are more likely to exhibit trainability issues. In this sense, it is expected that following the GQML
program of baking symmetry directly into the algorithm will lead
to models with sharp inductive biases that suitably limit their
expressibility and search space.
In this work, we leverage the GQML toolbox to create models
that are permutation invariant, i.e., models whose outputs remain
invariant under the action of the symmetric group Sn (see Fig. 1).
We focus on this particular symmetry as learning problems with
permutation symmetries abound. Examples include learning over
sets of elements47,48, modeling relations between pairs
(graphs)49–54 or multiplets (hypergraphs) of entities55–57, problems
defined on grids (such as condensed matter systems)58–61,
molecular systems62–64, evaluating genuine multipartite entanglement65–68, or working with distributed quantum sensors69–71.
Our first contribution is to provide guidelines to build unitary Sn-equivariant QNNs. We then derive rigorous theoretical guarantees
for these architectures in terms of their trainability and generalization capabilities. Specifically, we prove that Sn-equivariant QNNs
do not lead to barren plateaus, can be overparametrized with
polynomially deep circuits, and generalize well with only a
polynomial number of training points. We also identify datasets
for which the model is trainable, as well as datasets
that lead to untrainability. All these appealing properties are also
demonstrated in numerical simulations of a graph classification
1Information Sciences, Los Alamos National Laboratory, Los Alamos, NM 87545, USA. 2Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL
61801, USA. 3Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA. 4Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM
87545, USA. 5Harvard Quantum Initiative, Harvard University, Cambridge, MA 02138, USA. 6These authors contributed equally: Louis Schatzki, Martín Larocca.
✉email: ;
Published in partnership with The University of New South Wales
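will restrict to L-layered QNNs
$$\mathcal{U}_{\theta} = \mathcal{U}^{L}_{\theta_L} \circ \cdots \circ \mathcal{U}^{1}_{\theta_1}, \quad \text{where} \quad \mathcal{U}^{l}_{\theta_l}(\rho) = e^{-i\theta_l H_l}\, \rho\, e^{i\theta_l H_l}, \qquad (1)$$
for some Hermitian generators $\{H_l\}$, so that $U(\theta) = \prod_{l=1}^{L} e^{-i\theta_l H_l}$.
Moreover, we consider models that depend on a loss function of
the form
$$\ell_{\theta}(\rho_i) = \mathrm{Tr}[\mathcal{U}_{\theta}(\rho_i)\, O], \qquad (2)$$
where O is a Hermitian observable. We quantify the training
error via the so-called empirical loss, or training error, which is
defined as
$$\widehat{\mathcal{L}}(\theta) = \sum_{i=1}^{M} c_i\, \ell_{\theta}(\rho_i). \qquad (3)$$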
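As a rough illustration of Eqs. (1)-(3), the sketch below builds a layered unitary from Hermitian generators, evaluates the loss Tr[U ρ U† O], and forms the weighted empirical loss. It is a minimal NumPy example under stated assumptions: the single-qubit generators (Pauli X in both layers), the observable (Pauli Z), the two input states, the coefficients c_i, and all function names are hypothetical choices made for the demonstration, not values from the paper.

```python
import numpy as np

def expm_herm(H, theta):
    """exp(-i * theta * H) for Hermitian H, via its eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * theta * w)) @ v.conj().T

def qnn_unitary(thetas, generators):
    """U(theta) = e^{-i theta_L H_L} ... e^{-i theta_1 H_1}; layer 1 acts first."""
    dim = generators[0].shape[0]
    U = np.eye(dim, dtype=complex)
    for theta, H in zip(thetas, generators):
        U = expm_herm(H, theta) @ U
    return U

def loss(thetas, generators, rho, O):
    """Single-state loss of Eq. (2): Tr[U rho U^dagger O]."""
    U = qnn_unitary(thetas, generators)
    return float(np.real(np.trace(U @ rho @ U.conj().T @ O)))

def empirical_loss(thetas, generators, rhos, coeffs, O):
    """Empirical loss of Eq. (3): weighted sum of single-state losses."""
    return sum(c * loss(thetas, generators, rho, O) for c, rho in zip(coeffs, rhos))

# Hypothetical toy instance: one qubit, two layers generated by Pauli X.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
rho1 = np.array([[0, 0], [0, 1]], dtype=complex)  # |1><1|

thetas = [0.2, 0.3]     # illustrative parameter values
generators = [X, X]
coeffs = [0.5, -0.5]    # illustrative c_i

single = loss(thetas, generators, rho0, Z)
total = empirical_loss(thetas, generators, [rho0, rho1], coeffs, Z)
print(single, total)
```

For this toy instance the two rotations add, so the loss on |0><0| equals cos(2(θ1 + θ2)), which gives a simple closed form to compare against.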
Fig. 1 GQML embeds geometric priors into a QML model.
Incorporating prior knowledge through Sn-equivariance heavily
restricts the search space of the model. We show that such
inductive biases lead to models that do not exhibit barren plateaus,
can be efficiently overparametrized, and require small amounts of
data to generalize well.
task. Our empirical results verify our theoretical ones, and even
show that the pe (...truncated)