Adversarial robustness guarantees for quantum classifiers
npj | quantum information
Article
Published in partnership with The University of New South Wales
https://doi.org/10.1038/s41534-025-01129-3
Neil Dowling1,2,6, Maxwell T. West3,6, Angus Southwell2, Azar C. Nakhl3, Martin Sevior3,
Muhammad Usman3,4 & Kavan Modi2,5
Despite their ever more widespread deployment throughout society, machine learning algorithms
remain critically vulnerable to being spoofed by subtle adversarial tampering with their input data. The
prospect of near-term quantum computers being capable of running quantum machine learning (QML)
algorithms has therefore generated intense interest in their adversarial vulnerability. Here we show that
quantum properties of QML algorithms can confer fundamental protections against such attacks, in
certain scenarios guaranteeing robustness against classically-armed adversaries. We leverage tools
from many-body physics to identify the quantum sources of this protection. Our results offer a
theoretical underpinning of recent evidence which suggests quantum advantages in the search for
adversarial robustness. In particular, we prove that quantum classifiers are (i) protected against weak
perturbations of data drawn from the trained distribution and (ii) protected against local attacks if they are
insufficiently scrambling, and (iii) we show evidence that they are protected against universal adversarial
attacks if they are sufficiently chaotic. Our analytic results are supported by numerical evidence
demonstrating the applicability of our theorems and the resulting robustness of a quantum classifier in
practice. This line of inquiry constitutes a concrete pathway to advantage in QML, orthogonal to the
usually sought improvements in model speed or accuracy.
Ten years on from their initial discovery1–3, adversarial attacks remain a
potent weapon for deceiving even highly sophisticated machine learning
(ML) models4. Remarkably, for example, powerful image classifiers can be
fooled by carefully chosen perturbations which are almost invisible to a
human eye5, or even by changing the value of a single pixel6. Due to the
accelerating delegation of important tasks to ML, and the tendency of
empirical defense strategies to be later bypassed7, the need for provable
guarantees against such spoofing attempts is only growing8,9.
Concurrently, the increasing capabilities of quantum computers have
generated significant research to determine whether quantum advantage
may be expected in machine learning10–13, but the extent to which they can be
expected to deliver direct speed-ups remains unclear13–23. It is therefore an
opportune moment to search for a different kind of advantage in QML24,25.
In fact, the field of quantum adversarial machine learning has generated
considerable interest24,26–39. Notably, a series of recent papers reported QML
models with significantly increased adversarial
robustness against classical adversaries34–37 (Fig. 1(a)). However, these
results are empirical, lacking a foundational understanding of the source of
the advantage.
In this work we address this gap by supplying a sequence of provable
quantum adversarial robustness guarantees for QML, in extremely broad
yet practically relevant scenarios. These rely on distinct properties of the
encoding scheme, as well as on the dynamical complexity of the constituent
quantum circuit. Our results include analytic theorems relying on the
genuinely quantum properties of a QML architecture, offering robustness
guarantees not applicable to classical ML. These are further supported with
probabilistic bounds and numerical results for a realistic quantum classifier
model. These guarantees circumvent previous existence proofs of adversarial examples in QML27,31, by restricting to the physically relevant case of a
classical adversary whose allowable perturbations are constrained by the
data encoding strategy employed by the model. More specifically, we study
the robustness of QML models under three distinct attack scenarios: a weak
perturbation designed to induce a misclassification for a target input classical state (data), a strong universal perturbation40,41 designed to induce
1Institut für Theoretische Physik, Universität zu Köln, Zülpicher Strasse 77, 50937 Köln, Germany. 2School of Physics & Astronomy, Monash University, Clayton,
VIC, 3800, Australia. 3School of Physics, The University of Melbourne, Parkville, VIC, 3010, Australia. 4Data61, CSIRO, Clayton, 3168 VIC, Australia. 5Science,
Mathematics and Technology Cluster, Singapore University of Technology and Design, 8 Somapah Road, 487372 Singapore, Singapore. 6These authors
contributed equally: Neil Dowling, Maxwell T. West.
npj Quantum Information | (2026)12:16
Fig. 1 | Schematic of adversarial machine learning setting. a Machine learning
models are generally highly susceptible to extremely subtle adversarial tampering
with their input data, but quantum models have been empirically found to be robust
to attacks by classical adversaries35. In the general quantum machine learning setting,
a classical data string x is encoded in a state ψ(x), and a (trained) quantum algorithm U_θ
is applied before measurement of some few-qubit operator Z. An adversarial attack
can then be modeled by some change to the initial bit string x → x + ϵw, which is
Table 1 | Summary of robustness guarantees

Encoding     Weak (Thm. 1)    Local (Thm. 1 & 2)    Universal (Thm. 3)
Amplitude    ✓                ✓                     –
Angle        ϵ ≲ 1/√N         ϵ ≲ 1/√N              ✓
Dense        ϵ ≲ 1/√N         ϵ ≲ 1/√N              ✓
Arbitrary    ϵ ≲ Δx/Δψ        OTOC ≪ 1              –
Property     Quantum          Scrambling            Chaotic

The applicability of our theorems, which depend on both the attack strategy and the form of data
encoding, $x \in \mathbb{R}^N \mapsto \psi(x) = E(x)|0\rangle\langle 0|E^\dagger(x)$. Here ϵ denotes the ℓ∞ norm of the adversarial perturbation. In
some cases, our results apply unconditionally (denoted by a tick) while in others there is a specified
dependence on the details of the encoding. Non-applicability is denoted by a dash. In the bottom
row, we record the property of the model (qualitatively) responsible for the guarantee: “Quantum”
refers to the contractive nature of any quantum classifier (e.g. a unitary circuit), “Scrambling” refers
to a quickly decaying out-of-time-ordered correlator (OTOC) [Eq. (12)], while by “Chaotic” we mean a
linearly-growing local-operator entanglement (LOE) [Eq. (14)]. We also note that the ticks in the right-hand
column are based on a conjecture, supported by numerical evidence and analytic results under
a stronger condition than the most general universal adversarial attack (see Eqs. (13) and (16)).
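To give a feel for the ϵ ≲ 1/√N scaling listed for angle encoding in Table 1, the following is a minimal numerical sketch, not the authors' implementation. It assumes one common angle-encoding convention, in which qubit i is prepared in cos(x_i/2)|0⟩ + sin(x_i/2)|1⟩; the encoding used in the paper may differ. The function names (`angle_encode`, `fidelity`, `worst_case_fidelity`) and the choice N = 8 are hypothetical. Under this assumption, a perturbation x → x + ϵw with w_i = ±1 leaves a state fidelity of cos(ϵ/2)^(2N) ≈ exp(−Nϵ²/4), which stays of order one only while ϵ is of order 1/√N or smaller. For pure states, the shift in the expectation value of any observable with operator norm at most 1 is bounded by 2√(1 − F).

```python
import numpy as np

def angle_encode(x):
    """Product state with qubit i in cos(x_i/2)|0> + sin(x_i/2)|1>.

    This convention is an assumption made for illustration only.
    """
    state = np.array([1.0])
    for xi in x:
        qubit = np.array([np.cos(xi / 2), np.sin(xi / 2)])
        state = np.kron(state, qubit)
    return state

def fidelity(psi, phi):
    """Fidelity |<psi|phi>|^2 between two normalized pure states."""
    return float(np.abs(np.vdot(psi, phi)) ** 2)

def worst_case_fidelity(n_qubits, eps):
    """Closed form for a perturbation with every w_i = +1 or -1."""
    return float(np.cos(eps / 2) ** (2 * n_qubits))

rng = np.random.default_rng(0)
N = 8                                    # hypothetical number of features/qubits
x = rng.uniform(0, np.pi, size=N)        # hypothetical clean input
w = rng.choice([-1.0, 1.0], size=N)      # perturbation direction, l_inf norm 1

for eps in (0.1 / np.sqrt(N), 1.0 / np.sqrt(N), 1.0):
    F = fidelity(angle_encode(x), angle_encode(x + eps * w))
    # Bound on |<Z>' - <Z>| for any observable Z with operator norm <= 1.
    shift_bound = 2 * np.sqrt(max(0.0, 1.0 - F))
    print(f"eps={eps:.3f}  fidelity={F:.4f}  max expectation shift={shift_bound:.4f}")
```

The fidelity here is independent of the clean input x, because each single-qubit overlap reduces to cos(ϵ w_i / 2); this is a property of the assumed product encoding and need not hold for other encodings.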
equivalent to the action of a unitary W on the encoded state, |x′⟩ = W|x⟩. b Chaotic
unitaries scramble information throughout quantum degrees of freedom in a many-body system. c It is difficult for an adversary to carefully manipulate (...truncated)