Adversarial robustness guarantees for quantum classifiers

npj Quantum Information, Jan 2026

Despite their ever more widespread deployment throughout society, machine learning algorithms remain critically vulnerable to being spoofed by subtle adversarial tampering with their input data. The prospect of near-term quantum computers being capable of running quantum machine learning (QML) algorithms has therefore generated intense interest in their adversarial vulnerability. Here we show that quantum properties of QML algorithms can confer fundamental protections against such attacks, in certain scenarios guaranteeing robustness against classically-armed adversaries. We leverage tools from many-body physics to identify the quantum sources of this protection. Our results offer a theoretical underpinning of recent evidence which suggests quantum advantages in the search for adversarial robustness. In particular, we prove that quantum classifiers are (i) protected against weak perturbations of data drawn from the trained distribution and (ii) protected against local attacks if they are insufficiently scrambling, and (iii) we show evidence that they are protected against universal adversarial attacks if they are sufficiently chaotic. Our analytic results are supported by numerical evidence demonstrating the applicability of our theorems and the resulting robustness of a quantum classifier in practice. This line of inquiry constitutes a concrete pathway to advantage in QML, orthogonal to the usually sought improvements in model speed or accuracy.


npj Quantum Information | Article
Published in partnership with The University of New South Wales
https://doi.org/10.1038/s41534-025-01129-3

Neil Dowling 1,2,6, Maxwell T. West 3,6, Angus Southwell 2, Azar C. Nakhl 3, Martin Sevior 3, Muhammad Usman 3,4 & Kavan Modi 2,5
Ten years on from their initial discovery [1–3], adversarial attacks remain a potent weapon for deceiving even highly sophisticated machine learning (ML) models [4]. Remarkably, for example, powerful image classifiers can be fooled by carefully chosen perturbations which are almost invisible to a human eye [5], or even by changing the value of a single pixel [6]. Due to the accelerating delegation of important tasks to ML, and the tendency of empirical defense strategies to be later bypassed [7], the need for provable guarantees against such spoofing attempts is only growing [8,9]. Concurrently, the increasing capabilities of quantum computers have generated significant research to determine whether quantum advantage may be expected in machine learning [10–13], but the extent to which they can be expected to deliver direct speed-ups remains unclear [13–23]. It is therefore an opportune moment to search for a different kind of advantage in QML [24,25]. In fact, the field of quantum adversarial machine learning has generated considerable interest [24,26–39]. Notably, in a series of recent papers, QML models were studied that indicated significantly increased adversarial robustness against classical adversaries [34–37] (Fig. 1(a)). However, these results are empirical, lacking a foundational understanding of the source of the advantage. In this work we address this by supplying a sequence of provable quantum adversarial robustness guarantees for QML, in extremely broad yet practically relevant scenarios. These rely on distinct properties of the encoding scheme, as well as on the dynamical complexity of the constituent quantum circuit. Our results include analytic theorems relying on the genuinely quantum properties of a QML architecture, offering robustness guarantees not applicable to classical ML. These are further supported with probabilistic bounds and numerical results for a realistic quantum classifier model.
These guarantees circumvent previous existence proofs of adversarial examples in QML [27,31], by restricting to the physically relevant case of a classical adversary whose allowable perturbations are constrained by the data encoding strategy employed by the model. More specifically, we study the robustness of QML models under three distinct attack scenarios: a weak perturbation designed to induce a misclassification for a target input classical state (data), a strong universal perturbation [40,41] designed to induce (...truncated)

1 Institut für Theoretische Physik, Universität zu Köln, Zülpicher Strasse 77, 50937 Köln, Germany.
2 School of Physics & Astronomy, Monash University, Clayton, VIC, 3800, Australia.
3 School of Physics, The University of Melbourne, Parkville, VIC, 3010, Australia.
4 Data61, CSIRO, Clayton, 3168 VIC, Australia.
5 Science, Mathematics and Technology Cluster, Singapore University of Technology and Design, 8 Somapah Road, 487372 Singapore, Singapore.
6 These authors contributed equally: Neil Dowling, Maxwell T. West.

npj Quantum Information (2026) 12:16

Fig. 1 | Schematic of adversarial machine learning setting. a Machine learning models are generally highly susceptible to extremely subtle adversarial tampering with their input data, but quantum models have been empirically found to be robust to attacks by classical adversaries [35]. In the general quantum machine learning setting, a classical data string $x$ is encoded in a state $\psi(x)$, a (trained) quantum algorithm $U_\theta$ is applied before measurement of some few-qubit operator $Z$. An adversarial attack can then be modeled by some change to the initial bit string $x \to x + \epsilon w$, which is equivalent to the action of a unitary $W$ on the encoded state, $|x'\rangle = W|x\rangle$. b Chaotic unitaries scramble information throughout quantum degrees of freedom in a many-body system. c It is difficult for an adversary to carefully manipulate (...truncated)

Table 1 | Summary of robustness guarantees
Encodings: Amplitude, Angle, Dense, Arbitrary
Weak (Thm. 1) and Local (Thm. 1 & 2): ✓, ✓, $\epsilon \lesssim 1/\sqrt{N}$, $\epsilon \lesssim 1/\sqrt{N}$, $\epsilon \lesssim \Delta x/\Delta\psi$, OTOC $\ll 1$
Universal (Thm. 3): –, ✓, ✓, –
Property: Quantum, Scrambling, Chaotic

The applicability of our theorems, which depend on both the attack strategy and the form of data encoding, $x \in \mathbb{R}^N \mapsto \psi(x) = E(x)|0\rangle\langle 0|E^\dagger(x)$. $\epsilon$ denotes the $\ell_\infty$ norm of the adversarial perturbation. In some cases, our results apply unconditionally (denoted by a tick) while in others there is a specified dependence on the details of the encoding. Non-applicability is denoted by a dash. In the bottom row, we record the property of the model (qualitatively) responsible for the guarantee: “Quantum” refers to the contractive nature of any quantum classifier (e.g. a unitary circuit), “Scrambling” refers to a quickly decaying out-of-time-ordered correlator (OTOC) [Eq. (12)], while by “Chaotic” we mean a linearly-growing local-operator entanglement (LOE) [Eq. (14)]. We also note that the ticks in the right-hand column are based on a conjecture, supported by numerical evidence and analytic results under a stronger condition than the most general universal adversarial attack (see Eqs. (13) and (16)).
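
The setting described in the Fig. 1 caption (encode x, apply a circuit, measure a few-qubit operator Z, then perturb x by ϵw) can be illustrated with a small numerical sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the angle-encoding convention (one qubit per feature, cos(x_i/2)|0> + sin(x_i/2)|1>), the Haar-random unitary standing in for a trained circuit, and the helper names. The bound that is checked is the standard trace-distance inequality |Δ⟨Z⟩| ≤ 2·sqrt(1 − F) for pure states, which is one way to see the "contractive" property the table caption refers to; it is not a reproduction of the paper's Theorem 1.

```python
import numpy as np

def angle_encode(x):
    """Assumed angle encoding: one qubit per feature, cos(x_i/2)|0> + sin(x_i/2)|1>."""
    state = np.array([1.0 + 0j])
    for xi in x:
        qubit = np.array([np.cos(xi / 2), np.sin(xi / 2)], dtype=complex)
        state = np.kron(state, qubit)
    return state

def random_unitary(dim, rng):
    """Haar-random unitary via QR; a hypothetical stand-in for a trained circuit."""
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # rescales column j of q by phases[j]

def z_first_qubit(n):
    """Diagonal of Pauli Z on the first qubit, identity on the remaining n - 1."""
    return np.kron(np.array([1.0, -1.0]), np.ones(2 ** (n - 1)))

def expectation(state, U, z_diag):
    """Classifier output <Z> after applying U to the encoded state."""
    out = U @ state
    return float(np.real(np.vdot(out, z_diag * out)))

def attack_shift(x, w, eps, U, z_diag):
    """Return (|change in <Z>|, bound 2*sqrt(1 - F)) for the attack x -> x + eps*w."""
    psi, psi_adv = angle_encode(x), angle_encode(x + eps * w)
    fidelity = abs(np.vdot(psi, psi_adv)) ** 2
    shift = abs(expectation(psi_adv, U, z_diag) - expectation(psi, U, z_diag))
    bound = 2.0 * np.sqrt(max(0.0, 1.0 - fidelity))
    return shift, bound

rng = np.random.default_rng(0)
n = 6                                   # number of features = number of qubits
x = rng.uniform(0, np.pi, size=n)       # clean input
w = rng.choice([-1.0, 1.0], size=n)     # attack direction with l_inf norm 1
U = random_unitary(2 ** n, rng)
z_diag = z_first_qubit(n)

for eps in (0.01, 0.1, 1 / np.sqrt(n)):
    shift, bound = attack_shift(x, w, eps, U, z_diag)
    print(f"eps={eps:.3f}  |delta <Z>|={shift:.4f}  bound={bound:.4f}")
```

Under this assumed encoding the fidelity between clean and attacked states is cos(ϵ/2)^(2N) when every |w_i| = 1, so for small ϵ the bound behaves like √N·ϵ and only stays small when ϵ is well below 1/√N. That scaling is at least consistent with the $\epsilon \lesssim 1/\sqrt{N}$ condition listed in Table 1, although the paper's precise statement may differ.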


This is a preview of a remote PDF: https://www.nature.com/articles/s41534-025-01129-3.pdf
Article home page: https://www.nature.com/articles/s41534-025-01129-3

Dowling, Neil, West, Maxwell T., Southwell, Angus, Nakhl, Azar C., Sevior, Martin, Usman, Muhammad, Modi, Kavan. Adversarial robustness guarantees for quantum classifiers, npj Quantum Information, 2026, DOI: 10.1038/s41534-025-01129-3