Branes with brains: exploring string vacua with deep reinforcement learning

Jun 2019

Abstract We propose deep reinforcement learning as a model-free method for exploring the landscape of string vacua. As a concrete application, we utilize an artificial intelligence agent known as an asynchronous advantage actor-critic to explore type IIA compactifications with intersecting D6-branes. As different string background configurations are explored by changing D6-brane configurations, the agent receives rewards and punishments related to string consistency conditions and proximity to Standard Model vacua. These are in turn utilized to update the agent’s policy and value neural networks to improve its behavior. By reinforcement learning, the agent’s performance in both tasks is significantly improved, and for some tasks it finds a factor of \( \mathcal{O}(200) \) more solutions than a random walker. In one case, we demonstrate that the agent learns a human-derived strategy for finding consistent string models. In another case, where no human-derived strategy exists, the agent learns a genuinely new strategy that achieves the same goal twice as efficiently per unit time. Our results demonstrate that the agent learns to solve various string theory consistency conditions simultaneously, which are phrased in terms of non-linear, coupled Diophantine equations.
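The reward-driven exploration described in the abstract can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in: the class `ToyVacuumEnv`, the constraint \( xy + z^2 = 12 \), the reward values, and the helper `random_walk` are assumptions made for illustration, not the D6-brane environment, reward functions, or A3C agent of the paper (which, according to its table of contents, are implemented via OpenAI Gym and ChainerRL). The sketch only shows the general pattern: a state of integers is changed one step at a time, each step is rewarded or punished according to how badly a non-linear Diophantine condition is violated, and a random walker serves as the baseline.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyVacuumEnv:
    """Hypothetical toy environment, NOT the paper's D6-brane environment.

    The state is a triple of bounded integers (x, y, z). A state counts as
    "consistent" when it solves the toy Diophantine condition x*y + z*z == target.
    """
    target: int = 12
    bound: int = 6
    state: list = field(default_factory=lambda: [0, 0, 0])

    def reset(self):
        self.state = [0, 0, 0]
        return tuple(self.state)

    def violation(self):
        # Distance of the current state from satisfying the toy condition.
        x, y, z = self.state
        return abs(x * y + z * z - self.target)

    def step(self, action):
        # Actions 0..5: even actions raise a coordinate, odd actions lower it.
        index, parity = divmod(action, 2)
        delta = 1 if parity == 0 else -1
        new_value = self.state[index] + delta
        if abs(new_value) <= self.bound:  # moves leaving the box are ignored
            self.state[index] = new_value
        v = self.violation()
        # Assumed reward shaping: bonus for a solution, punishment otherwise.
        reward = 10.0 if v == 0 else -float(v)
        return tuple(self.state), reward, v == 0


def random_walk(env, steps, seed=0):
    """Baseline: take uniformly random actions and collect distinct solutions."""
    rng = random.Random(seed)
    env.reset()
    solutions = set()
    for _ in range(steps):
        state, _reward, solved = env.step(rng.randrange(6))
        if solved:
            solutions.add(state)
    return solutions
```

A learning agent would replace the uniform action choice in `random_walk` with a policy that is updated from the rewards. Comparing the number of distinct solutions found by each in a fixed number of steps mirrors, in spirit only, the agent-versus-random-walker comparison quoted in the abstract.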

Published for SISSA by Springer

Received: April 4, 2019
Accepted: May 26, 2019
Published: June 3, 2019

James Halverson (a), Brent Nelson (a) and Fabian Ruehle (b, c)

(a) Department of Physics, Northeastern University, Boston, MA 02115, U.S.A.
(b) CERN, Theoretical Physics Department, 1 Esplanade des Particules, Geneva 23, CH-1211, Switzerland
(c) Rudolf Peierls Centre for Theoretical Physics, Oxford University, 1 Keble Road, Oxford, OX1 3NP, U.K.

Keywords: Superstring Vacua, D-branes

ArXiv ePrint: 1903.11616

Open Access, © The Authors. Article funded by SCOAP3.
https://doi.org/10.1007/JHEP06(2019)003

Contents

1 Introduction
2 Basics of reinforcement learning
  2.1 Classic solutions to Markov decision processes
  2.2 Deep reinforcement learning
    2.2.1 Value function approximation
    2.2.2 Policy gradients
    2.2.3 Actor-critic methods
    2.2.4 Asynchronous advantage actor-critics (A3C)
3 The environment for type IIA string theory
  3.1 IIA \( \mathbb{Z}_2 \times \mathbb{Z}_2 \) orbifold
  3.2 Truncated IIA \( \mathbb{Z}_2 \times \mathbb{Z}_2 \) orbifold
    3.2.1 Truncating state and action space
    3.2.2 The Douglas-Taylor truncation
  3.3 Different views on the landscape: environment implementation
    3.3.1 The stacking environment
    3.3.2 The flipping environment
    3.3.3 The one-in-a-billion search environments
    3.3.4 Comparison of environments
  3.4 A3C implementation via OpenAI Gym and ChainerRL
4 Systematic reinforcement learning and landscape exploration
  4.1 Reward functions
  4.2 SUSY conditions and constrained quadratic programming
  4.3 Neural network architecture
  4.4 Learning to solve string consistency conditions
  4.5 Learning a human-derived strategy: filler branes
  4.6 Systematic RL stacking agent vs. random agent experiments
  4.7 Additional stacking agent experiments
  4.8 Flipping and one-in-a-billion agents
  4.9 Comparison with earlier work
5 Discussion and summary
A Value sets for reward functions

1 Introduction

String theory is a theory of quantum gravity that has shed light on numerous aspects of theoretical physics in recent decades, bringing new light to old problems and influencing a diverse array of fields, from condensed matter physics to pure mathematics. As a theory of quantum gravity it is also a natural candidate for unifying known particle physics and cosmology. The proposition is strengthened by the low energy degrees of freedom that arise in string theory, which resemble the basic building blocks of Nature, but is made difficult by the vast number of solutions of string theory, which arrange the degrees of freedom in diverse ways and give rise to different laws of physics.

This vast number of solutions is the landscape of string vacua, which, if correct, implies that fundamental physics is itself a complex system. Accordingly, studies of the string landscape are faced with difficulties that arise in other complex systems. These include not only the solutions themselves, which limit computation by virtue of their number, but also tasks that are necessary to understand the physics of the solutions, which hamper computation by virtue of their complexity.

As examples of large numbers of solutions, original estimates of the existence of at least \( 10^{500} \) flux vacua [1] have ballooned in recent years to \( 10^{272,000} \) flux vacua [2] on a fixed geometry. Furthermore, the number of geometries has also grown, with an exact lower bound [3] of \( 10^{755} \) on the number of F-theory geometries, which Monte Carlo estimates demonstrate is likely closer to \( 10^{3000} \) in the toric case [4].¹ In fact, in 1986 it was already anticipated [6] that there are over \( 10^{1500} \) consistent chiral heterotic compactifications.

¹ The number of weak Fano toric fourfolds that give rise to smooth Calabi-Yau threefold hypersurfaces was recently estimated [5] to be \( 10^{10,000} \), but it is not clear how many of the threefolds are distinct.

As examples of complexity, finding small cosmological constants in the Bousso-Polchinski model is NP-complete [7], constructing scalar potentials in string theory and finding minima are both computationally hard [8], and the diversity of Diophantine equations that arise in string theory (for instance, in index calculations) raises the issue of undecidability in the landscape [9] by analogy to the negative solution to Hilbert’s 10th problem.

Finally, in addition to difficulties posed by size and complexity, there are also critical formal issues related to the lack of a complete definition of string theory and M-theory. Formal progress is therefore also necessary for fully understanding the landscape.

For these reasons, in recent years it has been proposed to use techniques from data science, machine learning, and artificial intelligence to understand string theory broadly, and string vacua in particular, beginning with [10–13]. Numerous techniques from two of the three canonical types of machine learning have been applied to a variety of physical problems:

• Supervised learning: perhaps the best-known type of machine learning is learning that is supervised. Labelled training data is used to create a model that accurately predicts outputs given inputs, including tests on unseen data that is not used in training the model. Supervised learning makes up the bulk of the work thus far on machine learning in string theory.

• Unsupervised learning: another type of learning is unsupervised. In this case data is not labelled, but the algorithm attempts to learn features that describe correlations between data points. St (...truncated)

In [12] it was shown that genetic algorithms can be utilized to (...truncated)


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1007%2FJHEP06%282019%29003.pdf
Article home page: https://link.springer.com/article/10.1007/JHEP06%282019%29003

James Halverson, Brent Nelson, Fabian Ruehle. Branes with brains: exploring string vacua with deep reinforcement learning, 2019, pp. 3, Volume 2019, Issue 6, DOI: 10.1007/JHEP06(2019)003