Branes with brains: exploring string vacua with deep reinforcement learning
Published for SISSA by Springer
Received: April 4, 2019
Accepted: May 26, 2019
Published: June 3, 2019
James Halverson,^a Brent Nelson^a and Fabian Ruehle^{b,c}
^a Department of Physics, Northeastern University,
Boston, MA 02115, U.S.A.
^b CERN, Theoretical Physics Department,
1 Esplanade des Particules, Geneva 23, CH-1211, Switzerland
^c Rudolf Peierls Centre for Theoretical Physics, Oxford University,
1 Keble Road, Oxford, OX1 3NP, U.K.
Abstract: We propose deep reinforcement learning as a model-free method for exploring
the landscape of string vacua. As a concrete application, we utilize an artificial intelligence
agent known as an asynchronous advantage actor-critic to explore type IIA compactifications with intersecting D6-branes. As different string background configurations are explored by changing D6-brane configurations, the agent receives rewards and punishments
related to string consistency conditions and proximity to Standard Model vacua. These
are in turn utilized to update the agent’s policy and value neural networks to improve its
behavior. By reinforcement learning, the agent’s performance in both tasks is significantly
improved, and for some tasks it finds a factor of O(200) more solutions than a random
walker. In one case, we demonstrate that the agent learns a human-derived strategy for
finding consistent string models. In another case, where no human-derived strategy exists,
the agent learns a genuinely new strategy that achieves the same goal twice as efficiently
per unit time. Our results demonstrate that the agent learns to solve various string theory
consistency conditions simultaneously, which are phrased in terms of non-linear, coupled
Diophantine equations.
Keywords: Superstring Vacua, D-branes
ArXiv ePrint: 1903.11616
Open Access, © The Authors.
Article funded by SCOAP3.
https://doi.org/10.1007/JHEP06(2019)003
JHEP06(2019)003
Contents
1 Introduction
2 Basics of reinforcement learning
2.1 Classic solutions to Markov decision processes
2.2 Deep reinforcement learning
2.2.1 Value function approximation
2.2.2 Policy gradients
2.2.3 Actor-critic methods
2.2.4 Asynchronous advantage actor-critics (A3C)
3 The environment for type IIA string theory
3.1 IIA Z2 × Z2 orbifold
3.2 Truncated IIA Z2 × Z2 orbifold
3.2.1 Truncating state and action space
3.2.2 The Douglas-Taylor truncation
3.3 Different views on the landscape: environment implementation
3.3.1 The stacking environment
3.3.2 The flipping environment
3.3.3 The one-in-a-billion search environments
3.3.4 Comparison of environments
3.4 A3C implementation via OpenAI Gym and ChainerRL
4 Systematic reinforcement learning and landscape exploration
4.1 Reward functions
4.2 SUSY conditions and constrained quadratic programming
4.3 Neural network architecture
4.4 Learning to solve string consistency conditions
4.5 Learning a human-derived strategy: filler branes
4.6 Systematic RL stacking agent vs. random agent experiments
4.7 Additional stacking agent experiments
4.8 Flipping and one-in-a-billion agents
4.9 Comparison with earlier work
5 Discussion and summary
A Value sets for reward functions
1 Introduction
String theory is a theory of quantum gravity that has shed light on numerous aspects of
theoretical physics in recent decades, bringing new light to old problems and influencing a
diverse array of fields, from condensed matter physics to pure mathematics. As a theory
of quantum gravity it is also a natural candidate for unifying known particle physics and
cosmology. The proposition is strengthened by the low energy degrees of freedom that arise
in string theory, which resemble the basic building blocks of Nature, but is made difficult
by the vast number of solutions of string theory, which arrange the degrees of freedom in
diverse ways and give rise to different laws of physics.
This vast number of solutions is the landscape of string vacua, which, if correct, implies
that fundamental physics is itself a complex system. Accordingly, studies of the string
landscape are faced with difficulties that arise in other complex systems. These include not
only the solutions themselves, which limit computation by virtue of their number, but also
tasks that are necessary to understand the physics of the solutions, which hamper
computation by virtue of their complexity. As examples of large numbers of solutions, original
estimates of the existence of at least 10^{500} flux vacua [1] have ballooned in recent years to
10^{272,000} flux vacua [2] on a fixed geometry. Furthermore, the number of geometries has
also grown, with an exact lower bound [3] of 10^{755} on the number of F-theory geometries,
which Monte Carlo estimates demonstrate is likely closer to 10^{3000} in the toric case [4].¹
In fact, in 1986 it was already anticipated [6] that there are over 10^{1500} consistent chiral
heterotic compactifications. As examples of complexity, finding small cosmological
constants in the Bousso-Polchinski model is NP-complete [7], constructing scalar potentials
in string theory and finding minima are both computationally hard [8], and the diversity
of Diophantine equations that arise in string theory (for instance, in index calculations)
raises the issue of undecidability in the landscape [9] by analogy to the negative solution
to Hilbert’s 10th problem. Finally, in addition to difficulties posed by size and complexity,
there are also critical formal issues related to the lack of a complete definition of string
theory and M-theory. Formal progress is therefore also necessary for fully understanding
the landscape.
¹ The number of weak Fano toric fourfolds that give rise to smooth Calabi-Yau threefold
hypersurfaces was recently estimated [5] to be 10^{10,000}, but it is not clear how many of
the threefolds are distinct.
For these reasons, in recent years it has been proposed to use techniques from data
science, machine learning, and artificial intelligence to understand string theory broadly, and
string vacua in particular, beginning with [10–13]. Numerous techniques from two of the
three canonical types of machine learning have been applied to a variety of physical problems:
• Supervised learning: perhaps the best-known type of machine learning is learning
that is supervised. Labelled training data is used to create a model that accurately
predicts outputs given inputs, and the model is then tested on unseen data that was
not used in training.
Supervised learning makes up the bulk of the work thus far on machine learning
in string theory. In [12] it was shown that genetic algorithms can be utilized to
• Unsupervised learning: another type of learning is unsupervised. In this case data is
not labelled, but the algorithm attempts to learn features that describe correlations
between data points.
St (...truncated)