Abstract representations emerge naturally in neural networks trained to perform multiple tasks



Article | Nature Communications (2023) 14:1040 | https://doi.org/10.1038/s41467-023-36583-0

W. Jeffrey Johnston (1,2) & Stefano Fusi (1,2)

(1) Center for Theoretical Neuroscience, Columbia University, New York, NY, USA. (2) Mortimer B. Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA.

Received: 14 June 2022. Accepted: 7 February 2023.

Humans and other animals demonstrate a remarkable ability to generalize knowledge across distinct contexts and objects during natural behavior. We posit that this ability to generalize arises from a specific representational geometry that we call abstract and that is referred to as disentangled in machine learning. These abstract representations have been observed in recent neurophysiological studies. However, it is unknown how they emerge. Here, using feedforward neural networks, we demonstrate that learning multiple tasks causes abstract representations to emerge under both supervised and reinforcement learning. We show that these abstract representations enable few-sample learning and reliable generalization on novel tasks. We conclude that abstract representations of sensory and cognitive variables may emerge from the multiple behaviors that animals exhibit in the natural world and, as a consequence, could be pervasive in high-level brain regions. We also make several specific predictions about which variables will be represented abstractly.

The ability to generalize existing knowledge to novel stimuli or situations is essential to complex, rapid, and accurate behavior. As an example, when shopping for produce, humans make many different decisions about whether or not different pieces of produce are ripe and, consequently, whether to purchase them. The knowledge we use in the store is often learned from experience with that fruit at home, and thus generalizes across distinct contexts. Further, the knowledge that we apply to a fruit that we buy for the first time might be derived from similar fruits, generalizing, for instance, from an apple to a pear. The determinations themselves are often multi-dimensional and multisensory: both firmness and appearance are important for deciding whether an avocado is the right level of ripeness. Yet, at the end of this complex process, we make a binary decision about each piece of fruit: we add it to our cart or we do not, and we get feedback later about whether that was the right decision.

This produce shopping example is not unique. Humans and other animals exhibit an impressive ability to generalize across contexts and between different objects in many situations.

The representational geometry of sensory and cognitive variables in a population of neurons provides insight into the computations that the representation may and may not facilitate [1–3]. We hypothesize that the ability to generalize described above is tied to this representational geometry. For instance, neural representations of sensory and cognitive variables are often nonlinearly mixed together. As a result, these representations have high embedding dimension [4–6]. While this kind of nonlinear dimensionality expansion allows flexible learning of new behaviors [5] and provides metabolically efficient and reliable representations [7], the resulting representation often does not permit generalization across contexts or stimuli [5,8]. Alternatively, factorized, or even linear, representations of the relevant sensory or cognitive variables (i.e., representations that have no nonlinear mixing) often permit this generalization. Recent experimental work has shown that this kind of factorized (and approximately linear) representation exists at the apex of the primate ventral visual stream, for faces in inferotemporal cortex [9–11]. Further, experimental work in the hippocampus and prefrontal cortex has shown that representations of the sensory and cognitive features related to a complex cognitive task also support generalization [8]. We refer to representations of task-relevant sensory and cognitive variables that support generalization (as in these examples and others [12–16]) as abstract representations.

In the machine learning literature, abstract representations are often referred to as factorized [17] or disentangled [10,17–20] representations of interpretable stimulus features. Deep learning has been used to produce abstract representations primarily in the form of unsupervised generative models [18,21,22] (but see ref. 23). In this context, abstract representations are desirable because they allow potentially novel examples of existing stimulus classes to be produced by linear interpolation in the abstract representation space (for example, starting at a known exemplar and changing its orientation by moving linearly along a dimension in the abstract representation space that is known to correspond to orientation) [18].

Here, we ask how abstract representations, like those observed in higher brain regions [8,9], can be constructed from the nonlinear and high-dimensional representations observed in early sensory areas [6,24–28]. To study this, we begin by mirroring these high-dimensional and nonlinear representations in a learned model of continuous latent variables; then, we show that training feedforward neural network models to perform multiple distinct classification tasks on these latent variables induces abstract representations in a wide variety of conditions.

Experimental work on animals performing more than a couple of distinct behavioral tasks remains nearly nonexistent [29]. However, modeling work using recurrent neural networks has shown that the networks often develop representations that can be reused across distinct but related tasks [30–32], though the abstractness of these reusable representations was not measured.
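
The contrast drawn above between nonlinearly mixed and factorized representations can be made concrete with a toy example. The sketch below is an illustration under simplifying assumptions, not the authors' model or their generalization metric: it uses two binary latent variables, a perceptron as a stand-in for a linear decoder, and hypothetical helper names (`factorized`, `conjunctive`, `cross_context_accuracy`). A decoder for the first variable is trained in one context (one value of the second variable) and tested in the other.

```python
# Toy illustration (not the authors' model): two binary latent variables,
# z1 (the variable to decode) and z2 (the "context"), each in {-1, +1}.

def factorized(z1, z2):
    """Linear code: each latent variable gets its own axis."""
    return [float(z1), float(z2)]

def conjunctive(z1, z2):
    """Nonlinearly mixed code: one unit per (z1, z2) combination."""
    index = (z1 > 0) + 2 * (z2 > 0)
    return [1.0 if i == index else 0.0 for i in range(4)]

def train_perceptron(samples, epochs=20):
    """Classic perceptron on (features, label) pairs, labels in {-1, +1}."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or on the boundary: update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def accuracy(w, b, samples):
    """Fraction of samples whose sign prediction matches the label."""
    correct = 0
    for x, y in samples:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        prediction = 1 if score > 0 else -1
        correct += prediction == y
    return correct / len(samples)

def cross_context_accuracy(encode):
    """Train a z1 decoder where z2 = -1, then test it where z2 = +1."""
    train = [(encode(z1, -1), z1) for z1 in (-1, 1)]
    test = [(encode(z1, +1), z1) for z1 in (-1, 1)]
    w, b = train_perceptron(train)
    return accuracy(w, b, train), accuracy(w, b, test)

factorized_result = cross_context_accuracy(factorized)
conjunctive_result = cross_context_accuracy(conjunctive)
print("factorized  (train, test):", factorized_result)
print("conjunctive (train, test):", conjunctive_result)
```

In this toy setting both codes are decoded perfectly in the training context. Only the factorized code carries over to the unseen context (test accuracy 1.0). The conjunctive code stays at chance there (0.5), because the units that are active in the test context never received a training signal.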
Thus, the behavioral constraint of multi-tasking may encourage the learning of abstract representations of stimulus features that are relevant to multiple tasks. To investigate this hypothesis, we train feedforward neural network models to perform multiple distinct tasks on a common stimulus space. Previous work in machine learning has shown that similar multi-tasking networks can achieve lower loss from the same number of samples than networks trained independently on each task [33] (and see ref. 34), and that they can quickly learn novel, but related, tasks that are introduced after training [35]. Both of these properties are hallmarks of abstract representations; however, to our knowledge, the representational geometry developed by these multi-tasking networks has not been characterized.

We begin by introducing the multi-tasking model and show that it produces fully abstract representations that are surprisingly robust to heterogeneity and context dependence in the learned tasks. These representations also emerge in (...truncated)