Probabilistic coherence, logical consistency, and Bayesian learning: Neural language models as epistemic agents
PLOS ONE
RESEARCH ARTICLE
Gregor Betz1*, Kyle Richardson2
1 Department of Philosophy, Karlsruhe Institute of Technology, Karlsruhe, Germany, 2 Aristo, Allen Institute
for AI, Seattle, WA, United States of America
OPEN ACCESS
Citation: Betz G, Richardson K (2023) Probabilistic
coherence, logical consistency, and Bayesian
learning: Neural language models as epistemic
agents. PLoS ONE 18(2): e0281372.
https://doi.org/10.1371/journal.pone.0281372
Editor: Anu Sayal, Taylor’s University - Lakeside
Campus: Taylor’s University, MALAYSIA
Received: June 1, 2022
Accepted: January 22, 2023
Published: February 9, 2023
Peer Review History: PLOS recognizes the
benefits of transparency in the peer review
process; therefore, we enable the publication of
all of the content of peer review and author
responses alongside final, published articles. The
editorial history of this article is available here:
https://doi.org/10.1371/journal.pone.0281372
Copyright: © 2023 Betz, Richardson. This is an
open access article distributed under the terms of
the Creative Commons Attribution License, which
permits unrestricted use, distribution, and
reproduction in any medium, provided the original
author and source are credited.
Data Availability Statement: The data is now
available via wandb under the following links/repos:
https://wandb.ai/doxlm2/doxlm2_model_runs
https://wandb.ai/doxlm2/doxlm2_finetuning
https://wandb.ai/doxlm2/dataset_versions.
Abstract
It is argued that suitably trained neural language models exhibit key properties of epistemic
agency: they hold probabilistically coherent and logically consistent degrees of belief, which
they can rationally revise in the face of novel evidence. To this end, we conduct computational experiments with RANKERS: T5 models [Raffel et al. 2020] that are pretrained on carefully designed synthetic corpora. Moreover, we introduce a procedure for eliciting a model’s
degrees of belief, and define numerical metrics that measure the extent to which given
degrees of belief violate (probabilistic, logical, and Bayesian) rationality constraints. While
pretrained RANKERS are found to suffer from global inconsistency (in agreement with, e.g.,
[Jang et al. 2021]), we observe that subsequent self-training on auto-generated texts allows
RANKERS to gradually obtain a probabilistically coherent belief system that is aligned with logical constraints. Such self-training is also found to play a pivotal role in rational evidential learning, for it seems to enable RANKERS to propagate a novel evidence item
through their belief systems, successively re-adjusting individual degrees of belief. All this,
we conclude, confirms the Rationality Hypothesis, i.e., the claim that suitably trained NLMs
may exhibit advanced rational skills. We suggest that this hypothesis has empirical, yet also
normative and conceptual ramifications far beyond the practical linguistic problems NLMs
were originally designed to solve.
Funding: This work is supported by the Helmholtz
Association Initiative and Networking Fund on the
HAICORE@KIT partition. The funders had no role in
study design, data collection and analysis, decision
to publish, or preparation of the manuscript.
Competing interests: The authors have declared
that no competing interests exist.
Introduction
Neural language models (NLMs) are powerful natural language processing systems which
have sparked a scientific revolution in the field of AI & NLP [1–4] and excel at tasks as
diverse as machine translation [5], text summarization [6], question answering [7, 8], and
natural-language inference [9, 10]. The performance of these systems has exploded with the
advent of the so-called Transformer network architecture [11] and has been increasing steadily
over recent years (e.g., [12]) through further optimizations of machine learning algorithms
and system design, increases in model size, or quantitatively and qualitatively improved training datasets. Technically, and leaving aside all the details, NLMs are essentially probabilistic
word prediction machines. They are, first and foremost, trained to fill in missing or next
words in a text; and they predict a word by assigning probabilities to all words available in a
given vocabulary.
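The idea of word prediction as a probability assignment over a vocabulary can be illustrated with a minimal sketch. It is not taken from the study: the four-word vocabulary, the raw scores, and the function name `softmax` are assumptions chosen for illustration, and a real NLM computes such scores with a trained network over a vocabulary of many thousands of tokens.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution over the same keys."""
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {word: math.exp(s - top) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


# Hypothetical raw scores for the blank in "The capital of France is ___."
# (illustrative values, not produced by any actual model)
scores = {"Paris": 5.0, "Lyon": 2.0, "Berlin": 1.0, "banana": -3.0}

probs = softmax(scores)

# The predicted word is the one that receives the highest probability.
prediction = max(probs, key=probs.get)
```

Every word in the vocabulary receives a non-zero probability, and the probabilities sum to one; the prediction is simply the most probable candidate.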
The strong performance of NLMs in natural language understanding tasks triggers the
more fundamental question whether NLMs are rational agents:
(Rationality Hypothesis) Suitably designed and trained NLMs may systematically display
advanced rational skills.
By discussing the (Rationality Hypothesis), we put the more specific questions addressed
in this study (see Q1–Q4 below) in a broader scientific context, sketching their potential relevance for a variety of disciplines and fields.
Rationality is arguably a contested concept (like justice). So what exactly does it mean that an
NLM possesses advanced rational skills? We take it that such skills would include, more specifically, the abilities to reason correctly (infer, argue, and explain), to produce linguistic output
that is sufficiently stable and globally consistent, and to adjust a former output in the light of
novel evidence (or, more precisely, a linguistic representation of novel evidence). Moreover,
advanced rational behavior of NLMs would allow one to adopt an “intentional stance” [13]
towards these systems and to treat them as doxastic, if not epistemic agents holding beliefs and
acquiring knowledge. In this study, we focus on, further specify, and operationalize the aforementioned epistemic competences. In doing so, we don’t, however, intend to imply that all
dimensions of rationality can be reduced to such theoretical or epistemic skills.
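To give a rough sense of what global consistency and revision in the light of novel evidence can mean for degrees of belief, consider the following toy sketch. It rests on assumptions throughout: the function names, the absolute-deviation measure, and the numbers are hypothetical illustrations of the textbook constraints P(A) + P(not A) = 1 and Bayes' rule, and they are not the metrics defined in this study.

```python
def negation_violation(p_a, p_not_a):
    """Deviation from the coherence constraint P(A) + P(not A) = 1.

    Hypothetical measure for illustration; 0.0 means the pair is coherent.
    """
    return abs(p_a + p_not_a - 1.0)


def bayes_update(prior, lik_if_true, lik_if_false):
    """Posterior P(H | E) from P(H), P(E | H), and P(E | not H)."""
    evidence = lik_if_true * prior + lik_if_false * (1.0 - prior)
    return lik_if_true * prior / evidence


# An agent that believes A to degree 0.7 and not-A to degree 0.5 is incoherent.
incoherent = negation_violation(0.7, 0.5)
coherent = negation_violation(0.7, 0.3)

# Evidence that is four times likelier if H is true raises the belief in H.
posterior = bayes_update(prior=0.5, lik_if_true=0.8, lik_if_false=0.2)
```

In this toy setting, a rational agent would keep the violation at zero and would move its degree of belief in H from the prior toward the posterior once the evidence is taken into account.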
To say that future NLMs (trained on linguistic data) may exhibit artificial general intelligence (AGI) means to endorse the Rationality Hypothesis.
The Rationality Hypothesis has ramifications far beyond the practical linguistic problems
NLMs have been developed (and are used) to solve. Normatively and conceptually, its investigation may shed new light on the notion of rationality itself (see [14]), helping us to see
whether reason is an emergent property [15]: Is reliable rational behavior a cognitive macro
pattern that emerges when agents exercise basic linguistic skill (predicting missing words)? Or,
to give this a normative twist: The Rationality Hypothesis asks which, if any, rational practices
are grounded in elementary language norms. Empirically, an investigation of the Rationality
Hypothesis will potentially alter our scientific understanding of human cognition (see also [16,
17], especially so as NLMs are found to accurately predict humans’ behavioral and neural (...truncated)