A model based on Bayesian confirmation and machine learning algorithms to aid archaeological interpretation by integrating incompatible data
PLOS ONE
RESEARCH ARTICLE
Daniella Vos1¤*, Richard Stafford2, Emma L. Jenkins1, Andrew Garrard3
1 Department of Archaeology and Anthropology, Faculty of Science and Technology, Bournemouth
University, Poole, United Kingdom, 2 Department of Life and Environmental Sciences, Faculty of Science
and Technology, Bournemouth University, Poole, United Kingdom, 3 Institute of Archaeology, University
College London, London, United Kingdom
¤ Current Address: Department of Cultural Geography, Faculty of Spatial Sciences, University of Groningen,
Groningen, The Netherlands
*
OPEN ACCESS
Citation: Vos D, Stafford R, Jenkins EL, Garrard A (2021) A model based on Bayesian confirmation and machine learning algorithms to aid archaeological interpretation by integrating incompatible data. PLoS ONE 16(3): e0248261. https://doi.org/10.1371/journal.pone.0248261
Editor: Peter F. Biehl, University at Buffalo - The State University of New York, UNITED STATES
Received: June 10, 2020
Accepted: February 23, 2021
Published: March 31, 2021
Peer Review History: PLOS recognizes the benefits of transparency in the peer review process; therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. The editorial history of this article is available here: https://doi.org/10.1371/journal.pone.0248261
Copyright: © 2021 Vos et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: All relevant data are within the manuscript and its Supporting Information files.
Abstract
The interpretation of archaeological features often requires a combined methodological
approach in order to make the most of the material record, particularly from sites where this
may be limited. In practice, this requires consulting different sources of information
to cross-validate findings and combat issues of ambiguity and equifinality. However,
the application of a multiproxy approach often generates incompatible data, and might
therefore still provide ambiguous results. This paper explores the potential of a simple digital
framework to increase the explanatory power of multiproxy data by enabling the incorporation of incompatible, ambiguous datasets in a single model. In order to achieve this,
Bayesian confirmation was used in combination with decision trees. The results of phytolith
and geochemical analyses carried out on soil samples from ephemeral sites in Jordan are
used here as a case study. The combination of the two datasets as part of a single model
enabled us to refine the initial interpretation of the use of space at the archaeological sites
by providing an alternative identification for certain activity areas. The potential applications
of this model are much broader, as it can also help researchers in other domains reach an
integrated interpretation of analysis results by combining different datasets.
Funding: This research was supported by an AHRC/Bournemouth University (BU - https://www.bournemouth.ac.uk/) PhD studentship and by an Arts and Humanities Research Council (AHRC - https://ahrc.ukri.org/) grant number AH/K002902/1 awarded to ELJ. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The archaeological record comprises various forms of artefacts and ecofacts which, when studied in combination, offer a better interpretative understanding of past human lifeways than when studied in isolation. The range of archaeological material that can be studied, and the amount of information that can be gained from it, depend on the original concentration of material deposited by a site's human inhabitants and on the state of preservation of the site itself. Ephemeral sites and/or their poor preservation represent a challenge that archaeologists can embrace by finding novel ways to maximise the amount of information that can be gained from the archaeological record.
In recent years, the interpretation and understanding of the use of space in archaeology has
been increasingly aided by the incorporation of geoarchaeological techniques, such as geochemistry, phytolith analysis, micromorphology and lipid residue analysis alongside artefactual analysis [1–8]. These techniques have the advantage of considering in situ signals of
activity, which are less prone to post-depositional disturbance [4, 9]. The downside, however,
is that they may produce results which are equivocal, subtle, or indistinct, which can limit their
interpretative potential. Therefore, while these methods greatly increase our understanding of
archaeological sites, their interpretative value is greater when used in combination as a multiproxy approach than in isolation [2].
Inconsistencies in the type of results produced by different techniques and their resulting
data structure, however, often do not favour their incorporation within a single comprehensive
statistical model [10, 11]. For example, while phytolith data are recorded in count form and may
be further aggregated into several categories representing plant genus, plant parts or weight of
phytolith material per gram, measurements of geochemical elements are often recorded in
parts per million, producing continuous data. Many geoarchaeological techniques produce
complex, high dimensional data which are difficult to interpret in relation to one another. The
results of multiple analysis techniques are therefore often quantitatively analysed separately
even when they are used within a combined methodology, and then described side by side to
form a qualitative synthesis [7, 11–15]. One way to approach this issue is through standardization and normalization of the data prior to their integration in multivariate statistics [16–18].
However, although this brings the data to the same continuous scale, the techniques measure
very different phenomena, not all of which translate well to a continuous scale.
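To make the standardization step concrete, the following is a minimal sketch, not the procedure used in the cited studies or in this paper. It assumes z-score standardization as the rescaling method, and the sample values, variable names and the helper z_scores are hypothetical and chosen only for illustration. It shows how phytolith counts and a geochemical concentration in parts per million can be brought to a common, unitless scale before being placed side by side.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no variation to express, so return zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical values for four soil samples (not data from the study).
phytolith_counts = [12, 48, 30, 5]                 # count data
phosphorus_ppm = [850.0, 2300.0, 1500.0, 640.0]    # continuous data, ppm

# After standardization both variables are unitless and comparable in scale,
# so each sample can be described by one combined feature vector.
combined = list(zip(z_scores(phytolith_counts), z_scores(phosphorus_ppm)))

for sample_id, (phyto_z, phos_z) in enumerate(combined, start=1):
    print(f"sample {sample_id}: phytolith z = {phyto_z:+.2f}, P z = {phos_z:+.2f}")
```

The rescaling equalizes units and spread only. A count and a concentration remain different kinds of measurement, which is the limitation noted above.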
This paper offers a new approach to integrate analysis results from multiple sources of
information through the application of machine learning algorithms alongside other statistical
and traditional analytical methods within a single quantitative framework. It is different to previous ones in th (...truncated)