A model based on Bayesian confirmation and machine learning algorithms to aid archaeological interpretation by integrating incompatible data

PLOS ONE | Research Article

Daniella Vos¹¤*, Richard Stafford², Emma L. Jenkins¹, Andrew Garrard³

1 Department of Archaeology and Anthropology, Faculty of Science and Technology, Bournemouth University, Poole, United Kingdom
2 Department of Life and Environmental Sciences, Faculty of Science and Technology, Bournemouth University, Poole, United Kingdom
3 Institute of Archaeology, University College London, London, United Kingdom
¤ Current address: Department of Cultural Geography, Faculty of Spatial Sciences, University of Groningen, Groningen, The Netherlands
* Corresponding author

Citation: Vos D, Stafford R, Jenkins EL, Garrard A (2021) A model based on Bayesian confirmation and machine learning algorithms to aid archaeological interpretation by integrating incompatible data. PLoS ONE 16(3): e0248261. https://doi.org/10.1371/journal.pone.0248261

Editor: Peter F. Biehl, University at Buffalo - The State University of New York, United States

Received: June 10, 2020; Accepted: February 23, 2021; Published: March 31, 2021

Peer Review History: PLOS recognizes the benefits of transparency in the peer review process; therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. The editorial history of this article is available here: https://doi.org/10.1371/journal.pone.0248261

Copyright: © 2021 Vos et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: All relevant data are within the manuscript and its Supporting Information files.

Funding: This research was supported by an AHRC/Bournemouth University (BU, https://www.bournemouth.ac.uk/) PhD studentship and by an Arts and Humanities Research Council (AHRC, https://ahrc.ukri.org/) grant, number AH/K002902/1, awarded to ELJ. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Abstract

The interpretation of archaeological features often requires a combined methodological approach in order to make the most of the material record, particularly at sites where this record may be limited. In practice, this requires consulting different sources of information in order to cross-validate findings and combat issues of ambiguity and equifinality. However, a multiproxy approach often generates incompatible data, and may therefore still yield ambiguous results. This paper explores the potential of a simple digital framework to increase the explanatory power of multiproxy data by enabling the incorporation of incompatible, ambiguous datasets in a single model. To achieve this, Bayesian confirmation was used in combination with decision trees. The results of phytolith and geochemical analyses carried out on soil samples from ephemeral sites in Jordan are used here as a case study. Combining the two datasets in a single model enabled us to refine the initial interpretation of the use of space at the archaeological sites by providing an alternative identification for certain activity areas. The potential applications of this model are much broader, as it can also help researchers in other domains reach an integrated interpretation of analysis results by combining different datasets.

Introduction

The archaeological record comprises various forms of artefacts and ecofacts which, when studied in combination, offer a better interpretative understanding of past human lifeways than when studied in isolation. The range of archaeological material that can be studied, and the amount of information that can be gained from it, depend upon the original concentration of material deposited into the archaeological record by its human inhabitants and on the state of preservation of the archaeological site itself. Ephemeral sites and/or their poor preservation represent a challenge that archaeologists can embrace by finding novel and innovative ways to maximise the amount of information that can be gained from the archaeological record.

In recent years, the interpretation and understanding of the use of space in archaeology have been increasingly aided by the incorporation of geoarchaeological techniques, such as geochemistry, phytolith analysis, micromorphology and lipid residue analysis, alongside artefactual analysis [1–8]. These techniques have the advantage of considering in situ signals of activity, which are less prone to post-depositional disturbance [4, 9]. The downside, however, is that they may produce results which are equivocal, subtle, or distinct, which can limit their interpretative potential. Therefore, while these methods greatly increase our understanding of archaeological sites, their interpretative value is greater when they are used in combination, as a multiproxy approach, than in isolation [2]. Inconsistencies between techniques in the type of results they produce and in the structure of the resulting data, however, often hinder their incorporation within a single comprehensive statistical model [10, 11].
For example, while phytolith data are recorded as counts, which may be further aggregated into several categories representing plant genus, plant part, or weight of phytolith material per gram, measurements of geochemical elements are often recorded in parts per million, producing continuous data. Many geoarchaeological techniques produce complex, high-dimensional data which are difficult to interpret in relation to one another. The results of multiple analysis techniques are therefore often analysed quantitatively in isolation, even when they are used within a combined methodology, and are then described side by side to form a qualitative synthesis [7, 11–15]. One way to approach this issue is to standardize and normalize the data before integrating them in multivariate statistics [16–18]. However, although this brings all variables to the same continuous scale, the variables still measure very different phenomena, not all of which translate well into a continuous scale. This paper offers a new approach that integrates analysis results from multiple sources of information within a single quantitative framework by applying machine learning algorithms alongside other statistical and traditional analytical methods. It differs from previous ones in th (...truncated)
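The conventional standardization route mentioned in the paragraph above can be illustrated with a short Python sketch. It is not the authors' framework: it shows only the common step of converting each variable to z-scores, (x − mean) / standard deviation, so that discrete phytolith counts and continuous geochemical concentrations in ppm end up on the same unitless scale. The helper `zscore`, the variable names `phytolith_counts` and `element_ppm`, and all sample values are hypothetical and invented for illustration; they are not data from the study.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a sequence: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant variable carries no variation and cannot be rescaled.
        raise ValueError("cannot standardize a constant variable")
    return [(x - mu) / sigma for x in values]


# Hypothetical toy values for four soil samples (invented, NOT study data).
phytolith_counts = [120, 85, 300, 45]          # discrete counts per sample
element_ppm = [850.0, 1200.5, 640.2, 2100.0]   # continuous concentration, ppm

z_phytolith = zscore(phytolith_counts)
z_element = zscore(element_ppm)

# Both variables are now unitless, centred on 0 with unit spread,
# so they can sit side by side in one multivariate table.
table = list(zip(z_phytolith, z_element))
for i, (zp, ze) in enumerate(table, start=1):
    print(f"sample {i}: phytolith z = {zp:+.2f}, element z = {ze:+.2f}")
```

After this step both columns have mean 0 and standard deviation 1 and could be passed to the same multivariate method, but, as the paragraph points out, a z-score of +1 for a phytolith count and for an element concentration still describes very different phenomena; the common scale does not make the measurements commensurable.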