Sieving the weeds from the grains: an R based package for classifying archaeobotanical samples of cereals and pulses according to crop processing stages
Vegetation History and Archaeobotany
https://doi.org/10.1007/s00334-024-01006-7
ORIGINAL ARTICLE
Elizabeth Stroud1 · Glynis Jones2 · Michael Charles1 · Amy Bogaard1
Received: 29 September 2023 / Accepted: 12 July 2024
© The Author(s) 2024
Abstract
The R package CropPro is an open-access resource to classify archaeobotanical samples as products and by-products of
different stages of the crop processing sequence for large-seeded cereal and pulse crops in south west Asia, Europe and
other Mediterranean regions. It builds on ethnographic research and analysis conducted by Jones (Plants and ancient man:
studies in palaeoethnobotany. Balkema, Rotterdam, pp 43–61, 1984), (J Archaeol Sci 14:311–323, 1987), (Circaea 6:91–96,
1990) and a modified method by Charles (Environ Archaeol 1:111–122, 1998). CropPro provides functions that
allow users to construct triplots, to conduct discriminant analysis comparing archaeobotanical samples with ethnographic
crop processing stages and to plot the discriminant analysis results. This paper provides two worked examples of the use
of CropPro: the early medieval site of Stafford in the UK and the Bronze Age site of Tell Brak in Syria. These examples
illustrate the use of the package for identifying crop-processing stages, and for assessing the relevance of taphonomic
pathways other than crop processing.
Keywords Crop-processing · Discriminant analysis · Weed seed attributes · R package · Cereal and pulse processing
Introduction
Understanding the crop processing stages represented by
archaeobotanical remains is essential for identifying activity areas, seasonal activities, and storage protocols at early
agricultural sites. The series of steps required to convert
harvested crop material into clean grain has been recognized as one of the causes of variation in archaeobotanical samples (Dennell 1972, 1974, 1976; Hillman 1973).
For this reason, determining the crop processing status of
archaeobotanical samples is necessary in order to recognise
the biases imposed by such activities on the composition of
archaeobotanical samples, and to consider this bias during
interpretation. This includes changes in the proportions of
different weed species, which can be particularly important when using weed species as indicators of cultivation
regimes (e.g. Bogaard et al. 2005).
Communicated by F. Antolín.
Elizabeth Stroud
1 School of Archaeology, University of Oxford, Oxford, UK
2 Department of Archaeology, University of Sheffield, Sheffield, UK
Ethnobotanical studies on crop processing highlight
how crop-processing sequences alter both the crop and
weed composition of a sample (Hillman 1981; Jones 1984,
1987, 1990). Several archaeobotanists have conducted or
used ethnographic research to understand the processing
sequence of a range of crop species (see for example Hillman 1981, 1984a, 1985; Jones 1984; D’Andrea and Haile
2002; Peña-Chocarro and Zapata Peña 2003 for temperate
cereals and pulses; Reddy 1997, 2003; Thompson 1998;
Lundström-Baudais et al. 2002; Harvey and Fuller 2005 for
millets and rice). Such research has been taken further, with
the proportions and ratios of particular items within such
ethnographic data used to infer the crop processing status of
archaeobotanical material (see for example Hillman 1984b;
Jones 1984, 1990). Jones (1984, 1987) used ethnographic
data on weed seed characteristics to build a discriminant
model, which provides a way of recognising the effect of
crop processing on archaeobotanical samples. Ethnographic
work conducted on the Greek island of Amorgos in the
1980s laid the foundation for statistical models used to
identify archaeobotanical samples as the products and by-products of different stages in the traditional crop processing sequence for large-seeded cereal and pulse crops in
south west Asia, Europe, and other Mediterranean regions
(Jones 1984, 1987). By collecting and characterising these
(by-)products of processing, data were obtained for three
different statistical models that allow a comparison between
ethnographic and archaeobotanical data. Although this processing sequence applies to a wide range of cereals and pulses, the models are not suitable for all crops,
for example small-seeded cereals such as millets or crops that are
harvested without weeds, such as maize. Full details of the
models are described in Jones (1984, 1987).
This paper presents the R package CropPro, which provides, for the first time, openly accessible tools to conduct
the same types of analysis as Jones (1984, 1987) and Charles
(1998), as well as open access to the dataset behind the models, allowing anyone to use this method (ESM 1). CropPro
enables the classification and comparison of archaeobotanical samples against the ethnographic data from Amorgos
(ESM 1, Jones 1990). Three methods can be employed: triangular plotting, which compares the proportions of grains,
rachis nodes and weed seeds, in order to gain insight into
the processing of free-threshing cereals (see Jones 1990);
a discriminant analysis that utilises the attributes of weed
seeds to identify the products and by-products of cereal and
pulse crop-processing (see Jones 1984, 1987); and another
application of discriminant analysis, which again employs
the attributes of wild/weed seeds, to assess the relevance
of crop-processing versus alternative taphonomic pathways
such as dung burning (see Charles 1998).
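The arithmetic behind the first of these methods, triangular plotting, can be illustrated with a short sketch. It is written in Python purely for illustration and is not CropPro's R interface: the function names `ternary_proportions` and `ternary_to_xy`, the example counts, and the assignment of components to triangle corners are all assumptions made here, not details taken from the package. The sketch converts counts of grains, rachis nodes and weed seeds into proportions that sum to one, then maps them to Cartesian coordinates inside an equilateral triangle.

```python
import math


def ternary_proportions(grains, rachis_nodes, weed_seeds):
    """Convert raw item counts into proportions that sum to 1."""
    total = grains + rachis_nodes + weed_seeds
    if total <= 0:
        raise ValueError("sample must contain at least one item")
    return (grains / total, rachis_nodes / total, weed_seeds / total)


def ternary_to_xy(p_grain, p_rachis, p_weed):
    """Map three proportions to x, y coordinates in a unit-sided triangle.

    Assumed (hypothetical) corner layout: grain at (0, 0),
    rachis at (1, 0), weed seeds at (0.5, sqrt(3) / 2).
    """
    x = p_rachis + 0.5 * p_weed
    y = (math.sqrt(3) / 2) * p_weed
    return (x, y)


if __name__ == "__main__":
    # Hypothetical sample: 60 grains, 30 rachis nodes, 10 weed seeds.
    props = ternary_proportions(60, 30, 10)
    print(props, ternary_to_xy(*props))
```

A sample dominated by one component plots close to that component's corner, which is what allows archaeobotanical samples to be compared visually with the ethnographic (by-)products.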
Background
Using the ethnographic data collected on Amorgos, Jones
(1984, 1987) introduced a method for characterising products and by-products of the crop processing sequence from
which archaeobotanical material is derived. Data from the
processing of cereals and pulses (bread and macaroni wheat,
six-rowed hulled barley, oat, pea, lentil, common vetch, and
grass pea) have been used to create predictive models to classify suitable archaeobotanical samples (e.g. those with a sufficient number of items). Three by-products and one product
were selected for sampling because these would most likely
be kept for later use, and so potentially recovered archaeologically. Discriminant analysis, a multivariate statistical
technique and form of machine learning, was used to create a model based on key physical characteristics of the
weed seeds accompanying the crop during processing. This
model was subsequently used to classify the archaeobotanical samples. The three characteristics of the weed seeds
used are: (1) the size of the seeds relative to the fine sieve
mesh used to separate small weed seeds from cereal grain,
(2) the tendency of the seeds to remain in seed heads, spikes
or clusters after threshing and (3) aerodynamic properties
(see Table 1) (Jones 1984). By utilizing these characteristics
instead of specific species to distinguish crop-processing
stages (...truncated)