Batch effects account for the main findings of an in utero human intestinal bacterial colonization study
Goffau et al. Microbiome (2021) 9:6
https://doi.org/10.1186/s40168-020-00949-z
LETTER TO THE EDITOR
Open Access
Marcus C. de Goffau¹, D. Stephen Charnock-Jones²,³, Gordon C. S. Smith²,³ and Julian Parkhill¹*
Abstract
A recent study by Rackaityte et al. reported evidence for a low level of bacterial colonization, specifically by Micrococcus luteus, in the intestine of second-trimester human fetuses. We have re-analyzed their sequence data and identified a batch effect that violates the underlying assumptions of the bioinformatic method used for contamination removal. Because of this batch effect, Micrococcus was not identified as a contaminant in the original work and was falsely assigned to the fetal samples. We further provide evidence that the micrographs presented by Rackaityte et al. are unlikely to show micrococci or other bacteria, because the particles shown are larger than related bacterial cells. Finally, phylogenetic analysis showed that the microbes cultured from the fetal samples differed significantly from those detected by sequencing. Overall, our findings show that the presence of Micrococcus in the fetal gut is not supported by the primary sequence data. They also underline important aspects of the nature of contamination in both sequencing and culture approaches in microbiome studies, and the appropriate use of automated contamination identification tools.
Keywords: Batch effects, Decontam, Colonization in utero, 16S rRNA
* Correspondence:
¹ Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
Full list of author information is available at the end of the article
Main text
A recent study by Rackaityte et al. [1] reported evidence for a low level of bacterial colonization of the intestine of second-trimester human fetuses. The authors reported V4 16S rRNA gene amplicon sequencing data from both meconium samples and various negative controls, including several types of swabs and fetal kidney samples. They used the R package decontam [2] to account for reagent contamination and, after filtering, found several signals of potential interest that appeared to be enriched in fetal meconium compared with their controls. Quantitative PCR, fluorescent in situ hybridization (FISH), scanning electron microscopy (SEM), phenotypic characterization of lamina propria T cells, RNA-seq of fetal intestinal epithelial cells, and culture were performed and appeared to support the presence of microbes, possibly including Micrococcus luteus. However, our re-analysis of the data provides strong evidence that several of these findings are caused by an unrecognized batch effect.
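A batch effect of this kind can defeat prevalence-based contaminant detection, which (as in decontam's prevalence method) flags taxa that are more prevalent in negative controls than in true samples. The Python sketch below is a simplified, hypothetical illustration of that logic, not decontam's implementation: it substitutes a one-sided Fisher exact test on presence/absence counts, and the function names (fisher_greater, prevalence, contaminant_p) and the toy presence/absence data are invented for illustration. It shows how pooling samples from a batch that has no negative controls can make a contaminant look sample-enriched.

```python
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact test on the 2x2 table [[a, b], [c, d]].

    Returns P(top-left cell >= a) under the hypergeometric null with fixed
    margins, i.e. evidence that row 1 has the higher 'present' rate.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    total = comb(n, row1)
    return sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1)) / total


def prevalence(flags: list[int]) -> float:
    """Fraction of samples in which the OTU was detected (1 = present)."""
    return sum(flags) / len(flags)


def contaminant_p(samples: list[int], controls: list[int]) -> float:
    """Small values suggest the OTU is MORE prevalent in controls than in
    samples, the pattern a prevalence-based filter treats as contamination."""
    a = sum(controls)            # controls with the OTU
    b = len(controls) - a        # controls without it
    c = sum(samples)             # samples with the OTU
    d = len(samples) - c         # samples without it
    return fisher_greater(a, b, c, d)


# Hypothetical presence/absence calls for one OTU (1 = detected).
batch1_samples = [1] * 9 + [0] * 1   # batch with no negative controls
batch2_samples = [1] * 2 + [0] * 8
batch2_controls = [1] * 4 + [0] * 4  # controls exist only in batch 2

pooled_samples = batch1_samples + batch2_samples

p_pooled = contaminant_p(pooled_samples, batch2_controls)
p_within = contaminant_p(batch2_samples, batch2_controls)

print(f"pooled: samples {prevalence(pooled_samples):.2f} vs controls "
      f"{prevalence(batch2_controls):.2f}, p = {p_pooled:.3f}")
print(f"batch 2 only: samples {prevalence(batch2_samples):.2f} vs controls "
      f"{prevalence(batch2_controls):.2f}, p = {p_within:.3f}")
```

In this toy example the OTU looks more prevalent in samples than in controls when all samples are pooled, but less prevalent in samples than in controls when the comparison is restricted to the batch in which controls were actually collected.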
Batch effect in 16S rRNA gene amplicon sequencing data
We reanalyzed their V4 16S rRNA gene amplicon data using the unfiltered OTU table and associated metadata reported in their Supplemental Table 2, excluding samples with fewer than 100 reads. Read counts for each OTU were converted to a percentage of the total number of reads per sample. Principal component analysis (PCA) was then performed to determine whether the main sources of variation in the data were associated with sample type or were due to sample-independent (batch) effects.
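The filtering, normalization, and PCA steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names relative_abundance and pca are hypothetical, PCA is computed via singular value decomposition of the column-centred percentage matrix (one common choice; other centring or scaling options would change the results), and the counts are simulated toy data rather than the study's OTU table.

```python
import numpy as np


def relative_abundance(counts, min_reads=100):
    """Drop samples with fewer than `min_reads` total reads, then express each
    remaining sample's OTU counts as percentages of that sample's total."""
    counts = np.asarray(counts, dtype=float)   # rows = samples, cols = OTUs
    totals = counts.sum(axis=1)
    keep = totals >= min_reads                 # samples below the cutoff are excluded
    pct = counts[keep] / totals[keep][:, None] * 100.0
    return pct, keep


def pca(x, n_components=3):
    """PCA via SVD of the column-centred matrix.

    Returns per-sample scores and the fraction of total variance explained
    by each of the first `n_components` components."""
    centred = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Simulated toy data (not the study's OTU table): two "batches" whose
# dominant OTUs differ, plus one low-depth sample that should be dropped.
rng = np.random.default_rng(0)
batch1 = rng.poisson([300, 50, 20, 5], size=(6, 4))
batch2 = rng.poisson([60, 40, 20, 200], size=(6, 4))
shallow = np.array([[10, 5, 3, 2]])          # 20 reads in total
counts = np.vstack([batch1, batch2, shallow])

pct, keep = relative_abundance(counts)
scores, explained = pca(pct)
print("samples kept:", int(keep.sum()))
print("variance explained by PC1-PC3:", np.round(explained, 3))
print("PC1 scores:", np.round(scores[:, 0], 1))
```

Because each row is expressed as percentages, a shift in a few dominant OTUs shared by a whole group of samples moves that group along PC1 regardless of sample type, which is the signature a batch effect leaves in this kind of analysis.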
© The Author(s). 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Fig. 1 Batch effect analysis of V4 16S rRNA gene amplicon data, with a focus on Micrococcus (OTU10). Source data: Supplemental Table 2 of Rackaityte et al. [1]. a, b Principal components 1 and 2 (PC1 and PC2), respectively, show a distinct shift in microbial profile after sample 80 (ID 1616L), indicated by a dashed vertical line; the samples before and after this point are designated batch 1 and batch 2 in a. A sub-analysis of PC1 and of PC2 using meconium samples only is shown to the right of each panel. c The Micrococcus signal (OTU10) is part of this profile shift. d Comparison of Micrococcus (OTU10) signals in batch 1 (samples 1–80, all meconium) with those in the meconium samples of batch 2 and in all negative controls of batch 2 shows that the OTU10 signal is batch-associated. e Phenotypic characterization of the lamina propria (LP) T cells shows that samples corresponding to batch 1 had significantly higher proportions of PLZF+CD161+ T cells than samples corresponding to batch 2. Interquartile ranges are shown; P values for the comparisons indicated by brackets are shown above them (Mann-Whitney U test).
Interestingly, PC1 (72% of the variance) and PC2 (13%) (Fig. 1a, b), as well as PC3 (4%), showed that the first 80 samples (as ordered by the authors' identifier) had a different microbial profile from the next 130, irrespective of the actual source of the sample. This analysis suggests that
some aspect of sample collection or processing was performed in at least two batches. The authors state in their methods, and have confirmed to us, that all the sequencing was performed in a single batch. However, it is apparent that some change occurred in their technical procedures: for example, a change in sample collection procedures coinciding with the switch from sampling meconium only to sampling meconium plus additional controls, or the use of a different lot of one or more collection reagents during the period over which the samples were collected. Importantly, this switch coincides with a clear change in the microbial profile. Before the switch, all samples were meconium (from 28 fetal donors), taken from the proximal, mid, and distal small intestine (ID numbers 1519–1616, labeled D, J, and L, respectively), and no controls were included. After the switch, the samples included meconium from 22 fetal donors (three sites each) and, for 19 of the fetal samples (ID numbers 1633–1660), four negative controls: a procedural swab, a room air swab, a moistened swab, and a kidney sample (S, A, N,
and K, respectively). The remainder of t (...truncated)