Maternal smoking impacts key biological pathways in newborns through epigenetic modification in Utero
Rotroff et al. BMC Genomics (2016) 17:976
DOI 10.1186/s12864-016-3310-1
RESEARCH ARTICLE
Open Access
Daniel M. Rotroff1,2, Bonnie R. Joubert3, Skylar W. Marvel1, Siri E. Håberg4, Michael C. Wu5, Roy M. Nilsen6,
Per M. Ueland7,8, Wenche Nystad4, Stephanie J. London3* and Alison Motsinger-Reif1,2,9
Abstract
Background: Children exposed to maternal smoking during pregnancy exhibit increased risk for many adverse
health effects. Maternal smoking influences methylation in newborns at specific CpG sites (CpGs). Here, we extend
evaluation of individual CpGs to gene-level and pathway-level analyses among 1062 participants in the Norwegian
Mother and Child Cohort Study (MoBa). Methylation in newborn DNA was measured using the Illumina 450 K platform,
and maternal smoking in pregnancy was assessed using the biomarker plasma cotinine. We used novel implementations
of bioinformatics tools to collapse epigenome-wide methylation data into gene- and pathway-level effects to test
whether exposure to maternal smoking in utero differentially methylated CpGs in genes enriched in biologic pathways.
Unlike most pathway analysis applications, our approach allows replication in an independent cohort.
Results: Data on 485,577 CpGs, mapping to a total of 20,199 genes, were used to create gene scores that were tested
for association with maternal plasma cotinine levels using Sequence Kernel Association Test (SKAT), and 15 genes were
found to be associated (q < 0.25). Six of these 15 genes (GFI1, MYO1G, CYP1A1, RUNX1, LCTL, and AHRR) contained
individual CpGs that were differentially methylated with regard to cotinine levels (p < 1.06 × 10⁻⁷). Nine of the 15
genes (FCRLA, MIR641, SLC25A24, TRAK1, C1orf180, ITLN2, GLIS1, LRFN1, and MIR451) were associated with cotinine at the
gene level (q < 0.25) but had no genome-wide significant individual CpGs (p > 1.06 × 10⁻⁷). Pathway analyses using
gene scores resulted in 51 significantly associated pathways, which we tested for replication in an independent cohort
(q < 0.05). Of those, 32 replicated in an independent cohort and clustered into six groups. The largest cluster
consisted of pathways related to cancer, cell cycle, ERα receptor signaling, and angiogenesis. The second cluster,
organized into five smaller pathway groups, related to immune system function, such as T-cell regulation and other
white blood cell-related pathways.
Conclusions: Here we use novel implementations of bioinformatics tools to determine biological pathways impacted
through epigenetic changes in utero by maternal smoking in 1062 participants in the MoBa, and successfully replicate
these findings in an independent cohort. The results provide new insight into biological mechanisms that may
contribute to adverse health effects from exposure to tobacco smoke in utero.
Keywords: Smoking, Epigenetics, Pathway analysis, Cancer, In utero
* Correspondence:
3 Division of Intramural Research, National Institute of Environmental Health
Sciences, National Institutes of Health, Department of Health and Human
Services, PO Box 12233, MD A3-05, Research Triangle Park, NC 27709, USA
Full list of author information is available at the end of the article
© The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Background
Although many adverse effects of maternal smoking on
offspring have been well documented, little is known about
the underlying biological mechanisms [1, 2]. One proposed mechanism for how in utero exposure to tobacco
smoke may impact health is through epigenetic effects
including DNA methylation. Previously, Joubert et al.
collected genome-wide methylation data from 1062
MoBa mother-offspring pairs and demonstrated that
maternal smoking, assessed objectively by cotinine
levels, is significantly associated with differential DNA
methylation in 1) genes involved in metabolism of tobacco
smoke compounds and 2) novel genes involved in diverse developmental processes not previously linked to
tobacco response [3]. These findings have since been
widely replicated [3–6].
Genome-wide association studies (GWAS) of single nucleotide polymorphisms (SNPs) that rely
on single-locus variation are recognized to explain little of the overall heritability of complex traits [7, 8]. While there are many
potential sources of this “missing heritability”, single
locus analysis typically ignores a large number of loci
with moderate effects, due to stringent significance
thresholds. Gene-based association analysis takes the gene
as the basic unit of analysis. Because this method
combines the information from all the markers in
a gene, it can yield more informative results and increase the ability to find novel genes and gene
sets. It has been used as a complement
to SNP-based GWAS in identifying disease susceptibility genes [9, 10], and we extend such an approach
to methylation data here.
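To illustrate how marker-level information within a gene can be combined into a single test, the following is a minimal sketch of a SKAT-style kernel score statistic with a linear kernel, applied to the CpGs mapped to one gene. It is not the implementation used in this study: the function names, the equal CpG weights, the intercept-only null model, and the permutation-based p-value are all assumptions made for illustration, and the data are synthetic.

```python
import numpy as np


def kernel_score_stat(y, M, weights=None):
    """SKAT-style statistic Q = r' (M W M') r for one gene, with r = y - mean(y)."""
    r = y - y.mean()                      # residuals under an intercept-only null model
    if weights is None:
        weights = np.ones(M.shape[1])     # equal weight for every CpG (an assumption)
    s = M.T @ r                           # per-CpG score statistics, shape (n_cpgs,)
    return float(np.sum(weights * s ** 2))


def gene_pvalue(y, M, n_perm=999, seed=0):
    """Permutation p-value for the association between y and the CpGs in one gene."""
    rng = np.random.default_rng(seed)
    M = M - M.mean(axis=0)                # center each CpG column
    q_obs = kernel_score_stat(y, M)
    q_null = np.array(
        [kernel_score_stat(rng.permutation(y), M) for _ in range(n_perm)]
    )
    # Add-one correction keeps the estimated p-value strictly positive.
    return float((1 + np.sum(q_null >= q_obs)) / (n_perm + 1))


# Synthetic illustration: 200 samples, 5 CpGs, only the first CpG tracks y.
rng = np.random.default_rng(1)
M_demo = rng.normal(size=(200, 5))
y_demo = 2.0 * M_demo[:, 0] + rng.normal(size=200)
p_demo = gene_pvalue(y_demo, M_demo)
```

Because the statistic sums squared per-CpG scores, several CpGs with moderate effects can jointly produce a large value even when none would pass a stringent single-site threshold. Covariate adjustment is omitted here for brevity.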
Additionally, to investigate the biological processes
(i.e., pathways) impacted by maternal smoking during
pregnancy and the associated altered fetal methylation, we
performed gene set/pathway analysis.
We applied a
novel approach that combines analysis tools for collapsing epigenome-wide methylation data into gene- and
pathway-based effects (Fig. 1). Pathway analysis combines significant genes into sets of genes, or pathways,
that are thought to have coordinated effects on a biological endpoint.
A number of pathway analysis methods have been developed and widely applied in human genetics and genomics. The majority of pathway analysis
methods were originally developed for microarray gene
expression data, and the most popular methods perform
enrichment analysis for gene sets defined by external
knowledge bases [11]. In the current study, we modified
the bioinformatics approaches that have been developed
in other contexts to be valid for epigenome-wide data
analysis.
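As a point of reference for the enrichment analyses mentioned above, the sketch below shows a generic over-representation test based on the hypergeometric distribution. It is one common form of gene set enrichment and is not necessarily the modified approach used in this study; the function names, gene identifiers, and pathway definitions are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_universe, pathway_size, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap)."""
    total = comb(n_universe, n_selected)
    upper = min(pathway_size, n_selected)
    tail = sum(
        comb(pathway_size, i) * comb(n_universe - pathway_size, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


def pathway_enrichment(universe, selected, pathways):
    """Return {pathway name: enrichment p-value} for each gene set."""
    universe = set(universe)
    selected = set(selected) & universe       # ignore genes outside the universe
    results = {}
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(members & selected)
        results[name] = enrichment_pvalue(
            len(universe), len(members), len(selected), overlap
        )
    return results


# Hypothetical example: 20 genes tested, 5 selected as associated.
universe_demo = [f"g{i}" for i in range(20)]
selected_demo = ["g0", "g1", "g2", "g3", "g4"]
pathways_demo = {
    "pathway_A": ["g0", "g1", "g2", "g3", "g10"],       # 4 of 5 members selected
    "pathway_B": ["g10", "g11", "g12", "g13", "g14"],   # no members selected
}
demo = pathway_enrichment(universe_demo, selected_demo, pathways_demo)
```

A test of this kind depends on a hard cutoff for selecting genes, which is one reason approaches developed for expression data may need modification before being applied to gene-level methylation scores.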
Importantly, we performed a two-stage study comprising both discovery and replication of the gene-based and
pathway-based associations. While replication is standard
in genetic association studies for individual variants, it is
rarely performed for pathway analyses. Whether due to
the limited availability of proper validation cohorts in
many stu (...truncated)