GEneSTATION 1.0: a synthetic resource of diverse evolutionary and functional genomic data for studying the evolution of pregnancy-associated tissues and phenotypes
Mara Kim 2
Brian A. Cooper 2
Rohit Venkat 2
Julie B. Phillips 2
Haley R. Eidem 2
Jibril Hirbo 2
Sashank Nutakki 2
Scott M. Williams 1
Louis J. Muglia 0
J. Anthony Capra 2 4
Kenneth Petren 3
Patrick Abbot 2
Antonis Rokas 2 4
Kriston L. McGary 2
0 Center for Prevention of Preterm Birth, Perinatal Institute, Cincinnati Children's Hospital Medical Center , Cincinnati, OH 45229 , USA
1 Department of Genetics, Geisel School of Medicine, Dartmouth College , Hanover, NH 03755 , USA
2 Department of Biological Sciences, Vanderbilt University , Nashville, TN 37235 , USA
3 Department of Biological Sciences, University of Cincinnati , Cincinnati, OH 45221 , USA
4 Department of Biomedical Informatics, Vanderbilt University Medical Center , Nashville, TN 37235 , USA
Mammalian gestation and pregnancy are fast
evolving processes that involve the interaction of the
fetal, maternal and paternal genomes. Version 1.0 of
the GEneSTATION database (http://genestation.org)
integrates diverse types of omics data across
mammals to advance understanding of the genetic
basis of gestation and pregnancy-associated
phenotypes and to accelerate the translation of
discoveries from model organisms to humans.
GEneSTATION is built using tools from the Generic Model
Organism Database project, including the
biologyaware database CHADO, new tools for rapid data
integration, and algorithms that streamline synthesis
and user access. GEneSTATION contains curated life
history information on pregnancy and reproduction
from 23 high-quality mammalian genomes. For every
human gene, GEneSTATION contains diverse
evolutionary (e.g. gene age, population genetic and
molecular evolutionary statistics), organismal (e.g.
tissuespecific gene and protein expression, differential
gene expression, disease phenotype), and
molecular data types (e.g. Gene Ontology Annotation,
protein interactions), as well as links to many
general (e.g. Entrez, PubMed) and pregnancy
diseasespecific (e.g. PTBgene, dbPTB) databases. By
facilitating the synthesis of diverse functional and
evolutionary data in pregnancy-associated tissues and
phenotypes and enabling their quick, intuitive,
accurate and customized meta-analysis, GEneSTATION
provides a novel platform for comprehensive
investigation of the function and evolution of mammalian
Placental mammals, which originated 160 million years ago,
uniformly share a conserved set of reproductive traits
related to embryonic development within a uterus and
nutrient provisioning through a chorioallantoic placenta (1).
Paradoxically, this conservation of reproductive mode and
function during mammalian evolution is starkly juxtaposed
with the evolution of the placenta, one of the most
variable of all mammalian organs (2,3). At present, there is no
comprehensive explanation for the diversity of
evolutionary tempos and modes exhibited by the processes
associated with mammalian gestation and pregnancy. The
consequences are important not only for our understanding of
mammalian pregnancy (4), but also for major features of
human evolution, such as the encephalization and
bipedalism (5), and how natural selection has acted on and shaped
human biology (6,7). And clinically, complications of
pregnancy in humans are a major cause of infant mortality
around the world (8); for example, complications stemming
from birth before term (pre-term birth or PTB), defined in
humans as birth before 37 completed weeks of gestation (9),
are the leading cause of death in newborns and in children
under the age of five (10,11).
Several funding agencies have recognized both the
seriousness of pregnancy associated medical problems and
the persistence of many unanswered questions about the
process. Consequently, they are currently increasing their
investments in the study of the biology and pathologies
of pregnancy, which will lead to the generation of large
amounts of diverse types of data in the next few years. Two
notable examples are the NIH-sponsored Human Placenta
Project (12), aimed to ‘understand the role of the placenta
in health and disease’, and the March of Dimes-sponsored
Prematurity Research Centers (http://prematurityresearch.
org/), ‘dedicated to solving the mysteries of premature
birth’. Because PTB has a significant genetic component
(13,14), there is general consensus that emerging
molecular and genomic resources provide new opportunities to not
only make fundamental advances in our understanding of
the evolution and function of mammalian pregnancy (4,15–
20), but to also make breakthroughs in treating its diseases
At present, however, such advances are limited by the fact
that such data and resources are dispersed either in many
different journals’ supplements or across several different
databases, making synthesis of available information slow
and costly, and hampering powerful system approaches that
involve overlaying diverse data types and analyses in the
treatment of disease (25,26). To facilitate this synthesis, we
have developed GEneSTATION (http://genestation.org), a
database that integrates diverse types of -omics data across
mammals to advance understanding of the genetic basis
of pregnancy-associated phenotypes and to accelerate the
translation of discoveries from model organisms to humans.
The database’s name, GEneSTATION, is a compound word
created by blending together ‘gene’ and ‘gestation’; it can be
read as ‘gestation’ if the reader considers only the
capitalized letters, or as ‘gene station’ if the reader considers all
letters, and is intended to highlight the fact that this database
is focused on synthesizing information on genes related to
GEneSTATION provides the data and tools to easily
explore pregnancy from three complementary perspectives,
EVOLUTIONARY, ORGANISMAL, and MOLECULAR, at three
levels of synthesis. At the first level, individual gene pages
integrate the EVOLUTIONARY, ORGANISMAL, and
MOLECULAR perspectives in three easily accessible tabs, providing
a comprehensive picture of the breadth of the data
available for a single gene and introducing researchers to
analyses and data that they have not previously considered. At
the second level, individual analysis pages provide access to
genome-wide information from a single perspective, such as
natural selection in the human lineage, differential
expression in complications of pregnancy, or protein-protein
interactions among genes known to be involved in pregnancy.
At the final level of synthesis, the Gene Set Analysis tool and
the novel ‘SynTHy’ (Synthesis and Testing of Hypotheses)
tool enable researchers to synthesize on-the-fly the many
types of information available through the development and
evaluation of testable hypotheses.
DATA SOURCES AND DATA ORGANIZATION
Organism life history data
GEneSTATION contains information on pregnancy- and
reproduction-associated characteristics for every mammal
genome present in the database: human (Homo sapiens),
elephant (Loxodonta africana), chimpanzee (Pan troglodytes),
cow (Bos taurus), macaque (Macaca fascicularis), cat
(Felis catus), dog (Canis lupus), goat (Capra hircus), guinea pig
(Cavia porcellus), horse (Equus caballus), mouse (Mus
musculus), gibbon (Nomascus leucogenys), rat (Rattus
norvegicus), baboon (Papio anubis), vole (Microtus ochrogaster),
rabbit (Oryctolagus cuniculus), rhesus monkey (Macaca
mulatta), sheep (Ovis aries), orangutan (Pongo abelii),
gorilla (Gorilla gorilla), marmoset (Callithrix jacchus), wild
boar (Sus scrofa), and platypus (Ornithorhynchus anatinus).
Specifically, life history characteristics including mean
gestation length, neonate development, placental structure and
shape, litter size, interbirth interval, adult body mass,
maximum longevity and the timing of neonatal brain growth,
an interesting characteristic relevant to potential
complications of early parturition (27), are provided for each species
(for data sources see Materials and Methods).
In addition to the life history data for the 23 mammals,
every gene in each mammalian genome has a page on
GEneSTATION that depicts the available EVOLUTIONARY,
ORGANISMAL and MOLECULAR knowledge for that gene,
with data from each category reported in a separate tab
(Figure 1). The juxtaposition of diverse data is designed to
guide users toward a more comprehensive understanding
of genes of interest and facilitate serendipitous
construction of novel hypotheses. For example, a GEneSTATION
user may look up a gene of interest with enriched
expression in the placenta and quickly discover that this gene
additionally: (i) is often differentially expressed in studies on
preeclampsia, a complication of pregnancy characterized
by high blood pressure (both of these data types are
reported in the ORGANISMAL tab), (ii) originated
coincidentally with the placental mammals (reported in the
EVOLUTIONARY tab) and (iii) interacts with known pregnancy
related genes (reported in the MOLECULAR tab). Collectively,
these associations would suggest that the gene would be a
good candidate for further exploration.
The EVOLUTIONARY category contains a variety of
population and evolutionary data on human genes, and in some
instances (e.g. ancient selection, orthology) on genes from
diverse mammals (see Materials and Methods). These
include the strength of recent selection (measured by FST)
and ancient selection (measured by dN/dS), a gene’s
estimated date and lineage of origin, the SNPs from every
human gene, and mammalian orthology relationships.
The data in the ORGANISMAL category include the
Online Mendelian Information in Man (OMIM) phenotypes
for each human gene, if available, RNA and protein
expression across many tissues from Protein Atlas,
including several pregnancy related tissues, as well as all
differentially expressed genes from 106 genome-wide comparisons
from pregnancy studies across gestational tissues
(including placenta, cervix, myometrium, decidua, chorion,
amnion) and pathologies (including preeclampsia,
intrauterine growth restriction, chorioamnionitis and spontaneous
preterm birth) (see Materials and Methods).
The data in the MOLECULAR category include the gene
annotation (28) information for human and seven other
mammalian genomes, and the protein interactions, through
STRING (29), for proteins from humans and 15 other
In addition to the data sets in the three categories
listed above, GEneSTATION displays RefSeq summaries
for genes as well as gene-specific links to a wide
variety of general databases, e.g. Entrez Gene, PubMed,
UniProt, TreeFam (30–33), where available. In
addition, GEneSTATION contains links to two pregnancy
disease-specific databases, dbPTB (http://ptbdb.cs.brown.
edu/dbPTBv1.php) and PTBgene (http://ric.einstein.yu.
edu/ptbgene/). Both databases are focused on human PTB;
dbPTB contains the output from a computational mining of
the literature as well as of the KEGG and dbSNP databases
to identify studies, pathways and variants associated with
candidate disease-risk genes (34), whereas PTBgene
represents the summary of the fewer than 100 genes that show
genetic association with preterm birth (35).
A number of general purpose or species-centric databases,
e.g., Genecards, SGD, WormBase and FlyBASE (36–39),
provide access to diverse data sets on individual genes. By
design, such databases aim to present the breadth of all
available data for a given gene (e.g. http://www.genecards.
org/cgi-bin/carddisp.pl?gene=BRCA1), which means that
pages of genes that have been extensively studied can
become either cumbersome to navigate or saturated with large
amounts of different types of data, potentially obscuring the
biological interpretation of relationships among the various
resources proffered. For ease of access, each major category
on the gene page, e.g. EVOLUTIONARY, is presented on a
separate tab that is divided into subsections, e.g. Evolution in
Mammals. These subsections are intended to expand and
include additional related types of data in future versions
Aided by its focus on a specific biological process,
GEneSTATION was developed using state-of-the-art web
frameworks to provide a clean visual layout that is easy to
interpret and efficient to navigate efficiently (Figures 1–3).
For faster access to the data, GEneSTATION is designed
to be highly responsive to users and focuses on
providing low latency interaction and feedback. We have
implemented multiple custom-made visualizations to allow users
to quickly grasp the various types of available data and
analyses, both for individual genes and across the genome, such
as creating interactive summary figures that chart the
distributions of the underlying data or studies. For example,
the gene expression page (Figure 2) shows the number of
studies available by keyword (e.g. ‘myometrium’ or
‘spontaneous preterm birth’), providing instantaneous and
meaningful filters of the data, while simultaneously highlighting
deficiencies in the number of publically available data sets
and identifying opportunities for meta-analyses, as recently
described by Eidem et al. (40).
Custom-made visualizations are a key design feature of
GEneSTATION pages. Examples include a density plot
of each analysis in the SynTHy tool (Figure 3), which
allows users to quickly select an appropriate cutoff based
on the distribution of the values (http://www.genestation.
org/SynTHy); the distribution of gene ages plot (http:
//www.genestation.org/analysis/gene/age), and the
distribution of available pregnancy related expression studies
by keyword plot (http://www.genestation.org/analysis/gene/
expression). To support these custom visualizations,
additional html/css/js libraries were included (see Materials and
SYNTHESIS AND ANALYSIS
The promise of GEneSTATION is that the rapid
exploration of its diverse data types will allow users to generate
a synthetic view of the genetic networks underlying
pregnancy and its pathologies. Users with lists of genes obtained
from experimental results, e.g. differential expression using
RNA-seq, but short of fully developed hypotheses, can
submit lists of candidate genes for enrichment analysis (http:
//www.genestation.org/analysis/gene/set) across the various
data types (e.g. gene age, tissue expression, differential
expression and methylation in disease, GO annotation,
protein interactions) and examine statistically significant
associations (see Materials and Methods). It has long been
recognized that such interactive data exploration phases such
as GEneSTATION provides are not only important in the
analysis of complex data sets, but also in the formation of
new hypotheses (41).
Alternatively, users may visit GEneSTATION with a
specific hypothesis about genes involved in a particular process
or pathology, or develop one while browsing through the
gene pages. With GEneSTATION, finding candidate genes
that test a hypothesis has been rendered intuitive and quick
by the development of SynTHy (after Synthesize and Test
Hypotheses; http://www.genestation.org/SynTHy), a novel
tool that goes far beyond typical search tools by visualizing
the distribution of the underlying data, giving immediate
visual feedback and showing how the various components of
a hypothesis impact the resulting list of genes. Rapid
exploration of multiple variations on a hypothesis facilitates the
development of an integrated view of the genetic
relationships underlying the many different data types. For example,
a user could ask whether genes with SNPs that have very
different frequencies (high FST) in populations with high
preterm birth rates versus populations with lower preterm
birth rates (42) are also preferentially expressed in the
placenta or arose in the mammalian ancestor. Any gene list
results generated using SynTHy can be easily transferred to
the gene set analysis tool for further refinement and
exploration. SynTHy thus allows users to ‘find the question’ as
readily as to find the answers (41). SynTHy is not intended
to fully replace careful statistical analysis using original data
but rather to synthesize disparate but high-quality data and
analyses and facilitate rapid exploration.
To facilitate specifically tailored statistical analyses,
GEneSTATION makes all data available for easy
download in JSON format using the download button on the
bottom left of each gene page, analysis page and SynTHy
SUMMARY AND FUTURE PERSPECTIVES
Understanding the complex functional landscape of
pregnancy, how abnormalities of pregnancy arise, or how the
biological mechanisms of gestation evolve and translate
between species can be greatly augmented by the integration
and synthesis of multiple types of experimental data,
genomic data, and evolutionary analyses. Importantly, such
genome-scale data sets are becoming more frequent and
current funding priorities will only accelerate this trend.
Consequently, populating GEneSTATION with additional
high-quality evolutionary, organismal and molecular data
sets is an active and ongoing process, with transcriptomic,
proteomic, and imaging data being a high priority. In
parallel, we are developing algorithms that will point users
interested in a particular gene to other genes or biological
processes with similar functional or evolutionary
characteristics. Furthermore, GEneSTATION’s integration of
evolutionary and experimental data will support the development
of algorithms that evaluate the likelihood that those specific
biological systems or medical interventions that work in a
model organism such as mouse or macaque will also work
in human pregnancy.
In summary, GEneSTATION facilitates integrative
analyses that draw from many types of data, providing a novel
platform and paradigm for comprehensive
understanding of pregnancy across mammals. It is our hope that
GEneSTATION’s synthesis becomes a catalyst for the
identification and evaluation of candidate genes by biologists
interested in the function and evolution of mammalian
pregnancy, as well as its complications. More generally,
GEneSTATION’s synthetic focus on a specific biological
process has the potential to become a model for databases
aimed at synthesizing the diverse types of biology’s ‘big
data’ for a wide variety of biological processes.
MATERIALS AND METHODS
Publically available data on GEneSTATION
Publically available data that did not need reanalysis or
normalization (e.g. gene age, dN/dS, Protein Atlas) were
added to GEneSTATION without modification. Details
about these data are listed in the online methods page along
with links to the original data source. In addition, the
methods page describes in more detail data sets that may be
difficult to interpret for non-specialists or that has potential
caveats for interpretation. Small information icons on each
analysis page or in the relevant subsection of gene pages
provide links to both the methods page and the original
Sources of organism life history data
Mean gestation length and standard deviations were taken
from primary accounts in the research literature focused
on reproductive characteristics of each individual species.
Neonate development state was either recorded directly
from the literature or inferred using average litter size as a
proxy (43). For all species, placental type and shape were
taken from (1). For non-primates, adult body mass was
taken from the PanTHERIA database, as well as average
litter sizes, interbirth intervals and maximum longevity for
many species (44). For primates, body mass data specifically
reflect adult female body mass (45). Finally, data on the
timing of brain growth across mammals were taken from (46).
To provide consistent genome wide analyses of human
genetic variation, variant call format (VCF) files, with
coordinates lifted to genome build GRCh38, were downloaded
from the 1000 Genomes Project FTP site (ftp://ftp-trace.
ncbi.nih.gov/1000genomes/ftp/), representing variants for
all 2504 unrelated individuals in the 1000 Genome Project
Phase 3 cohort (47). Variants were filtered to exclude
nonSNP variants, fixed sites, and sites with uncalled or
unphased genotypes. Variants were validated against dbSNP
build 144 (48) using the ValidateVariants tool in Genome
Analysis Toolkit (49). VCFtools (50) was used to calculate
pairwise FST statistics (51).
We reanalyzed all microarray datasets that were
downloaded from NCBI’s Gene Expression Omnibus (GEO) (52)
using the R package GEOquery. Ambiguous probes that
map to multiple genes were discarded. For multiple probes
mapping to a single gene, the probe with median
significance value was reported. Pairwise differential expression
statistics were computed using the eBayes algorithm in the
The data for GEneSTATION is stored in PostgreSQL,
a highly reliable and durable open source database. For
consistent integration of multiple types of data, we use
a highly normalized and non-redundant biology focused
schema, Chado (53), which is collaboratively developed by
the Generic Model Organism Project (54). We have
written extensions to this schema so that GEneSTATION can
handle the large numbers of genomes and diverse associated
data (currently 3.4Tb) already loaded as well as those to be
added in the future.
GEneSTATION’s large datasets have required the
development of high-performance custom tools for data loading,
which are built using C++ with libpq, and allow loading
of genome-wide datasets in seconds and SNP datasets (e.g.
1000 genomes) in minutes. Python with SQLAlchemy
provides a flexible pipeline for loading the highly varied data
sources in GEneSTATION, annotations, database
crossreferences, and controlled vocabularies. GEneSTATION
loads both standard data formats, e.g. GFF, FSTA, OBO,
and generic formats using JSON files for metadata and
tabdelimited files for the data, which facilitates integration of
diverse analysis pipelines.
GEneSTATION also uses Elasticsearch, a distributed,
in-memory search engine for full-text search and as
a store for precalculated, denormalized SQL queries
that are computationally intensive. Elasticsearch allows
GEneSTATION to respond to advanced queries from users,
including boolean and full-text, with lower latency and
higher throughput than is possible with PostgreSQL alone.
The web interface to the GEneSTATION database is
delivered by a high performance custom server written in Go, a
new language developed at Google to power their web
services. The server handles querying both PostgreSQL (using
sqlx) and Elasticsearch (using elastic), parsing user input,
and performing custom analyses on-the-fly (e.g., analysis
of user submitted gene sets). The server provides an
NCBIlike query language to query the Elasticsearch search engine
and provides uniform access to all GEneSTATION data via
a REST interface. The server caches most pages, reducing
latency to much less than a second in most cases.
The foundation of the user interface is Bootstrap, an
which allows consistent visual layouts even on mobile
devices and provides tools for rich user interaction, e.g. tabs,
tooltips, popovers, while supporting users on both
handheld devices (e.g. iPads and iPhones) and older browsers.
Additional functionality was achieved with specialized
html/css/js libraries, which include: autocompeter.js for
immediate search results feedback and autocomplete;
multiple libraries, selectize.js, autosize.js, d3.js, venn.js, and
react.js where used to build the SynThy tool;
multiple libraries, jquery.js, jquery.form.js,
responsive-bootstraptoolkit.js, jquery-highlighttextarea.js, jquery-hoverIntent.js
and richer user interaction.
Visualization Data in charts and graphs are presented using highcharts.js, data in table formats are displayed with jquery.dataTables.js and the interactive organisms page uses jquery.mixitup.js.
Gene set analysis
P-values for gene set analyses are calculated using the
cumulative hypergeometric distribution (similar to Fisher’s Exact
Test). The background is adjusted for each set to match the
number of genes reported in each data set, either analysis
We are grateful to the investigators of the March of
Dimes Prematurity Research Center Ohio Collaborative,
the March of Dimes leadership and boards of external
reviewers, as well as to members of the Rokas lab for their
constructive feedback during the construction and
development of GEneSTATION. This work was conducted in part
using the resources of the Advanced Computing Center for
Research and Education at Vanderbilt University.
March of Dimes, as part of the March of Dimes
Prematurity Research Center Ohio Collaborative. Funding for
open access charge: March of Dimes, as part of the March
of Dimes Prematurity Research Center Ohio Collaborative
Conflict of interest statement. None declared.
1. Mossman , H.W. ( 1987 ) Vertebrate Fetal Membranes: Comparative Ontogeny and Morphology; Evolution; Phylogenetic Significance; Basic Functions . Research Opportunities Rutgers University Press, New Brunswick, N.J.
2. Carter , A.M. and Mess , A. ( 2007 ) Evolution of the placenta in eutherian mammals . Placenta , 28 , 259 - 262 .
3. Gundling , W.E. and Wildman , D.E. ( 2015 ) A review of inter- and intraspecific variation in the eutherian placenta . Philos. Trans. R. Soc. Lond. B. Biol. Sci. , 370 , 20140072 .
4. Wildman , D.E. , Chen , C. , Erez , O. , Grossman , L.I. , Goodman , M. and Romero , R. ( 2006 ) Evolution of the mammalian placenta revealed by phylogenetic analysis . Proc. Natl. Acad. Sci. U.S.A. , 103 , 3203 - 3208 .
5. Wittman , A.B. and Wall , L.L. ( 2007 ) The evolutionary origins of obstructed labor: bipedalism, encephalization, and the human obstetric dilemma . Obstet. Gynecol. Surv. , 62 , 739 - 748 .
6. Haig , D. ( 2007 ) Intimate relations: Evolutionary conflicts of pregnancy and childhood . In: Stearns,SC and Koella ,JC (eds). Evolution in Health and Disease . Oxford University Press, pp. 65 - 76 .
7. Brown , E.A. , Ruvolo , M. and Sabeti , P.C. ( 2013 ) Many ways to die, one way to arrive: how selection acts through pregnancy . Trends Genet. TIG , 29 , 585 - 592 .
8. Romero , R. , Dey , S.K. and Fisher , S.J. ( 2014 ) Preterm labor: one syndrome, many causes . Science , 345 , 760 - 765 .
9. Spong , C.Y. ( 2013 ) Defining 'term' pregnancy: recommendations from the Defining 'Term' Pregnancy Workgroup . JAMA, 309 , 2445 - 2446 .
10. Beck , S. , Wojdyla , D. , Say , L. , Betran , A.P. , Merialdi , M. , Requejo , J.H. , Rubens , C. , Menon , R. and Van Look , P.F.A. ( 2010 ) The worldwide incidence of preterm birth: a systematic review of maternal mortality and morbidity . Bull. World Health Organ ., 88 , 31 - 38 .
11. Martin , J.A. , Hamilton , B.E. , Ventura , S.J. , Osterman , M.J.K. and Mathews , T.J. ( 2013 ) Births: final data for 2011 . Natl. Vital Stat. Rep. Cent. Dis. Control Prev. Natl. Cent. Health Stat. Natl. Vital Stat. Syst. , 62 , 1 - 69 .
12. Guttmacher , A.E. , Maddox , Y.T. and Spong , C.Y. ( 2014 ) The Human Placenta Project: placental structure, development, and function in real time . Placenta , 35 , 303 - 304 .
13. Treloar , S.A. , Macones ,G.A., Mitchell, L.E. and Martin , N.G. ( 2000 ) Genetic influences on premature parturition in an Australian twin sample . Twin Res. Off. J. Int. Soc. Twin Stud ., 3 , 80 - 82 .
14. Allen , C.M. and Founds , S.A. ( 2013 ) Genetics and preterm birth . J. Obstet. Gynecol. Neonatal Nurs. JOGNN NAACOG , 42 , 730 - 736 .
15. Knox , K. and Baker , J.C. ( 2008 ) Genomic evolution of the placenta using co-option and duplication and divergence . Genome Res ., 18 , 695 - 705 .
16. Hannibal , R.L. , Chuong , E.B. , Rivera-Mulia , J.C. , Gilbert , D.M. , Valouev , A. and Baker , J.C. ( 2014 ) Copy Number Variation Is a Fundamental Aspect of the Placental Genome . PLoS Genet , 10 , e1004290.
17. Elliot , M.G. and Crespi , B.J. ( 2015 ) Genetic recapitulation of human pre-eclampsia risk during convergent evolution of reduced placental invasiveness in eutherian mammals . Philos. Trans. R. Soc. Lond. B Biol. Sci. , 370 , 20140069 .
18. Gallant , J.R. , Traeger , L.L. , Volkening , J.D. , Moffett , H. , Chen , P.-H. , Novina , C.D. , Phillips , G.N. , Anand , R. , Wells , G.B. , Pinch , M. et al. ( 2014 ) Genomic basis for the convergent evolution of electric organs . Science , 344 , 1522 - 1525 .
19. Lynch , V.J. , Nnamani , M.C. , Kapusta , A. , Brayer , K. , Plaza , S.L. , Mazur , E.C. , Emera , D. , Sheikh , S.Z. , Gr u¨tzner, F. , Bauersachs , S. et al. ( 2015 ) Ancient transposable elements transformed the uterine regulatory landscape and transcriptome during the evolution of mammalian pregnancy . Cell Rep. , 10 , 551 - 561 .
20. Phillips ,J.B., Abbot , P. and Rokas , A. ( 2015 ) Is preterm birth a human-specific syndrome? Evol . Med . Public Health , 2015 , 136 - 148 .
21. Chaudhari , B.P. , Plunkett , J. , Ratajczak , C.K. , Shen , T.T. , DeFranco , E.A. and Muglia , L.J. ( 2008 ) The genetics of birth timing: insights into a fundamental component of human development . Clin. Genet ., 74 , 493 - 501 .
22. Bezold , K.Y. , Karjalainen , M.K. , Hallman , M. , Teramo , K. and Muglia , L.J. ( 2013 ) The genomics of preterm birth: from animal models to human studies . Genome Med ., 5 , 34 .
23. Ouyang , Y. , Mouillet , J.-F. , Coyne ,C.B. and Sadovsky , Y. ( 2014 ) Review: placenta-specific microRNAs in exosomes - good things come in nano-packages . Placenta , 35 (Suppl), S69 - S73 .
24. Kosova , G. , Stephenson , M.D. , Lynch , V.J. and Ober , C. ( 2015 ) Evolutionary forward genomics reveals novel insights into the genes and pathways dysregulated in recurrent early pregnancy loss . Hum. Reprod. Oxf. Engl. , 30 , 519 - 529 .
25. Gracie , S. , Pennell , C. , Ekman-Ordeberg , G. , Lye , S. , McManaman , J. , Williams , S. , Palmer , L. , Kelley , M. , Menon , R. , Gravett , M. et al. ( 2011 ) An integrated systems biology approach to the study of preterm birth using '-omic' technology-a guideline for research . BMC Pregnancy Childbirth , 11 , 71 .
26. Furlong , L.I. ( 2013 ) Human diseases through the lens of network biology . Trends Genet. TIG , 29 , 150 - 159 .
27. Rosenberg , K. and Trevathan , W. ( 2002 ) Birth, obstetrics and human evolution . BJOG Int. J. Obstet. Gynaecol. , 109 , 1199 - 1206 .
28. Gene Ontology Consortium . ( 2015 ) Gene Ontology Consortium: going forward . Nucleic Acids Res ., 43 , D1049 - D1056 .
29. Szklarczyk , D. , Franceschini , A. , Wyder , S. , Forslund , K. , Heller , D. , Huerta-Cepas , J. , Simonovic , M. , Roth , A. , Santos , A. , Tsafou , K.P. et al. ( 2015 ) STRING v10: protein-protein interaction networks, integrated over the tree of life . Nucleic Acids Res ., 43 , D447 - D452 .
30. Brown , G.R. , Hem , V. , Katz , K.S. , Ovetsky , M. , Wallin , C. , Ermolaeva , O. , Tolstoy , I. , Tatusova , T. , Pruitt , K.D. , Maglott , D.R. et al. ( 2015 ) Gene: a gene-centered information resource at NCBI . Nucleic Acids Res ., 43 , D36 - D42 .
31. NCBI Resource Coordinators . ( 2015 ) Database resources of the National Center for Biotechnology Information . Nucleic Acids Res ., 43 , D6 - D17 .
32. The UniProt Consortium . ( 2015 ) UniProt: a hub for protein information . Nucleic Acids Res ., 43 , D204 - D212 .
33. Schreiber , F. , Patricio , M. , Muffato , M. , Pignatelli , M. and Bateman , A. ( 2014 ) TreeFam v9: a new website, more species and orthology-on-the-fly . Nucleic Acids Res ., 42 , D922 - D925 .
34. Uzun , A. , Laliberte , A. , Parker , J. , Andrew , C. , Winterrowd , E. , Sharma , S. , Istrail , S. and Padbury , J.F. ( 2012 ) dbPTB: a database for preterm birth . Database J. Biol. Databases Curation , 2012 , bar069.
35. Dolan , S.M. , Hollegaard , M.V. , Merialdi , M. , Betran , A.P. , Allen , T. , Abelow , C. , Nace , J. , Lin , B.K. , Khoury , M.J. , Ioannidis , J.P.A. et al. ( 2010 ) Synopsis of preterm birth genetic association studies: the preterm birth genetics knowledge base (PTBGene) . Public Health Genomics , 13 , 514 - 523 .
36. Safran , M. , Dalah , I. , Alexander , J. , Rosen , N. , Stein , T.I. , Shmoish , M. , Nativ , N. , Bahir , I. , Doniger , T. , Krug , H. et al. ( 2010 ) GeneCards Version 3: the human gene integrator . Database , 2010 , baq020.
37. Costanzo , M.C. , Engel , S.R. , Wong , E.D. , Lloyd , P. , Karra , K. , Chan ,E.T., Weng , S. , Paskov , K.M. , Roe , G.R. , Binkley , G. et al. ( 2014 ) Saccharomyces genome database provides new regulation data . Nucleic Acids Res ., 42 , D717 - D725 .
38. Harris , T.W. , Baran , J. , Bieri , T. , Cabunoc , A. , Chan,J., Chen , W.J. , Davis , P. , Done , J. , Grove , C. , Howe , K. et al. ( 2014 ) WormBase 2014: new views of curated biology . Nucleic Acids Res ., 42 , D789 - D793 .
39. dos Santos, G. , Schroeder , A.J. , Goodman , J.L. , Strelets , V.B. , Crosby , M.A. , Thurmond , J. , Emmert , D.B. , Gelbart , W.M. and FlyBase Consortium . ( 2015 ) FlyBase: introduction of the Drosophila melanogaster Release 6 reference genome assembly and large-scale migration of genome annotations . Nucleic Acids Res ., 43 , D690 - D697 .
40. Eidem , H.R. , Ackerman , W.E. , McGary , K.L. , Abbot , P. and Rokas , A. ( 2015 ) Gestational tissue transcriptomics in term and preterm human pregnancies: a systematic review and meta-analysis . BMC Med. Genomics , 8 , 27 .
41. Tukey , J.W. ( 1980 ) We Need Both Exploratory and Confirmatory . Am. Stat., 34 , 23 - 25 .
42. Patel , R.R. , Steer , P. , Doyle , P. , Little , M.P. and Elliott , P. ( 2004 ) Does gestation vary by ethnic group? A London-based study of over 122 000 pregnancies with spontaneous onset of labour . Int. J. Epidemiol. , 33 , 107 - 113 .
43. M u¨ller, D. , Codron , D. , Werner , J. , Fritz , J. and Hummel , J. ( 2012 ) Dichotomy of eutherian reproduction and metabolism . Oikos.
44. Kate , E. and Jones , J.B. ( 2009 ) PanTHERIA: A species-level database of life history, ecology, and geography of extant and recently extinct mammals . Ecology , 90 , 2648 .
45. Smith, R.J. and Jungers , W.L. ( 1997 ) Body mass in comparative primatology . J. Hum. Evol. , 32 , 523 - 559 .
46. Dobbing , J. ( 1990 ) Vulnerable Periods in Developing Brain . In: Dobbing,J (ed). Brain, Behaviour, and Iron in the Infant Diet . Springer, London, pp. 1 - 27 .
47. International HapMap 3 Consortium, Altshuler, D.M. , Gibbs , R.A. , Peltonen , L. , Altshuler , D.M. , Gibbs , R.A. , Peltonen , L. , Dermitzakis , E. , Schaffner , S.F. , Yu , F. et al. ( 2010 ) Integrating common and rare genetic variation in diverse human populations . Nature , 467 , 52 - 58 .
48. Sherry , S.T. , Ward , M.H. , Kholodov , M. , Baker , J. , Phan , L. , Smigielski , E.M. and Sirotkin , K. ( 2001 ) dbSNP: the NCBI database of genetic variation . Nucleic Acids Res ., 29 , 308 - 311 .
49. McKenna , A. , Hanna , M. , Banks , E. , Sivachenko , A. , Cibulskis , K. , Kernytsky , A. , Garimella , K. , Altshuler , D. , Gabriel , S. , Daly , M. et al. ( 2010 ) The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data . Genome Res ., 20 , 1297 - 1303 .
50. Danecek , P. , Auton , A. , Abecasis , G. , Albers , C.A. , Banks , E. , DePristo , M.A. , Handsaker , R.E. , Lunter , G. , Marth , G.T. , Sherry , S. T . et al. ( 2011 ) The variant call format and VCFtools . Bioinforma . Oxf. Engl., 27 , 2156 - 2158 .
51. Cockerham ,C.C. and Weir , B.S. ( 1984 ) Covariances of relatives stemming from a population undergoing mixed self and random mating . Biometrics , 40 , 157 - 164 .
52. Barrett , T. , Wilhite , S.E. , Ledoux , P. , Evangelista , C. , Kim , I.F. , Tomashevsky , M. , Marshall , K.A. , Phillippy , K.H. , Sherman , P.M. , Holko , M. et al. ( 2013 ) NCBI GEO: archive for functional genomics data sets--update . Nucleic Acids Res ., 41 , D991 - D995 .
53. Mungall , C.J. , Emmert , D.B. and FlyBase Consortium . ( 2007 ) A Chado case study: an ontology-based modular schema for representing genome-associated biological information . Bioinforma. Oxf. Engl. , 23 , i337 - i346 .
54. Stein , L.D. , Mungall , C. , Shu , S. , Caudy , M. , Mangone , M. , Day , A. , Nickerson , E. , Stajich , J.E. , Harris , T.W. , Arva , A. et al. ( 2002 ) The generic genome browser: a building block for a model organism system database . Genome Res ., 12 , 1599 - 1610 .