Mining for novel candidate clock genes in the circadian regulatory network
Bhargava et al. BMC Systems Biology (2015) 9:78
DOI 10.1186/s12918-015-0227-2
RESEARCH ARTICLE
Open Access
Anuprabha Bhargava1, Hanspeter Herzel2 and Bharath Ananthasubramaniam1*
Abstract
Background: Most physiological processes in mammals are temporally regulated by means of a master circadian
clock in the brain and peripheral oscillators in most other tissues. A transcriptional-translational feedback network of
clock genes produces near 24 h oscillations in clock gene and protein expression. Here, we aim to identify novel
additions to the clock network using a meta-analysis of public chromatin immunoprecipitation sequencing (ChIP-seq),
proteomics and protein-protein interaction data starting from a published list of 1000 genes with robust
transcriptional rhythms and circadian phenotypes of knockdowns.
Results: We identified 20 candidate genes including nine known clock genes that received significantly high scores
and were also robust to the relative weights assigned to different data types. Our scoring was consistent with the
original ranking of the 1000 genes, but also provided novel complementary insights. Candidate genes were enriched
for genes expressed in a circadian manner in multiple tissues with regulation driven mainly by transcription factors
BMAL1 and REV-ERBα, β. Moreover, peak transcription of candidate genes was remarkably consistent across tissues.
While peaks of the 1000 genes were distributed uniformly throughout the day, candidate gene peaks were strongly
concentrated around dusk. Finally, we showed that binding of specific transcription factors to a gene promoter was
predictive of peak transcription at a certain time of day and discuss combinatorial phase regulation.
Conclusions: Combining complementary publicly available data targeting different levels of regulation within the
circadian network, we filtered the original list and found 11 novel robust candidate clock genes. Using the criteria of
circadian proteomic expression, circadian expression in multiple tissues and independent gene knockdown data, we
propose six genes (Por, Mtss1, Dgat2, Pim3, Ppp1r3b, Upp2) involved in metabolism and cancer for further experimental
investigation. The availability of public high-throughput databases makes such meta-analysis a promising approach to
test consistency between sources and tap their entire potential.
Keywords: Mammalian circadian clock, Clock genes, Meta-analysis, Phase regulation
Background
The daily and seasonal geophysical variations have driven
the evolution of a circadian clock system in most organisms. These biological timekeepers permit organisms to
maintain near 24 h rhythms in most physiological processes and anticipate periodic changes in their environments. In mammals, the circadian system consists of
a master circadian timekeeper in the suprachiasmatic
nucleus (SCN) in the hypothalamus [1] and several slave
timekeepers distributed in multiple tissues throughout
*Correspondence:
1 Institute for Theoretical Biology, Charité Universitätsmedizin, Phillipstr. 13,
Haus 4, 10115 Berlin, Germany
Full list of author information is available at the end of the article
the body, such as the liver, lungs and kidney [2, 3].
Nevertheless, there is a common underlying mechanism
producing circadian rhythms in these tissues based on
transcriptional-translational feedback loops (TTFL) [4].
In the TTFL, the protein products of several genes inhibit
their own transcription after associated delays due to transcription, translation and post-translational modification.
After the identification of the SCN master clock [5], the
first component of the TTFL was identified as the gene
Clock [6].
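The delayed negative feedback at the heart of the TTFL can be illustrated with a toy model. The sketch below is only an illustration and is not part of the analysis in this article: it assumes a single hypothetical clock product x whose synthesis is repressed by its own past level, dx/dt = beta / (1 + (x(t - tau)/K)^n) - gamma * x, and integrates it with a plain Euler scheme. The function names and all parameter values (beta, gamma, K, n, tau) are arbitrary assumptions, chosen only to contrast a long delay with a short one.

```python
def simulate_delayed_feedback(tau, t_end=400.0, dt=0.01,
                              beta=0.4, gamma=0.2, K=1.0, n=4, x0=0.5):
    """Euler integration of dx/dt = beta / (1 + (x(t - tau)/K)**n) - gamma * x.

    All parameter values are illustrative assumptions, not fitted to data.
    """
    delay_steps = int(round(tau / dt))
    steps = int(round(t_end / dt))
    # Constant history: x(t) = x0 for all t <= 0.
    x = [x0] * (delay_steps + 1)
    for _ in range(steps):
        x_delayed = x[-1 - delay_steps]                    # x(t - tau)
        production = beta / (1.0 + (x_delayed / K) ** n)   # repressed synthesis
        x.append(x[-1] + dt * (production - gamma * x[-1]))
    return x[delay_steps:]  # trajectory for t >= 0


def late_amplitude(trajectory, dt=0.01, window=100.0):
    """Peak-to-trough range over the last `window` time units."""
    tail = trajectory[-int(round(window / dt)):]
    return max(tail) - min(tail)


# Long delay: the steady state is unstable and x keeps oscillating.
amp_long_delay = late_amplitude(simulate_delayed_feedback(tau=10.0))
# Short delay: x relaxes to its steady state and the oscillation dies out.
amp_short_delay = late_amplitude(simulate_delayed_feedback(tau=1.0))
```

In this toy setting the delay is what turns self-repression into a sustained rhythm: with the same kinetics, shortening tau removes the oscillation.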
Subsequently, other “core” members of the TTFL,
such as the repressors Period (Per1, Per2, Per3) and Cryptochrome (Cry1, Cry2), activators Arntl (Bmal1) and
Npas2, and nuclear receptors Rev-erb (Nr1d1, Nr1d2)
and Ror (Rora, Rorb, Rorc) were established. There has
been continued interest in finding new members of the
TTFL not only for better understanding the mammalian
circadian clock, but also because mutations in these
core genes have been linked to several disorders [7].
While Clock was discovered by costly and laborious forward genetic screens by Joseph Takahashi and colleagues
[6], current high-throughput data from genetics, transcriptomics and proteomics and availability of the entire
genome combined with systems biology approaches
have tremendously accelerated our ability to find new
putative members of the TTFL [8]. Recently, Anafi et al.
used probabilistic machine learning to identify a putative
clock member and subsequently experimentally verified
it to discover the novel clock gene CHRONO [9]. Similar bioinformatic approaches have been used to identify novel
circadian genes from microarray data [10] and from co-expression data combined with text mining [11], to find circadian
genes disrupted in cancer cell lines [12] and to find health
implications of disrupted clock genes [13].
In this work, we aim to filter the list of 1000 putative
clock genes from [9] to determine the strongest candidates for further experimental validation. We do this by
including other sources of high-throughput data, such as
chromatin immunoprecipitation sequencing (ChIP-seq),
proteomic and protein-protein interaction (PPI) data,
not included in the original machine-learning procedure
of [9]. We combined metrics for different data sources
using a simple scoring scheme and shortlisted P450
cytochrome oxidoreductase (Por), metastasis suppressor
1 (Mtss1), proviral integration site 3 (Pim3), diacylglycerol
O-acyltransferase 2 (Dgat2), protein phosphatase 1 regulatory subunit 3b (Ppp1r3b) and uridine phosphorylase 2
(Upp2).
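To make the idea of a simple scoring scheme concrete, the following is a minimal sketch of one way to combine evidence from several data types with a weighted sum, and to keep only the genes that stay near the top when the weights change. It is not the scoring procedure used in this work: the gene labels, the binary evidence values, the weight sets and the function names `combined_score`, `rank_genes` and `robust_top` are all hypothetical placeholders.

```python
from typing import Dict, List, Set

Evidence = Dict[str, float]  # data type -> score for one gene


def combined_score(evidence: Evidence, weights: Dict[str, float]) -> float:
    """Weighted sum of the per-data-type scores of one gene."""
    return sum(w * evidence.get(source, 0.0) for source, w in weights.items())


def rank_genes(table: Dict[str, Evidence], weights: Dict[str, float]) -> List[str]:
    """Gene labels sorted from highest to lowest combined score."""
    scores = {gene: combined_score(ev, weights) for gene, ev in table.items()}
    return sorted(scores, key=scores.get, reverse=True)


def robust_top(table: Dict[str, Evidence],
               weight_sets: List[Dict[str, float]], k: int) -> Set[str]:
    """Genes that rank in the top k under every set of weights."""
    tops = [set(rank_genes(table, w)[:k]) for w in weight_sets]
    return set.intersection(*tops)


# Invented evidence table (1 = supported by that data type, 0 = not).
table = {
    "gene_a": {"chip_seq": 1, "proteomics": 1, "ppi": 1},
    "gene_b": {"chip_seq": 1, "proteomics": 0, "ppi": 0},
    "gene_c": {"chip_seq": 0, "proteomics": 1, "ppi": 1},
    "gene_d": {"chip_seq": 0, "proteomics": 0, "ppi": 0},
}

# Two hypothetical weightings: equal weights, and ChIP-seq weighted more heavily.
weight_sets = [
    {"chip_seq": 1.0, "proteomics": 1.0, "ppi": 1.0},
    {"chip_seq": 3.0, "proteomics": 1.0, "ppi": 1.0},
]

ranking_equal = rank_genes(table, weight_sets[0])
robust = robust_top(table, weight_sets, k=2)
```

In this invented table only gene_a remains in the top two under both weightings, which is the kind of weight-robustness a candidate would be expected to show.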
Method
We started our meta-analysis from the list of 1000 putative clock genes identified by Anafi et al. (Table S2 in [9]),
henceforth referred to as the ‘master list’. Anafi and colleagues compiled this master list by combining Bayesian
scores representing five features necessary in a clock gene:
(i) oscillating transcripts in liver, pituitary and NIH3T3
cells; (ii) a circadian phenotype in response to RNA
interference (RNAi) of the gene; (iii) a significant number
of functional genetic interactions with an exemplar list
of known “core” clock genes ba (...truncated)