Flynet: a genomic resource for Drosophila melanogaster transcriptional regulatory networks
Feng Tian (1,2), Parantu K. Shah (0,3), Xiangjun Liu (1,2), Nicolas Negre (0,3), Jia Chen (1), Oleksiy Karpenko (1), Kevin P. White (0,3) and Robert L. Grossman (0,1)
Associate Editor: Limsoon Wong
(0) Institute for Genomics & Systems Biology, The University of Chicago, Cummings Life Sciences Center 431A, 920 East 58th Street, Chicago, IL 60637
(1) National Center for Data Mining, University of Illinois at Chicago, MC 249, 851 South Morgan Street, Chicago, IL 60607-7045
(2) School of Medicine, Tsinghua University, Beijing, China 100084
(3) Department of Human Genetics and Department of Ecology and Evolution, Cummings Life Sciences Center 5th Floor, 920 East 58th Street, Chicago, IL 60637, USA
Motivation: The highly coordinated expression of thousands of genes in an organism is regulated by the concerted action of transcription factors, chromatin proteins and epigenetic mechanisms. High-throughput experimental data on genome-wide in vivo protein-DNA interactions and epigenetic marks are becoming available from large projects, such as the model organism ENCyclopedia Of DNA Elements (modENCODE), and from individual labs. Disseminating and visualizing these datasets in an explorable form is an important challenge.
Results: To support research on Drosophila melanogaster transcriptional regulation and make genome-wide in vivo protein-DNA interaction data available to the scientific community as a whole, we have developed a system called Flynet. Currently, Flynet contains 101 datasets for 38 transcription factors and chromatin regulator proteins in different experimental conditions. These factors exhibit different types of binding profiles, ranging from sharp localized peaks to broad binding regions. The protein-DNA interaction data in Flynet were obtained by analyzing chromatin immunoprecipitation experiments on one-color and two-color genomic tiling arrays, as well as chromatin immunoprecipitation followed by massively parallel sequencing. A web-based interface, integrated with an AJAX-based genome browser, has been built for querying and presenting analysis results. Flynet also makes available the cis-regulatory modules reported in the literature, known and de novo identified sequence motifs across the genome, and other resources for studying gene regulation.
Contact:
Availability: Flynet is available at https://www.cistrack.org/flynet/.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2009. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 INTRODUCTION
Metazoan genomes contain thousands of protein-coding and
noncoding RNA genes whose expression must be precisely
controlled. Approximately 3–10% of the proteins in known metazoan
proteomes are sequence-specific transcription factors (TFs)
(Kummerfeld and Teichmann, 2006), which bind to specific
cis-regulatory DNA sequences and modulate the expression of their
target genes. These cis-regulatory sequences are organized into
cis-regulatory modules (CRMs), each containing one or more binding
sites for a particular set of TFs. Enhancers are one example of CRMs:
they determine a specific spatio-temporal expression pattern of their
target gene (Wang et al., 2007).
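The relationship between CRMs and TFBSs described above can be captured in a small data model. The Python sketch below is hypothetical and is not Flynet's schema: it assumes a module is a single genomic interval linked to one target gene, and that a binding site belongs to the module only if it lies entirely inside that interval. All class names, gene names, factor names and coordinates are illustrative.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BindingSite:
    """A TF binding site (TFBS); coordinates are half-open, [start, end)."""
    tf: str
    chrom: str
    start: int
    end: int


@dataclass
class CRM:
    """Hypothetical model of a cis-regulatory module regulating one target gene."""
    name: str
    target_gene: str
    chrom: str
    start: int
    end: int
    sites: list[BindingSite] = field(default_factory=list)

    def contains(self, site: BindingSite) -> bool:
        # A site belongs to the module only if it lies entirely inside the module interval.
        return (site.chrom == self.chrom
                and self.start <= site.start
                and site.end <= self.end)

    def add_site(self, site: BindingSite) -> None:
        # Reject sites that fall outside the module instead of silently storing them.
        if not self.contains(site):
            raise ValueError(
                f"{site.tf} site at {site.chrom}:{site.start}-{site.end} lies outside {self.name}"
            )
        self.sites.append(site)

    def factors(self) -> set[str]:
        # The "particular set of TFs" whose binding sites make up this module.
        return {s.tf for s in self.sites}


# Toy example with made-up names and coordinates.
module = CRM("enhancer_A", "geneX", "2L", 1_000, 1_800)
module.add_site(BindingSite("TF_a", "2L", 1_050, 1_060))
module.add_site(BindingSite("TF_b", "2L", 1_400, 1_412))
module.add_site(BindingSite("TF_a", "2L", 1_700, 1_710))
print(sorted(module.factors()))  # ['TF_a', 'TF_b']
```

Storing sites on the module (rather than only on the gene) keeps the one-module-many-sites structure explicit, so the same factor can contribute several sites to one enhancer.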
The proteins that make up chromatin also participate in gene
regulation (Sims and Reinberg, 2008). For example, the histones
that form nucleosomes can be post-translationally modified to
create a chromatin environment that represses or activates nearby
genes. The association of TFs with their cis-regulatory elements
on the DNA can trigger, counteract or modulate these regulatory
states. Although detailed studies of individual genes have
identified many of the components and basic principles that
control transcription, we still lack an understanding of the
global architecture of transcriptional regulatory networks
(Babu et al., 2004).
Drosophila melanogaster has been used extensively as a
model organism to identify the components and basic principles of
transcriptional regulation. However, even after decades of research,
only 661 CRM sequences corresponding to 235 Drosophila genes
and 778 transcription factor binding sites (TFBSs) are annotated in
the Drosophila Cis-Regulatory Database
(http://www.comp.nus.edu.sg/bioinfo/Drosophila/), which combines
information from sources such as REDfly (Halfon et al., 2008), the
DNase I footprint database (Bergman et al., 2005) and the Drosophila
Cis-Regulatory Database (Narang et al., 2006).
Chromatin immunoprecipitation (ChIP) followed by microarray
hybridization on whole-genome tiling arrays (ChIP-chip; Iyer
et al., 2001; Ren et al., 2000) or by massively parallel
DNA sequencing (ChIP-seq; Johnson et al., 2007) is now
established as a powerful method for identifying all of the genomic
regions bound by a protein of interest in a given condition
(Keles, 2007). Genome-wide protein-DNA interaction data and
epigenetic marks are now available for many transcription factors
and chromatin regulators in D.melanogaster as well as in other
species, and they are providing detailed insight into transcriptional
regulation (Kim and Ren, 2006). Moreover, the National Human Genome
Research Institute-sponsored model organism ENCyclopedia Of DNA
Elements (modENCODE; http://www.modencode.org) Project aims
to identify the majority of the sequence-based functional elements
in the Caenorhabditis elegans and D.melanogaster genomes. It is
therefore important to develop tools for storing, organizing and
analyzing these datasets and to make them available to the scientific
community in a usable format.
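Once a ChIP-chip or ChIP-seq analysis has reduced a factor's signal to a set of bound regions, a common follow-up is to ask which annotated CRMs those regions hit. The Python sketch below is a hypothetical illustration, not Flynet's code: it assumes both bound regions and CRMs are plain 0-based, half-open intervals, and the names `Interval`, `overlaps` and `crms_bound_by_factor`, as well as all coordinates, are invented for the example.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A labelled genomic interval; coordinates are 0-based and half-open, [start, end)."""
    chrom: str
    start: int
    end: int
    label: str


def overlaps(a: Interval, b: Interval) -> bool:
    # Two half-open intervals overlap iff they are on the same chromosome
    # and each one starts before the other ends (touching ends do not count).
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def crms_bound_by_factor(bound_regions: list[Interval], crms: list[Interval]) -> list[str]:
    """Return the sorted labels of CRMs hit by at least one bound region.

    A simple O(n * m) scan; an interval tree or a sorted sweep per
    chromosome would be the usual choice for genome-scale inputs.
    """
    return sorted(
        crm.label
        for crm in crms
        if any(overlaps(region, crm) for region in bound_regions)
    )


# Toy data with made-up coordinates: one sharp peak and one broad region.
bound = [
    Interval("2L", 100, 300, "sharp_peak"),
    Interval("2L", 5_000, 9_000, "broad_region"),
]
modules = [
    Interval("2L", 250, 400, "crm_A"),      # overlaps the sharp peak
    Interval("2L", 300, 500, "crm_B"),      # only touches the peak's end
    Interval("3R", 100, 300, "crm_C"),      # different chromosome
    Interval("2L", 8_000, 8_500, "crm_D"),  # inside the broad region
]
print(crms_bound_by_factor(bound, modules))  # ['crm_A', 'crm_D']
```

Because the test only compares coordinates, it treats sharp peaks and broad binding regions the same way; whether a broad region covering a module should count as "binding" that module is a biological choice the sketch does not make.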
We have built Flynet as part of our data management and
visualization efforts for the modENCODE project, whose goal is to
map the genome-wide associations of a large set of Drosophila
sequence-specific TFs and chromatin regulator proteins. Flynet
is the first public database of D.melanogaster in vivo protein-DNA
interaction data identified on whole-genome tiling arrays
using ChIP-chip, as well as by ChIP-seq, for a variety of transcription
factors and chromatin regulator proteins in different experimental
conditions. It also makes available known CRMs, well-known
and de novo identified sequence motifs across the genome, and
a list of transcription factors and chromatin regulator proteins
in the D.melanogaster genome with their domain assignments and their
orthologs and paralogs across 12 Drosophila genomes in the form of
multiple sequence alignments. In the following sections we describe
the query interface, the system architecture and the AJAX-based genome
browser, as well as the tools and resources available as part of Flynet.
Flynet system architecture
The Flynet data system is designed to be a general system for storing,
anno (...truncated)