Flynet: a genomic resource for Drosophila melanogaster transcriptional regulatory networks

Bioinformatics, Nov 2009

Motivation: The highly coordinated expression of thousands of genes in an organism is regulated by the concerted action of transcription factors, chromatin proteins and epigenetic mechanisms. High-throughput experimental data for genome-wide in vivo protein–DNA interactions and epigenetic marks are becoming available from large projects, such as the model organism ENCyclopedia Of DNA Elements (modENCODE), and from individual labs. Disseminating and visualizing these datasets in an explorable form is an important challenge.

Results: To support research on Drosophila melanogaster transcription regulation and to make genome-wide in vivo protein–DNA interaction data available to the scientific community as a whole, we have developed a system called Flynet. Currently, Flynet contains 101 datasets for 38 transcription factors and chromatin regulator proteins in different experimental conditions. These factors exhibit different types of binding profiles, ranging from sharp localized peaks to broad binding regions. The protein–DNA interaction data in Flynet were obtained from the analysis of chromatin immunoprecipitation experiments on one-color and two-color genomic tiling arrays, as well as from chromatin immunoprecipitation followed by massively parallel sequencing. A web-based interface, integrated with an AJAX-based genome browser, has been built for running queries and presenting analysis results. Flynet also makes available the cis-regulatory modules reported in the literature, known and de novo identified sequence motifs across the genome, and other resources for studying gene regulation.

Contact: grossman{at}uic.edu

Availability: Flynet is available at https://www.cistrack.org/flynet/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Feng Tian [1,2], Parantu K. Shah [0,3], Xiangjun Liu [1,2], Nicolas Negre [0,3], Jia Chen [1], Oleksiy Karpenko [1], Kevin P. White [0,3], Robert L. Grossman [0,1]

Associate Editor: Limsoon Wong

[0] Institute for Genomics & Systems Biology, The University of Chicago, Cummings Life Sciences Center 431A, 920 East 58th Street, Chicago, IL 60637
[1] National Center for Data Mining, University of Illinois at Chicago, MC 249, 851 South Morgan Street, Chicago, IL 60607-7045
[2] School of Medicine, Tsinghua University, Beijing, China 100084
[3] Department of Human Genetics and Department of Ecology and Evolution, Cummings Life Sciences Center 5th Floor, 920 East 58th Street, Chicago, IL 60637, USA

© The Author(s) 2009. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 INTRODUCTION

Metazoan genomes contain thousands of protein-coding and noncoding RNA genes, whose expression needs to be precisely controlled. Approximately 3–10% of the proteins in the known metazoan proteome are sequence-specific transcription factors (TFs) (Kummerfeld and Teichmann, 2006), which bind to specific cis-regulatory DNA sequences and modulate the expression of their target genes. These cis-regulatory sequences are organized into cis-regulatory modules (CRMs), each containing one or more binding sites for a particular set of TFs. Enhancers, which determine a specific temporal-spatial expression pattern of their target gene, are one example of CRMs (Wang et al., 2007). The various proteins that form the chromatin also participate in the regulation of genes (Sims and Reinberg, 2008). For example, the histones forming the nucleosomes can be post-translationally modified to create a chromatin environment that will repress or activate the genes around them. The different associations of TFs with their cis-regulatory elements on the DNA can trigger, counteract or modulate these regulatory states of genes.
Although detailed studies of individual genes have identified many of the components and basic principles that control transcription, we still lack an understanding of the global architecture of transcription regulatory networks (Babu et al., 2004). Drosophila melanogaster has been used extensively as a model organism to identify components and basic principles of transcription regulation. However, even after decades of research, only 661 CRM sequences corresponding to 235 Drosophila genes and 778 transcription factor binding sites (TFBSs) are annotated in the Drosophila Cis-Regulatory Database (http://www.comp.nus.edu.sg/bioinfo/Drosophila/), which combines information from sources such as RedFly (Halfon et al., 2008), the DNAse footprint database (Bergman et al., 2005) and the Drosophila Cis-Regulatory Database (Narang et al., 2006).

Chromatin immunoprecipitation (ChIP) followed by microarray hybridization on whole-genome tiling arrays (ChIP-chip; Iyer et al., 2001; Ren et al., 2000) or by massively parallel DNA sequencing (ChIP-seq; Johnson et al., 2007) is now established as a powerful method to identify all of the genomic regions bound by a protein of interest in a given condition (Keles, 2007). Genome-wide protein–DNA interaction data and epigenetic marks are now available for many transcription factors and chromatin regulators in D. melanogaster as well as in other species, and are providing details on transcription regulation (Kim and Ren, 2006). Moreover, the model organism ENCyclopedia Of DNA Elements (modENCODE; http://www.modencode.org) Project, sponsored by the National Human Genome Research Institute, aims to identify the majority of the sequence-based functional elements in the Caenorhabditis elegans and D. melanogaster genomes. It is therefore important to develop tools for storing, organizing and analyzing these datasets and to make them available to the scientific community in a usable format.
We have built Flynet as a part of our data management and visualization efforts for the modENCODE project, whose goal is to map the genome-wide associations of a large set of the Drosophila sequence-specific TFs and chromatin regulator proteins. Flynet is the first public database for D. melanogaster in vivo protein–DNA interaction data identified by ChIP-chip on whole-genome tiling arrays as well as by ChIP-seq, for a variety of transcription factors and chromatin regulator proteins in different experimental conditions. It also makes available known CRMs, well-known and de novo identified sequence motifs across the genome, and a list of transcription factors and chromatin regulator proteins in the D. melanogaster genome, together with their domain assignments and their orthologs and paralogs across 12 Drosophila genomes in the form of multiple sequence alignments. In the following sections we describe the query interface, system architecture and AJAX-based genome browser, as well as the tools and resources available as a part of Flynet.

Flynet system architecture

The Flynet data system is designed to be a general system for storing, anno (...truncated)


This is a preview of a remote PDF: https://bioinformatics.oxfordjournals.org/content/25/22/3001.full.pdf
Article home page: http://bioinformatics.oxfordjournals.org/content/25/22/3001.abstract

Feng Tian, Parantu K. Shah, Xiangjun Liu, Nicolas Negre, Jia Chen, Oleksiy Karpenko, Kevin P. White, Robert L. Grossman. Flynet: a genomic resource for Drosophila melanogaster transcriptional regulatory networks, Bioinformatics, 2009, pp. 3001-3004, 25/22, DOI: 10.1093/bioinformatics/btp469