Choosing a genome browser for a Model Organism Database: surveying the Maize community

Article PDF: https://database.oxfordjournals.org/content/2010/baq007.full.pdf

Taner Z. Sen (1,2), Lisa C. Harper (0,5), Mary L. Schaeffer (3,4), Carson M. Andorf (2), Trent E. Seigfried (2), Darwin A. Campbell (2), Carolyn J. Lawrence (1,2)

(0) USDA-ARS Plant Gene Expression Center, 800 Buchanan Street, Albany, CA 94710
(1) Department of Genetics, Development and Cell Biology, Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011
(2) USDA-ARS Corn Insects and Crop Genetics Research Unit
(3) Division of Plant Sciences, University of Missouri, Columbia, MO 65211, USA
(4) USDA-ARS Plant Genetics Research Unit
(5) Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, CA 94720

As the B73 maize genome sequencing project neared completion, MaizeGDB began to integrate a graphical genome browser with its existing web interface and database. To ensure that maize researchers would benefit as much as possible from the addition of a genome browser to the existing MaizeGDB resource, personnel at MaizeGDB surveyed researchers' needs. The collected data indicate that existing genome browsers for maize were inadequate and suggest that a browser with a quick interface and intuitive tools would meet most researchers' needs. Here, we document the survey's outcomes, review the functionalities of available genome browser software platforms and offer our rationale for choosing the GBrowse software suite for MaizeGDB. Because the genome as represented within the MaizeGDB Genome Browser is tied to detailed phenotypic data, molecular marker information, available stocks, etc., the MaizeGDB Genome Browser represents a novel mechanism by which researchers can leverage maize sequence information directly toward crop improvement.

Database URL: http://gbrowse.maizegdb.org/

Introduction

A genome browser is to genomic sequence data as a web browser is to the World Wide Web: both offer logical access to data streams that are otherwise unintelligible.
With the advent of new DNA sequencing technologies and the availability of copious amounts of sequence-based data from many species, genome browsers have been developed as a means for researchers to view, interact with, search through and display sequenced genomes, as well as to compare syntenic or similar regions of genomes among related species. Various genome browsers have been created over the years, each with particular strengths and weaknesses. Many provide independent solutions for integrating and visualizing sequence-based data alongside genetic and phenotypic information.

Community resources including Model Organism Databases (MODs) [e.g. TAIR (1), FlyBase (2), etc.], Clade-Oriented Databases (CODs) [e.g. Gramene (3), SGN (4), etc.], Automatic Annotation Shops [e.g. PlantGDB (5), JCVI (6, 7), etc.] and others have a responsibility to provide timely access to sequence data that are well integrated with existing traditional biological data. Choosing genome browser software that meets the needs of users while staying within a group's maintenance capabilities is a major challenge for the groups working to build and maintain these community resources. Described here are the methodologies we used to determine which genome browser to implement at MaizeGDB (8-10), the MOD for maize.

The need for a genome browser at MaizeGDB

These are exciting times for maize researchers and breeders. Not only is maize a major crop worldwide, but a reference genome sequence for the inbred line B73 has also been released [www.maizesequence.org; (11)]. As of August 2009, the minimum tiling path included 16 910 sequenced Bacterial Artificial Chromosome (BAC) and fosmid clones and encompassed 2.12 Gb or 93% of the 2.3 Gb B73 genome (12). The B73 pseudomolecules (12) are available through the Arizona Genomics Institute website (http://www2.genome.arizona.edu/genomes/maize).
Other whole-genome sequences include the shotgun sequences of an ancient popcorn landrace, Palomero Toluqueño (13), and of the maize inbred line Mo17 (from JGI, the Joint Genome Institute, with D. Rokhsar leading the group; http://www.phytozome.net/). In addition, an extensive haplotype map has been published for 27 lines of maize, enabling researchers to establish novel relations between genetic, physical and diversity data (14, 15). Other sequence-based resources include more than 2 million public ESTs (http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html) and a large number of genic sequences from gene-enriched libraries (16, 17).

Various research groups and consortia integrate large portions of these data sets, each in its own way. Examples include PlantGDB [(5); www.plantgdb.org], the Dana Farber [http://compbio.dfci.harvard.edu/tgi/tgipage.html; (18)], MAGI [http://magi.plantgenomics.iastate.edu/; (19)], NCBI RefSeq (20) and UniProt (www.uniprot.org; The UniProt Consortium 2009). Integrating these large data sets at a single location, together with information about the position, orientation and sequence of genes, genetic markers and variations and their association with phenotypic data, would allow a detailed understanding of the maize genome within its biological context, provided the data are centrally accessible and simultaneously viewable.

At the completion of the Maize Sequencing Project, it is anticipated that genomic data and gene models will be transferred from the Maize Genome Sequencing Consortium's project database, MaizeSequence.org, to MaizeGDB (8-10) and Gramene (3). As a federally funded, long-lived resource, MaizeGDB is tasked to serve maize geneticists' and breeders' longitudinal data access and analysis needs.
To accomplish these tasks, MaizeGDB relies primarily on direct participation by members of the maize research community, including the Maize Genetics Executive Committee (MGEC; a group tasked to identify both the needs and the opportunities for maize genetics and to communicate this information to the broadest possible life science community), the MaizeGDB Working Group (a panel that offers guidance for MaizeGDB's continued development) and direct interaction with individual researchers. Other databases, such as TAIR (1) and SGN (4), rely on similar means to interact with and receive feedback from their communities. However, to the best of our knowledge, the MaizeGDB Working Group is unusual for a few reasons: the group (i) meets at least once a year, whereas many other database groups' advisory boards are formed and then fail to meet; (ii) documents its guidance online (see http://www.maizegdb.org/working_group.php); and (iii) routinely allows representatives from other database groups and various funding agencies to observe its meetings. The successful guidance provided by the MaizeGDB Working Group has even inspired others, including SoyBase (21) and GRIN (http://www.ars-grin.gov/npgs/), to create similar guidance committees.

Currently, MaizeGDB stores information on: loci (genes and other genetically defined genomic regions including QTLs), variations (alleles and other sorts of polymorphisms), stocks, molecular markers and p (...truncated)