Choosing a genome browser for a Model Organism Database: surveying the Maize community
Taner Z. Sen (1,2), Lisa C. Harper (0,5), Mary L. Schaeffer (3,4), Darwin A. Campbell (2), Carolyn J. Lawrence (1,2), Carson M. Andorf (2), Trent E. Seigfried (2)
0 USDA-ARS Plant Gene Expression Center, 800 Buchanan Street, Albany, CA 94710
1 Department of Genetics, Development and Cell Biology, Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011
2 USDA-ARS Corn Insects and Crop Genetics Research Unit
3 Division of Plant Sciences, University of Missouri, Columbia, MO 65211, USA
4 USDA-ARS Plant Genetics Research Unit
5 Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, CA 94720
As the B73 maize genome sequencing project neared completion, MaizeGDB began to integrate a graphical genome browser with its existing web interface and database. To ensure that maize researchers would benefit as much as possible from the addition of a genome browser to the existing MaizeGDB resource, personnel at MaizeGDB surveyed researchers' needs. The collected data indicate that existing genome browsers for maize were inadequate and suggest that a browser with a quick interface and intuitive tools would meet most researchers' needs. Here, we document the survey's outcomes, review the functionalities of available genome browser software platforms and offer our rationale for choosing the GBrowse software suite for MaizeGDB. Because the genome as represented within the MaizeGDB Genome Browser is tied to detailed phenotypic data, molecular marker information, available stocks, etc., the MaizeGDB Genome Browser represents a novel mechanism by which researchers can leverage maize sequence information directly toward crop improvement.
Database URL: http://gbrowse.maizegdb.org/
Introduction
A genome browser is to genomic sequence data as a web
browser is to the World Wide Web: both offer logical access
to datastreams that are otherwise unintelligible. With the
advent of new DNA sequencing technologies and the
availability of copious amounts of sequence-based data from
many species, genome browsers have been developed as
a means for researchers to view, interact with, search
through and display sequenced genomes as well as to
compare syntenic or similar regions of genomes among related
species. Various genome browsers have been created over
the years, each with particular strengths and weaknesses.
Many provide independent solutions for integrating and
visualizing sequence-based data alongside genetic and
phenotypic information.
Community resources including Model Organism
Databases (MODs) [e.g. TAIR (1), FlyBase (2), etc.],
Clade-Oriented Databases (CODs) [e.g. Gramene (3), SGN
(4), etc.], Automatic Annotation Shops [e.g. PlantGDB (5),
JCVI (6, 7), etc.] and others have a responsibility to provide
timely access to sequence data well-integrated with
existing traditional biological data. Determining how best to
choose genome browser software to meet the needs of
users within the context of a group's maintenance
capabilities is a major challenge for the groups working to
build and maintain these community resources. Described
here are the methodologies we used to determine which
genome browser to implement at MaizeGDB (8-10),
the MOD for maize.
The need for a genome browser at MaizeGDB
These are exciting times for maize researchers and
breeders. Not only is maize a major crop worldwide, but a reference
genome sequence for the inbred line B73 has also been
released [www.maizesequence.org; (11)]. As of August
2009, the minimum tiling path included 16 910 sequenced
Bacterial Artificial Chromosome (BAC) and fosmid clones
and encompassed 2.12 Gb or 93% of the 2.3 Gb B73
genome (12). The B73 pseudomolecules (12) are available
through the Arizona Genomics Institute website
(http://www2.genome.arizona.edu/genomes/maize). Other
whole-genome sequences include the shotgun sequences
of an ancient popcorn landrace, Palomero Toluqueño (13),
and the maize inbred line Mo17 (from the Joint Genome
Institute, JGI, with D. Rokhsar leading the group;
http://www.phytozome.net/). In addition, an extensive
haplotype map has been published for 27 lines of maize,
enabling researchers to establish novel relations between
genetic, physical and diversity data (14, 15). Other
sequence-based resources include over 2 million public
ESTs (http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html)
and a large number of genic sequences from
gene-enriched libraries (16, 17). Various research groups
and consortia integrate large portions of these data sets,
each in their own way. Examples include PlantGDB [(5);
www.plantgdb.org], the Dana Farber
[http://compbio.dfci.harvard.edu/tgi/tgipage.html; (18)], MAGI
[http://magi.plantgenomics.iastate.edu/; (19)], NCBI RefSeq (20) and
UniProt (www.uniprot.org; The UniProt Consortium
2009). Integrating these large data sets at a single
location with information about the position, orientation
and sequence of genes, genetic markers and variations, and
with their associations to phenotypic data, would allow a
detailed understanding of the maize genome within its
biological context, provided that the data are centrally accessible and
simultaneously viewable.
At the completion of the Maize Sequencing Project, it
is anticipated that genomic data and gene models will
be transferred from the Maize Genome Sequencing
Consortium's project database, MaizeSequence.org, to
MaizeGDB (8-10) and Gramene (3). As a federally funded,
long-lived resource, MaizeGDB is tasked to serve maize
geneticists' and breeders' longitudinal data access and
analysis needs. To accomplish these tasks, MaizeGDB primarily
relies on direct participation by members of the maize
research community including the Maize Genetics Executive
Committee (MGEC; a group tasked to identify both the
needs and the opportunities for maize genetics and to
communicate this information to the broadest possible life
science community), the MaizeGDB Working Group (a panel
that offers guidance for MaizeGDB's continued
development), and direct interaction with individual
researchers. Other databases, such as TAIR (1) and SGN (4),
also rely on similar means to interact with and receive
feedback from their communities. However, to the best of our
knowledge, the MaizeGDB Working Group is unusual
for a few reasons: the group (i) meets at least once yearly
(many other database groups' advisory boards are formed
and then fail to meet), (ii) documents guidance online (see
http://www.maizegdb.org/working_group.php) and (iii)
routinely allows representatives from other database
groups and various funding agencies to observe their
meetings. The successful guidance provided by the MaizeGDB
Working Group has even inspired others, including
SoyBase (21) and GRIN (http://www.ars-grin.gov/npgs/), to
create similar guidance committees.
Currently, MaizeGDB stores information on: loci (genes
and other genetically-defined genomic regions including
QTLs), variations (alleles and other sorts of polymorphisms),
stocks, molecular markers and p (...truncated)