Surveying the Maize community for their diversity and pedigree visualization needs to prioritize tool development and curation
Database, 2017, 1–7
doi: 10.1093/database/bax031
Original article
Taner Z. Sen1,2,3,*, Bremen L. Braun1, David A. Schott1,4,
John L. Portwood, II1, Mary L. Schaeffer5,6, Lisa C. Harper1,
Jack M. Gardiner7, Ethalinda K. Cannon1,4 and Carson M. Andorf1,4
1U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS) Corn Insects and Crop
Genetics Research Unit, Iowa State University, Ames, IA 50011, USA, 2Department of Genetics,
Development and Cell Biology, Iowa State University, Ames, IA 50011, USA, 3Bioinformatics and
Computational Biology Program, Iowa State University, Ames, IA 50011, USA, 4Department of Computer
Science, Iowa State University, Ames, IA 50011, USA, 5USDA-ARS Plant Genetics Research Unit,
University of Missouri, Columbia, MO 65211, USA, 6Division of Plant Sciences, Department of
Agronomy, University of Missouri, Columbia, MO 65211, USA and 7Division of Animal Sciences,
University of Missouri, Columbia, MO 65211, USA
*Corresponding author: Tel.: +1 (510) 559-5982; Email:
Present address: Taner Z. Sen, USDA-ARS, Western Regional Research Center, Crop Improvement and Genetics
Research Unit, 800 Buchanan St., Albany, CA 94710, USA.
Citation details: Sen,T.Z., Braun,B.L., Schott,D.A., et al. Surveying the Maize community for their diversity and pedigree
visualization needs to prioritize tool development and curation. Database (2017) Vol. 2017: article ID bax031; doi:10.1093/database/bax031
Received 26 October 2016; Revised 20 March 2017; Accepted 25 March 2017
Abstract
The Maize Genetics and Genomics Database (MaizeGDB) team prepared a survey to
identify breeders’ needs for visualizing pedigrees, diversity data and haplotypes in order
to prioritize tool development and curation efforts at MaizeGDB. The survey was distributed to the maize research community on behalf of the Maize Genetics Executive
Committee in summer 2015. The survey garnered 48 responses from maize researchers,
more than half of whom self-identified as breeders. Respondents identified the following as their top
visualization priorities: (i) displaying single
nucleotide polymorphisms (SNPs) in a given region for a given list of lines, (ii) showing haplotypes for a given list of lines and (iii) presenting pedigree relationships visually. The survey also asked which populations would be most useful to display. Two
populations topped the list: (i) 3000 publicly available maize inbred lines used in
Romay et al. (Comprehensive genotyping of the USA national maize inbred seed bank.
Genome Biol, 2013;14:R55) and (ii) maize lines with expired Plant Variety Protection Act
(ex-PVP) certificates. Driven by this strong stakeholder input, MaizeGDB staff are currently working in four areas to improve the MaizeGDB interface and web-based tools: (i) presenting
immediate progenies of currently available stocks on the MaizeGDB Stock pages, (ii) displaying the most recent ex-PVP lines described in the Germplasm Resources Information
Network (GRIN) on the MaizeGDB Stock pages, (iii) developing network views of pedigree relationships and (iv) visualizing genotypes from SNP-based diversity datasets.
These survey results can help other biological databases direct their efforts according
to user preferences as they serve similar types of datasets for their communities.
Database URL: https://www.maizegdb.org
Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
The cost of generating DNA-based biological data types is
declining continuously, enabling individual researchers and
research groups to generate large amounts of accurate and
specific data about their systems of interest (1). After the initial wave of sequencing ‘reference’ genomes, i.e. high-quality
nucleotide sequences of representative lines for a given species, the focus is now shifting to sequencing multiple accessions for each species and identifying genomic regions
showing diversity in the nucleotide sequences (2). Not all
nucleotide-level variations are functionally meaningful. Some
are only remnants of random mutations that transpired during each species’ evolutionary journey, and do not have any
obvious function. Conversely, other variations play a significant biological role in controlling agronomically important
traits such as drought or pest resistance. The challenge for
many research groups is to sift through collections of diverse
regions and identify genotypes that control specific aspects of
plant development that are useful to agronomy (3).
As centuries-long genetic research demonstrates, the
identification of trait-determining genotypes has not been
an easy task (4). This task is now facilitated by the abundance of data generated by more affordable DNA sequencing technologies. However, an increase in data also brings
a new set of challenges for the statistical evaluation of
genotypes and associated phenotypes. The regions identified
by quantitative trait loci (QTL) studies may span a few million nucleotides, whereas current association studies can
zero in on the single-nucleotide level and support hypotheses
that a single locus (rarely) or several defined loci play an important part in determining a specific trait (5). When there is a
long list of putative regions, with favorable alleles possibly belonging to multiple germplasm accessions, extracting
relevant information is challenging.
Deploying appropriate visualization applications that
allow facile interpretation of experimental and computational outcomes may significantly facilitate this discovery
process. Depending on specific research questions, visualization of biological data can take a wide range of forms
(6). In most cases, multiple visualization methods are
required to explore the data from multiple perspectives.
Another challenge for researchers is access to visualization tools. Multiple visualization applications are available
as desktop applications for personal computers using
popular operating systems, such as Windows or Linux.
Some of those applications, especially the ones that can be
installed on GUI-based systems (e.g. Windows or Mac OS
systems), are easier for researchers to use. That being said,
bioinformatics applications are increasingly being
built for Linux systems and require some knowledge of
command-line operations to be installed and used, creating
a sometimes insurmountable barrier for researchers who want to harness the powerful features afforded by these applications.
An alternative to desktop applications is offered by web-based applications, which
are accessible and functional through web browsers, but
these applications require specialized skills to build and ongoing funds and personnel to (...truncated)