Surveying the Maize community for their diversity and pedigree visualization needs to prioritize tool development and curation

https://academic.oup.com/database/article-pdf/doi/10.1093/database/bax031/19233537/bax031.pdf

Database, 2017, 1–7
doi: 10.1093/database/bax031
Original article

Taner Z. Sen (1,2,3,*), Bremen L. Braun (1), David A. Schott (1,4), John L. Portwood II (1), Mary L. Schaeffer (5,6), Lisa C. Harper (1), Jack M. Gardiner (7), Ethalinda K. Cannon (1,4) and Carson M. Andorf (1,4)

1 U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS) Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
2 Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
3 Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
4 Department of Computer Science, Iowa State University, Ames, IA 50011, USA
5 USDA-ARS Plant Genetics Research Unit, University of Missouri, Columbia, MO 65211, USA
6 Division of Plant Sciences, Department of Agronomy, University of Missouri, Columbia, MO 65211, USA
7 Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA

*Corresponding author: Tel.: +1 (510) 559-5982; Email:
Present address: Taner Z. Sen, USDA-ARS, Western Regional Research Center, Crop Improvement and Genetics Research Unit, 800 Buchanan St., Albany, CA 94710, USA.

Citation details: Sen,T.Z., Braun,B.L., Schott,D.A., et al. Surveying the Maize community for their diversity and pedigree visualization needs to prioritize tool development and curation. Database (2017) Vol. 2017: article ID bax031; doi:10.1093/database/bax031

Received 26 October 2016; Revised 20 March 2017; Accepted 25 March 2017

Abstract

The Maize Genetics and Genomics Database (MaizeGDB) team prepared a survey to identify breeders’ needs for visualizing pedigrees, diversity data and haplotypes in order to prioritize tool development and curation efforts at MaizeGDB.
The survey was distributed to the maize research community on behalf of the Maize Genetics Executive Committee in Summer 2015. The survey garnered 48 responses from maize researchers, more than half of whom self-identified as breeders. The survey showed that the maize researchers considered their top priorities for visualization to be: (i) displaying single nucleotide polymorphisms in a given region for a given list of lines, (ii) showing haplotypes for a given list of lines and (iii) presenting pedigree relationships visually. The survey also asked which populations would be most useful to display. The following two populations were at the top of the list: (i) 3000 publicly available maize inbred lines used in Romay et al. (Comprehensive genotyping of the USA national maize inbred seed bank. Genome Biol, 2013;14:R55) and (ii) maize lines with expired Plant Variety Protection Act (ex-PVP) certificates. Driven by this strong stakeholder input, MaizeGDB staff are currently working in four areas to improve the MaizeGDB interface and web-based tools: (i) presenting immediate progenies of currently available stocks at the MaizeGDB Stock pages, (ii) displaying the most recent ex-PVP lines described in the Germplasm Resources Information Network (GRIN) on the MaizeGDB Stock pages, (iii) developing network views of pedigree relationships and (iv) visualizing genotypes from SNP-based diversity datasets. These survey results can help other biological databases to direct their efforts according to user preferences as they serve similar types of data sets for their communities.

Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
Database URL: https://www.maizegdb.org

The cost of generating DNA-based biological data types is declining continuously, enabling individual researchers and research groups to generate large amounts of accurate and specific data about their systems of interest (1). After the initial wave of sequencing ‘reference’ genomes, i.e. high-quality nucleotide sequences of representative lines for a given species, the focus is now shifting to sequencing multiple accessions for each species and identifying genomic regions that show diversity in their nucleotide sequences (2). Not all nucleotide-level variations are functionally meaningful. Some are only remnants of random mutations that occurred during each species’ evolutionary journey and do not have any obvious function. Conversely, other variations play a significant biological role in controlling agronomically important traits such as drought or pest resistance. The challenge for many research groups is to sift through collections of diverse regions and identify genotypes that control specific aspects of plant development that are useful to agronomy (3).

As centuries-long genetic research demonstrates, the identification of trait-determining genotypes has not been an easy task (4). This task is now facilitated by the abundance of data generated by more affordable DNA sequencing technologies. However, an increase in data also brings a new set of challenges for the statistical evaluation of genotypes and associated phenotypes. The regions identified by quantitative trait loci (QTL) studies may span a few million nucleotides, whereas current association studies can zero in on the single-nucleotide level and support hypotheses that a single locus (rarely) or several defined loci play an important part in determining a specific trait (5). When there is a long list of putative regions, with favorable alleles possibly belonging to multiple germplasm accessions, extracting relevant information is challenging.
Deploying appropriate visualization applications that allow facile interpretation of experimental and computational outcomes may significantly facilitate this discovery process. Depending on specific research questions, visualization of biological data can take a wide range of forms (6). In most cases, multiple visualization methods are required to explore the data from multiple perspectives.

Another challenge for researchers is access to visualization tools. Many visualization tools are available as desktop applications for personal computers running popular operating systems, such as Windows or Linux. Some of those applications, especially the ones that can be installed on GUI-based systems (e.g. Windows or Mac OS systems), are easier for researchers to use. That said, most bioinformatics applications are increasingly being built for Linux systems and require some knowledge of command-line operations to install and use, which creates a sometimes insurmountable barrier for researchers who want to harness the powerful features these applications offer. An alternative to desktop applications is web applications that are accessible and functional through web browsers, but these applications require specialized skills to build and ongoing funds and personnel to (...truncated)