Multifunctional crop trait ontology for breeders

AoB PLANTS http://aobplants.oxfordjournals.org/
Open access – Technical article

Multifunctional crop trait ontology for breeders’ data: field book, annotation, data discovery and semantic enrichment of the literature

Rosemary Shrestha 1*, Elizabeth Arnaud 2*, Ramil Mauleon 3, Martin Senger 3, Guy F. Davenport 1, David Hancock 4, Norman Morrison 4, Richard Bruskiewich 3 and Graham McLaren 5

1 IRRI-CIMMYT Crop Research Informatics Laboratory (CRIL), Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT), Apdo. Postal 6-641, 06600 Mexico, D.F., Mexico
2 Bioversity International, via dei Tre Denari, 472/a, 00057 Maccarese, Rome, Italy
3 IRRI-CIMMYT Crop Research Informatics Laboratory (CRIL), International Rice Research Institute (IRRI), DAPO Box 7777, Metro Manila, Philippines
4 Department of Computer Science, University of Manchester, Oxford Road, Manchester, UK
5 Generation Challenge Programme (GCP), c/o Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT), Apdo. Postal 6-641, 06600 Mexico, D.F., Mexico

Received: 26 February 2010; Returned for revision: 19 April 2010; Accepted: 21 May 2010; Published: 27 May 2010

Citation details: Shrestha R, Arnaud E, Mauleon R, Senger M, Davenport GF, Hancock D, Morrison N, Bruskiewich R, McLaren G. 2010. Multifunctional crop trait ontology for breeders’ data: field book, annotation, data discovery and semantic enrichment of the literature. AoB PLANTS 2010: plq008, doi:10.1093/aobpla/plq008

Abstract

Background and aims
Agricultural crop databases maintained in gene banks of the Consultative Group on International Agricultural Research (CGIAR) are valuable sources of information for breeders. These databases provide comparative phenotypic and genotypic information that can help elucidate functional aspects of plant and agricultural biology.
To facilitate data sharing within and between these databases and the retrieval of information, the crop ontology (CO) database was designed to provide controlled vocabulary sets for several economically important plant species.

Methodology
Existing public ontologies and equivalent catalogues of concepts covering the range of crop science information and descriptors for crops and crop-related traits were collected from breeders, physiologists, agronomists, and researchers in the CGIAR consortium. For each crop, relationships between terms were identified and crop-specific trait ontologies were constructed following the Open Biomedical Ontologies (OBO) format standard using the OBO-Edit tool. All terms within an ontology were assigned a globally unique CO term identifier.

Principal results
The CO currently comprises crop-specific traits for chickpea (Cicer arietinum), maize (Zea mays), potato (Solanum tuberosum), rice (Oryza sativa), sorghum (Sorghum spp.) and wheat (Triticum spp.). Several plant-structure and anatomy-related terms for banana (Musa spp.), wheat and maize are also included. In addition, multi-crop passport terms are included as controlled vocabularies for sharing information on germplasm. Two web-based online resources were built to make these COs available to the scientific community: the ‘CO Lookup Service’ for browsing the CO; and the ‘Crops Terminizer’, an ontology text mark-up tool.

Conclusions
The controlled vocabularies of the CO are being used to curate several CGIAR centres’ agronomic databases. The use of ontology terms to describe agronomic phenotypes and the accurate mapping of these descriptions into databases will be important steps in comparative phenotypic and genotypic studies across species and gene-discovery experiments.

* Corresponding author’s e-mail address:

AoB PLANTS Vol. 2010, plq008, doi:10.1093/aobpla/plq008, available online at www.aobplants.oxfordjournals.org
© The Authors 2010. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

The challenge of addressing climate change for food security and adaptation of agricultural systems led, in 2004, to the launch of the 10-year Generation Challenge Programme (GCP). This is an agricultural research consortium hosted by international agricultural research centres of the Consultative Group on International Agricultural Research (CGIAR). The GCP involves 22 research institutes in partnership with external collaborators. The GCP research agenda focuses on producing drought-tolerant varieties through comparative genomics-driven improvement and high-throughput molecular characterization of genetic resources in order to introduce favourable alleles into plant-breeding programmes.

For decades, CGIAR centres and their gene banks have accumulated considerable amounts of valuable data on germplasm traits. The GCP is now adding new data sets related to genotype and phenotype, which need to be released and made accessible to breeders online.

Scientists are overwhelmed by data: the amount of biological and genetic information has increased dramatically with the advent of high-throughput data collection in the fields of molecular biology and biotechnology. Researchers need a multidisciplinary approach to understand the biological processes from genes to the expression of traits in crops. This approach requires the extraction of biological data sets from a wide range of sources.
The interoperability between these sources enables scientists to exploit comparative genomic information, elucidate functional aspects of plant biology and conduct studies of synteny and homology. However, the GCP has not yet achieved the level of interoperability required for providing access to comprehensive sets of biological data.

One obstacle to the seamless combination of genetic trait and experimental data is the variability of the terms and concepts used to describe comparable objects across databases. In agronomy, phenotype information has traditionally been captured as free text. In addition, many traits are crop specific and some have complex trait names, making it difficult to understand their exact meaning without further description. Developing trait ontologies for economically important crops is crucial to overcoming the inconsistencies between GCP data sources and sharing this knowledge among researchers.

In bioinformatics, an ontology is a formal representation of a set of concepts within a specific discipline or domain and the relationships between those concepts. It provides a shared and controlled vocabulary that can be used to model the domain in terms of the types of object or concept, and their properties and relationships. Ontology is more complex than (...truncated)