Multifunctional crop trait ontology for breeders
AoB PLANTS
http://aobplants.oxfordjournals.org/
Open access – Technical article
Multifunctional crop trait ontology for breeders’ data: field
book, annotation, data discovery and semantic enrichment
of the literature
Rosemary Shrestha 1*, Elizabeth Arnaud 2*, Ramil Mauleon 3, Martin Senger 3, Guy F. Davenport 1,
David Hancock 4, Norman Morrison 4, Richard Bruskiewich 3 and Graham McLaren 5
1 IRRI-CIMMYT Crop Research Informatics Laboratory (CRIL), Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT), Apdo. Postal 6-641, 06600 Mexico, D.F., Mexico
2 Bioversity International, via dei Tre Denari, 472/a, 00057 Maccarese, Rome, Italy
3 IRRI-CIMMYT Crop Research Informatics Laboratory (CRIL), International Rice Research Institute (IRRI), DAPO Box 7777, Metro Manila, Philippines
4 Department of Computer Science, University of Manchester, Oxford Road, Manchester, UK
5 Generation Challenge Programme (GCP), c/o Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT), Apdo. Postal 6-641, 06600 Mexico, D.F., Mexico
Received: 26 February 2010; Returned for revision: 19 April 2010; Accepted: 21 May 2010; Published: 27 May 2010
Citation details: Shrestha R, Arnaud E, Mauleon R, Senger M, Davenport GF, Hancock D, Morrison N, Bruskiewich R, McLaren G. 2010.
Multifunctional crop trait ontology for breeders’ data: field book, annotation, data discovery and semantic enrichment of the
literature. AoB PLANTS 2010: plq008, doi:10.1093/aobpla/plq008
Abstract
Background and aims
Agricultural crop databases maintained in gene banks of the Consultative Group on International Agricultural Research (CGIAR) are valuable sources of information for breeders.
These databases provide comparative phenotypic and genotypic information that can help
elucidate functional aspects of plant and agricultural biology. To facilitate data sharing
within and between these databases and the retrieval of information, the crop ontology
(CO) database was designed to provide controlled vocabulary sets for several economically
important plant species.
Methodology
Existing public ontologies and equivalent catalogues of concepts covering the range of crop
science information and descriptors for crops and crop-related traits were collected from
breeders, physiologists, agronomists, and researchers in the CGIAR consortium. For each
crop, relationships between terms were identified and crop-specific trait ontologies were constructed following the Open Biomedical Ontologies (OBO) format standard using the OBO-Edit
tool. All terms within an ontology were assigned a globally unique CO term identifier.
Principal results
The CO currently comprises crop-specific traits for chickpea (Cicer arietinum), maize (Zea mays),
potato (Solanum tuberosum), rice (Oryza sativa), sorghum (Sorghum spp.) and wheat (Triticum
spp.). Several plant-structure and anatomy-related terms for banana (Musa spp.), wheat and
maize are also included. In addition, multi-crop passport terms are included as controlled
vocabularies for sharing information on germplasm. Two web-based online resources were
built to make these COs available to the scientific community: the ‘CO Lookup Service’ for
browsing the CO; and the ‘Crops Terminizer’, an ontology text mark-up tool.
* Corresponding author’s e-mail address: ;
AoB PLANTS Vol. 2010, plq008, doi:10.1093/aobpla/plq008, available online at www.aobplants.oxfordjournals.org
© The Authors 2010. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative
Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Conclusions
The controlled vocabularies of the CO are being used to curate several CGIAR centres’ agronomic databases. The use of ontology terms to describe agronomic phenotypes and the accurate mapping of these descriptions into databases will be important steps in comparative
phenotypic and genotypic studies across species and gene-discovery experiments.
Introduction
The challenge of addressing climate change for food
security and adaptation of agricultural systems led, in
2004, to the launch of the 10-year Generation Challenge
Programme (GCP). This is an agricultural research consortium hosted by international agricultural research
centres of the Consultative Group on International Agricultural Research (CGIAR). The GCP involves 22 research
institutes in partnership with external collaborators.
The GCP research agenda focuses on producing drought-tolerant varieties through comparative genomics-driven
improvement and high-throughput molecular characterization of genetic resources in order to introduce
favourable alleles into plant-breeding programmes. For
decades, CGIAR centres and their gene banks have accumulated considerable amounts of valuable data on
germplasm traits. The GCP is now adding new data
sets related to genotype and phenotype, which need
to be released and made accessible to breeders online.
Scientists are overwhelmed by data: the amount of
biological and genetic information has increased dramatically with the advent of high-throughput data collection in the fields of molecular biology and
biotechnology. Researchers need a multidisciplinary
approach to understand the biological processes from
genes to the expression of traits in crops. This approach
requires the extraction of biological data sets from a
wide range of sources. The interoperability between
these sources enables scientists to exploit comparative
genomic information, elucidate functional aspects of
plant biology and conduct studies of synteny and homology. However, the GCP has not yet achieved the level
of interoperability required for providing access to comprehensive sets of biological data. One obstacle to the
seamless combination of genetic trait and experimental
data is the variability of the terms and concepts used to
describe comparable objects across databases. In
agronomy, phenotype information has traditionally
been captured in a free-text manner. In addition,
many traits are crop specific and some have complex
trait names, thus making it difficult to understand their
exact meaning without further description. Developing
trait ontologies for economically important crops is
crucial to overcoming the inconsistencies between
GCP data sources and sharing this knowledge among
researchers.
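To make the term-variability problem concrete, the following minimal Python sketch shows how free-text trait labels from different field books might be resolved to a single controlled term. It is an illustration only, not the CO implementation: the identifiers (`EX:0000001`, `EX:0000002`), the synonym lists and the function names `normalise`, `build_index` and `annotate` are all hypothetical and do not correspond to actual CO entries or tools.

```python
# Hypothetical controlled vocabulary. Identifiers, names and synonyms are
# illustrative placeholders, not actual Crop Ontology terms.
CONTROLLED_TERMS = {
    "EX:0000001": {
        "name": "plant height",
        "synonyms": {"pht", "plant ht", "height of plant"},
    },
    "EX:0000002": {
        "name": "days to flowering",
        "synonyms": {"dtf", "flowering date"},
    },
}


def normalise(label: str) -> str:
    """Lower-case a label, treat underscores as spaces, collapse whitespace."""
    return " ".join(label.lower().replace("_", " ").split())


def build_index(terms: dict) -> dict:
    """Map every normalised name and synonym to its term identifier."""
    index = {}
    for term_id, entry in terms.items():
        for label in entry["synonyms"] | {entry["name"]}:
            index[normalise(label)] = term_id
    return index


def annotate(labels: list, index: dict) -> list:
    """Pair each free-text label with a term identifier, or None if unknown."""
    return [(label, index.get(normalise(label))) for label in labels]


INDEX = build_index(CONTROLLED_TERMS)

if __name__ == "__main__":
    # Column headers as they might appear in two different field books.
    for label, term_id in annotate(["Plant_Ht", "DTF", "grain colour"], INDEX):
        print(f"{label!r} -> {term_id}")
```

In this sketch, labels that differ only in spelling conventions resolve to the same identifier, while an unrecognized label such as "grain colour" is returned with `None` so that a curator can review it rather than have it silently mismatched.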
In bioinformatics, an ontology is a formal representation of a set of concepts within a specific discipline
or domain and the relationships between those concepts.
It provides a shared and controlled vocabulary that can
be used to model the domain in terms of the types of
object or concept, and their properties and relationships.
Ontology is more complex than (...truncated)