SciData: a data model and ontology for semantic representation of scientific data

Journal of Cheminformatics, Oct 2016

With the move toward global, Internet enabled science there is an inherent need to capture, store, aggregate and search scientific data across a large corpus of heterogeneous data silos. As a result, standards development is needed to create an infrastructure capable of representing the diverse nature of scientific data. This paper describes a fundamental data model for scientific data that can be applied to data currently stored in any format, and an associated ontology that affords semantic representation of the structure of scientific data (and its metadata), upon which discipline specific semantics can be applied. Application of this data model to experimental and computational chemistry data are presented, implemented using JavaScript Object Notation for Linked Data. Full examples are available at the project website (Chalk in SciData: a scientific data model. http://stuchalk.github.io/scidata/, 2016).

A PDF file should load here. If you do not see its contents the file may be temporarily unavailable at the journal website or you do not have a PDF plug-in installed and enabled in your browser.

Alternatively, you can download the file locally and open with any standalone PDF reader:

https://link.springer.com/content/pdf/10.1186%2Fs13321-016-0168-9.pdf

SciData: a data model and ontology for semantic representation of scientific data

Chalk J Cheminform SciData: a data model and ontology for semantic representation of scientific data Stuart J. Chalk 0 0 Department of Chemistry, University of North Florida , Jacksonville, FL 32224 , USA With the move toward global, Internet enabled science there is an inherent need to capture, store, aggregate and search scientific data across a large corpus of heterogeneous data silos. As a result, standards development is needed to create an infrastructure capable of representing the diverse nature of scientific data. This paper describes a fundamental data model for scientific data that can be applied to data currently stored in any format, and an associated ontology that affords semantic representation of the structure of scientific data (and its metadata), upon which discipline specific semantics can be applied. Application of this data model to experimental and computational chemistry data are presented, implemented using JavaScript Object Notation for Linked Data. Full examples are available at the project website (Chalk in SciData: a scientific data model. http://stuchalk.github.io/scidata/, 2016). Science data; Semantic annotation; Ontology; JSON-LD; RDF; Scientific data model - Background For almost 40  years, scientists have been storing scientific data on computers. With the advent of the Internet, research data could be shared between scientists, first via email and later using web pages, FTP sites, and online databases. With the advancement of Internet technologies and online and local storage capabilities, the options for collecting and stored scientific information have become unlimited. Yet, with all these advancements science faces an increasingly important issue of interoperability. Data are commonly stored in different formats, organized in different ways, and available via different tools/services severely impacting curation [2]. In addition, data is often without context (no metadata describing it), and if there is metadata it is minimal and often not based on standards. Though the Internet has promoted the creation of open standards in many areas, scientific data has, in a sense, been left behind because of its inherent complexity. The strange part about this scenario is that scientific data itself is not the biggest problem. The problem is the contextualization of the scientific data—the metadata that describes system that it applies to, the way it was investigated, the scientists that determined it, and the quality of the measurements. So, what is scientific data and where is the metadata? Peter Murray-Rust grappled with these questions in 2010 and concluded that it is “factual data that shows up in research papers” [3]. When writing scientific articles, researchers add most (in most cases not all) of the valuable metadata in the description of the research they have performed. The motivation of course is open sharing of knowledge for the advancement of science, with appropriate attribution and provenance of research work. As we move toward the fourth paradigm [4], where large aggregations of data are the key to discovery, it is imperative that the context of the data are articulated completely (or as completely as possible), not only to identify it’s origin and authenticity, but more importantly to allow the data to be located correctly on the “scientific data map”. To address these issue, this paper describes a generic scientific data model (SDM)/framework for scientific data derived from (1) the common structure of scientific articles, (2) the needs of electronic notebooks to capture scientific research data and metadata, and (3) the clear need to organize scientific data and its contextual descriptors (metadata). The SDM is intended to be data format/software agnostic and extremely flexible, so that © 2016 The Author(s). This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/ publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. it can be implemented as the scientific research dictates. While the SDM is abstract in nature, it defines a concrete framework that can be easily implemented in any database and does not constrain the data and metadata that can be stored. It therefore serves as a backbone upon which data and its associated metadata can be ‘attached’. In addition, this paper describes an ontology that defines the terms in the SDM, which can be used to semantically annotate the structure of the data reported. In this way, scientific data can be integrated together by storage in Resource Description Fr (...truncated)


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1186%2Fs13321-016-0168-9.pdf

Stuart J. Chalk. SciData: a data model and ontology for semantic representation of scientific data, Journal of Cheminformatics, 2016, pp. 54, Volume 8, Issue 1, DOI: 10.1186/s13321-016-0168-9