Data Science Journal

https://datascience.codata.org

List of Papers (Total 372)

What Do We Know about the Stewardship Gap

In the 21st century, digital data drive innovation and decision-making in nearly every field. However, little is known about the total size, characteristics, and sustainability of these data. In the scholarly sphere, it is widely suspected that there is a gap between the amount of valuable digital data that is produced and the amount that is effectively stewarded and made...

Enhancing the Research Data Management of Computer-Based Educational Assessments in Switzerland

Since 2006 the education authorities in Switzerland have been obliged by the Constitution to harmonize important benchmarks in the educational system throughout Switzerland. With the development of national educational objectives in four disciplines an important basis for the implementation of this constitutional mandate was created. In 2013 the Swiss National Core Skills...

Automatic Acquisition and Sustainable Use of Political-Ecological Data

The sustainable management of anthropogenically-impacted ecosystems will require ongoing monitoring and advocacy by people across the globe. To this end, automatic methods are developed herein for acquiring several types of such political-ecological data. On the political side, a method is developed for gathering news articles about human actions that affect the ecosystem along...

Managing Digital Research Objects in an Expanding Science Ecosystem: 2017 Conference Summary

Digital research objects are packets of information that scientists can use to organize and store their data. There are currently many different methods in use for optimizing digital objects for research purposes. These methods have been applied to many scientific disciplines but differ in architecture and approach. The goals of this joint digital research object (DRO) conference...

A Conceptual Enterprise Framework for Managing Scientific Data Stewardship

Scientific data stewardship is an important part of long-term preservation and the use/reuse of digital research data. It is critical for ensuring trustworthiness of data, products, and services, which is important for decision-making. Recent U.S. federal government directives and scientific organization guidelines have levied specific requirements, increasing the need for a more...

Virtual Research Environment for Regional Climatic Processes Analysis: Ontological Approach to Spatial Data Systematization

This paper describes a Virtual Research Environment (VRE) based on the web GIS platform ‘Climate+’, which provides access to analytical instruments for processing 19 collections of meteorological and climate data from several international organizations. This environment systematizes spatial data and related climate information and allows a user to get analysis results...

Data Tracking Analysis of the Geomagnetic Fixed-Station Network in China

Data tracking analysis is an important mechanism for increasing data analysis capacity and eliminating interference from observational data. In this study, the technique was applied to the geomagnetic fixed-station network to improve the efficiency and accuracy of analysis to extract useful information. This paper introduces the scope, workflow, analysis platform, abnormal...

Text and Image Compression based on Data Mining Perspective

Data compression has for decades been one of the enabling technologies of the ongoing digital multimedia revolution, resulting in renowned algorithms such as Huffman encoding, LZ77, Gzip, RLE, and JPEG. Researchers have looked into character/word-based approaches to text and image compression, missing out the larger aspect of pattern mining from large databases. The...

Ontology Usability Scale: Context-aware Metrics for the Effectiveness, Efficiency and Satisfaction of Ontology Uses

Both ontology builders and users need a way to evaluate ontologies in terms of usability, but existing ontology evaluation approaches do not fit this purpose. We propose the Ontology Usability Scale (OUS), a ten-item Likert scale derived from statements prepared according to a semiotic framework and an online poll in the Semantic Web community to provide a practical way of...

Marine Data Services at National Oceanographic Data Centre-India

In this paper we introduce the marine data archived at the Indian National Centre for Ocean Information Services (INCOIS), Ministry of Earth Sciences, India. Heterogeneous data from in situ observations, remote sensing, and ocean models are archived. In situ ocean observations include data from Lagrangian as well as Eulerian platforms such as Argo floats and moored buoys, while remote sensing...

Unpacking the ‘Black Box’ of Public Expenditure Data in Africa: Quantification of Agricultural Spending Using Mozambique’s Budget Reports

This paper undertakes a detailed examination of the availability and quality of data on public expenditures in agriculture in Africa. We consider the case of Mozambique, a country characterised by low income and low administrative capacity, but also by a policy environment that has turned a focused lens on public funding to agriculture. We explore the extent to which domestic...

Disparity of Imputed Data from Small Area Estimate Approaches – A Case Study on Diabetes Prevalence at the County Level in the U.S.

This paper assesses concordance and inconsistency among three small area estimation methods that are currently providing county-level health indicators in the United States. The three methods are multi-level logistic regression, spatial logistic regression, and spatial Poisson regression, all proposed since 2010. Diabetes prevalence is estimated for each county in the continental...

The State of Assessing Data Stewardship Maturity – An Overview

Data stewardship encompasses all activities that preserve and improve the information content, accessibility, and usability of data and metadata. Recent regulations, mandates, policies, and guidelines set forth by the U.S. government, federal and other funding agencies, scientific societies, and scholarly publishers have levied stewardship requirements on digital scientific data...

Shapelet Classification Algorithm Based on Efficient Subsequence Matching

Shapelet classification algorithms are an accurate classification method for time series data. Existing shapelet classifying processes are relatively inefficient and slow due to the large number of complex distance computations they require. This paper therefore introduces piecewise aggregate approximation (PAA) representation and an efficient subsequence matching algorithm for...

Introduction: Open Data and Africa

This introduction outlines the contents of the special collection “Open Data and Africa”, which documents the goals and aspirations associated with Open Data in Africa today: what opportunities they offer, what challenges they pose, and what implications follow from the increasing political and institutional support for this concept.

Multiset Analysis of Consequences of Natural Disasters Impacts on Large-Scale Industrial Systems

This paper presents a new approach to assessing the sustainability/vulnerability of distributed industrial systems (IS). The approach is based on unitary multiset grammars (UMG), a flexible and convenient tool designed specifically for the analysis and optimization of large systems. UMG description of IS technological base as well as multiset representation of order completed by the...

Science Metadata Management, Interoperability and Data Citations of the National Institute of Polar Research, Japan

The Polar Data Centre (PDC) of the National Institute of Polar Research (NIPR) has a responsibility to manage polar science data as part of the National Antarctic Data Centre and the Scientific Committee on Antarctic Research. During the International Polar Year (IPY 2007–2008), a remarkable amount of data/metadata involving multi-disciplinary science activities was compiled...

Recommendations to Improve Downloads of Large Earth Observation Data

With the volume of Earth observation data expanding rapidly, cloud computing is quickly changing the way these data are processed, analyzed, and visualized. Collocating freely available Earth observation data on a cloud computing infrastructure may create opportunities unforeseen by the original data provider for innovation and value-added data re-use, but existing systems at...

The CODATA Role in Promotion of Data Quality

Promotion of data quality was a key feature of CODATA’s original charter and should continue to receive the highest priority.

Collaborations and Partnerships in NASA’s Earth Science Data Systems

NASA has been collecting Earth observation data from spaceborne instruments since 1960. Today, there are tens of satellites orbiting the Earth and collecting frequent global observations for the benefit of mankind. Collaboration between NASA and organizations in the US and other countries has been extremely important in maintaining the Earth observation capabilities as well as...

Weather Forecasts for Pastoralism in a Changing Climate: Navigating the Data Space in North Eastern Uganda

Efforts to support the building of resilient pastoralism have been stepped up in Uganda through a number of activities. One of these activities is the provision of seasonal and medium-range climate forecasts to enable decisions concerning livestock herding. Seasonal weather forecasts are critical, but there are challenges with the timeliness and usability of the forecasts. The challenges...

Genomic Research Data Generation, Analysis and Sharing – Challenges in the African Setting

Genomics is the study of the genetic material that constitutes the genomes of organisms. This genetic material can be sequenced, and it provides a powerful tool for the study of human, plant and animal evolutionary history and diseases. Genomics research is becoming increasingly commonplace due to significant advances in, and falling costs of, technologies such as sequencing. This...

The Northern Voice: Listening to Indigenous and Northern Perspectives on Management of Data in Canada

The Canadian Cryospheric Information Network and Polar Data Catalogue (CCIN/PDC) provide: (1) a trusted archive to store data from Canadian cryospheric research and (2) a public access portal to this information. The CCIN/PDC has since expanded its collection to include data from health, ecological, social, and other sciences. Since its inception, CCIN/PDC has engaged Indigenous...

The Oklahoma Mesonet: A Pilot Study of Environmental Sensor Data Citations

This pilot study of 110 scientific papers utilizing environmental sensor data from the Oklahoma Mesonet during its first two decades of operations demonstrates the diversity of potential purposes in scientific research for a robust, rigorously maintained, accessible source of environmental sensor data, as well as the challenges involved in identifying uses of that data within...