Open Data for Global Science

Data Science Journal, Jun 2007

The digital revolution has transformed the accumulation of properly curated public research data into an essential upstream resource whose value increases with use. The potential contributions of such data to the creation of new knowledge and downstream economic and social goods can in many cases be multiplied exponentially when the data are made openly available on digital networks. Most developed countries spend large amounts of public resources on research and related scientific facilities and instruments that generate massive amounts of data. Yet precious little of that investment is devoted to promoting the value of the resulting data by preserving and making them broadly available. The largely ad hoc approach to managing such data, however, is now beginning to be understood as inadequate to meet the exigencies of the national and international research enterprise. The time has thus come for the research community to establish explicit responsibilities for these digital resources. This article reviews the opportunities and challenges to the global science system associated with establishing an open data policy.

Data Science Journal, Volume 6, Open Data Issue, 17 June 2007

OPEN DATA FOR GLOBAL SCIENCE

Paul F. Uhlir1* and Peter Schröder2

*1 National Research Council, 2101 Constitution Avenue NW, Washington, DC 20418, USA. The views expressed in this paper are those of the authors and not necessarily those of their institutions of employment.

2 Data Archiving and Networked Services (DANS), Anna van Saksenlaan 51, 2593 HW Den Haag, The Netherlands

ABSTRACT

The digital revolution has transformed the accumulation of properly curated public research data into an essential upstream resource whose value increases with use.[1] The potential contributions of such data to the creation of new knowledge and downstream economic and social goods can in many cases be multiplied exponentially when the data are made openly available on digital networks. Most developed countries spend large amounts of public resources on research and related scientific facilities and instruments that generate massive amounts of data. Yet precious little of that investment is devoted to promoting the value of the resulting data by preserving and making them broadly available. The largely ad hoc approach to managing such data, however, is now beginning to be understood as inadequate to meet the exigencies of the national and international research enterprise. The time has thus come for the research community to establish explicit responsibilities for these digital resources. This article reviews the opportunities and challenges to the global science system associated with establishing an open data policy.

Keywords: Scientific data, Science policy, Information policy, Open access, Data management, Data licensing, International scientific cooperation, Cyberinfrastructure, e-Science, Internet

1 INTRODUCTION

The global science system stands at a critical juncture.
On the one hand, it is overwhelmed by a hidden avalanche of ephemeral bits that are central components of modern research and of the emerging “cyberinfrastructure”[2] for e-science[3]. The rational management and exploitation of this cascade of digital assets offers boundless opportunities for research and applications.

[1] See generally, National Research Council (1997), Bits of Power: Issues in Global Access to Scientific Data, National Academy Press, Washington, DC. “Data” may be defined as “facts, numbers, letters, and symbols that describe an object, idea, condition, situation, or other factors”, National Research Council (1999), A Question of Balance: Private Rights and the Public Interest in Scientific Databases, National Academy Press, Washington, DC, p. 15. We define “public research data” as data that are generated through research within government organizations, or by academic or other not-for-profit entities, as well as public data used for research purposes, but not necessarily produced primarily for research (e.g., geographic or meteorological data, or socioeconomic statistics produced by or for government organizations).

[2] The U.S. Blue Ribbon Advisory Panel on Cyberinfrastructure anticipated an information and communication technology (ICT) infrastructure of “…digital environments that become interactive and functionally complete for research communities in terms of people, data, information, tools and instruments and that operate at unprecedented levels of computational, storage and data transfer capacity…” in (2003) Revolutionizing Science and Engineering Through Cyberinfrastructure: Report of the National Science Foundation Blue Ribbon Advisory Panel on Cyberinfrastructure, National Science Foundation, available at: http://www.communitytechnology.org/nsf_ci_report/. We use the terms cyberinfrastructure and ICT infrastructure interchangeably in this paper.
On the other hand, the ability to access and use this rising flood of data seems to lag behind, despite the rapidly growing capabilities of information and communication technologies (ICTs) to make much more effective use of those data. As long as the attention paid to data policies and data management by researchers, their organisations, and their funders does not catch up with the rapidly changing research environment, research policy and funding entities will in many cases perpetuate systemic inefficiencies and the resulting loss or underutilization of valuable data resources derived from public investments. There is thus an urgent need for rationalized national strategies and more coherent international arrangements for sustainable access to public research data, both to data produced directly by government entities and to data generated in academic and not-for-profit institutions with public funding.

In this paper, we examine some of the implications of “data-driven” research and possible ways to overcome existing barriers to the accessibility of public research data. Our perspective is framed in the context of the predominantly publicly funded global science system. We begin by reviewing the growing role of digital data in research and outlining the roles of stakeholders in the research community in developing data access regimes. We then discuss the hidden costs of closed data systems, the benefits and limitations of openness as the default principle for data access, and the emerging open access models that are beginning to form digitally networked commons. We conclude by examining the rationale and requirements for developing overarching international principles from the top down, as well as flexible, common-use contractual templates from the bottom up, to establish data access regimes founded on a presumption of openness, with the goal of better capturing the benefits from existing and future scientific data assets.
The “Principles and Guidelines for Access to Research Data from Public Funding” from the Organisation for Economic Cooperation and Development (OECD), reported on in another article by Pilat and Fukasaku in this special issue of the CODATA Data Science Journal, are the most important recent example of the high-level (inter)governmental approach. The common-use licenses promoted by the Science Commons are a leading example of flexible arrangements originating within the community.

Finally, we should emphasize that we focus almost exclusively on policy (the institutional, socioeconomic, and legal aspects of data access) rather than on the technical and management practicalities, which are also important but beyond the scope of this article.

2 THE GROWING ROLE OF DIGITAL DATA IN THE RESEARCH PROCESS

The evolution of scientific research may be characterized by an accelerating growth in scale, scope, and complexity. These developments in scientific research have been accompanied by a substantial rise in costs. Overall expenditures on research and development (R&D) in the OECD countries increased from $163.2 billion in 1981 to $679.8 billion in 2003 (in constant prices, 2000 dollars (...truncated)


This is a preview of a remote PDF: http://datascience.codata.org/articles/10.2481/dsj.6.OD36/galley/367/download/
Article home page: https://datascience.codata.org/articles/10.2481/dsj.6.OD36/

Paul Uhlir, Peter Schröder. Open Data for Global Science, Data Science Journal, 2007, pp. OD36-OD53, Volume 6, Issue 0, DOI: 10.2481/dsj.6.OD36