Discovering, Indexing and Interlinking Information Resources [version 2; referees: 1 approved, 2 approved with reservations]

F1000Research, Nov 2015

The social media revolution is having a dramatic effect on the world of scientific publication. Scientists now publish their research interests, theories and outcomes across numerous channels, including personal blogs and other thematic web spaces where ideas, activities and partial results are discussed. Accordingly, information systems that facilitate access to scientific literature must learn to cope with this valuable and varied data, evolving to make this research easily discoverable and available to end users. In this paper we describe the incremental process of discovering web resources in the domain of agricultural science and technology. Making use of Linked Open Data methodologies, we interlink a wide array of custom-crawled resources with the AGRIS bibliographic database in order to enrich the user experience of the AGRIS website. We also discuss the SemaGrow Stack, a query federation and data integration infrastructure used to estimate the semantic distance between crawled web resources and AGRIS.

F1000Research 2015, 4:432 Last updated: 27 OCT 2022

SOFTWARE TOOL ARTICLE

Discovering, Indexing and Interlinking Information Resources [version 2; peer review: 3 approved]

Fabrizio Celli (1), Johannes Keizer (1), Yves Jaques (1), Stasinos Konstantopoulos (2), Dušan Vudragović (3)

(1) Food and Agriculture Organization of the UN, Rome, Italy
(2) NCSR Demokritos, Athens, Greece
(3) Institute of Physics Belgrade, University of Belgrade, Belgrade, Serbia

First published: 30 Jul 2015, 4:432 https://doi.org/10.12688/f1000research.6848.1
Latest published: 17 Nov 2015, 4:432 https://doi.org/10.12688/f1000research.6848.2

Keywords: Linked Data, Text Categorization, Recommender Systems, Web Crawling, AGRIS, SemaGrow

Open Peer Review
1. Paolo Missier, Newcastle University, Newcastle upon Tyne, UK
2. Kei Kurakawa, National Institute of Informatics, Tokyo, Japan
3. Leonidas Papachristopoulos, Ionian University, Corfu, Greece

Any reports and responses or comments on the article can be found at the end of the article.

This article is included in the Agriculture, Food and Nutrition gateway.

Corresponding author: Fabrizio Celli

Competing interests: No competing interests were disclosed.

Grant information: This work was supported by the European Commission under EU FP7 project SemaGrow (Grant No. 318497), and in part by the Ministry of Education, Science, and Technological Development of the Republic of Serbia (under project ON171017). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2015 Celli F et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite this article: Celli F, Keizer J, Jaques Y et al. Discovering, Indexing and Interlinking Information Resources [version 2; peer review: 3 approved] F1000Research 2015, 4:432 https://doi.org/10.12688/f1000research.6848.2

Amendments from Version 1

We conducted an evaluation study on a benchmark sample of AGRIS articles in order to determine the relevance between crawled web resources and the AGRIS database. We computed the precision of recommendations considered as “relevant” by our algorithm, commenting on some possible improvements to the process used and described in our work. Outcomes of our evaluation study are presented in the “Analysis of relevance” section, together with a new picture displaying the cumulative distribution of AGRIS records over the number of relevant recommendations.
Furthermore, we created a separate section “Analyzing the algorithm performance” where we compared the execution time of the recommender system in both the “individual” and “federated” modes. The section “The output of the recommender system” was removed, since it contained only a sample RDF/XML fragment that was not very significant. Lastly, the definition of the custom algorithm was removed and minor improvements have been made to the text, as suggested by reviewers.

Introduction

AGRIS (http://agris.fao.org/) is the International System for Agricultural Science and Technology, a collection of nearly 8 million multilingual bibliographic resources spanning the last forty years and produced by a network of more than 150 institutions from 65 countries. AGRIS is currently part of the CIARD initiative (http://www.ciard.net/), a self-described “global movement dedicated to open agricultural knowledge”. Some AGRIS data sources are unique (http://aims.fao.org/activity/blog/agris-enriched-data-fao) to the system, and AGRIS is the only way in which they can be accessed. The system’s goal is to make agricultural research globally discoverable, and, as evidenced by Google Analytics, it supports both developed and developing countries. Indeed, AGRIS is accessed from more than 200 countries and territories, reaching peaks of 250,000 visits per month. AGRIS users belong to two very different categories: the general public and agriculture professionals. In particular, a survey conducted at the end of 2014 helped to better describe the AGRIS audience [Celli et al., 2015]: researchers, professors, and graduate students looking for bibliographies; librarians, cataloguers, and people responsible for managing and disseminating research outcomes to the community and the rest of the world (including small and big journal publishers); and government officers asking for reports on specific topics.
In December 2013, AGRIS adopted a Linked Open Data (LOD) infrastructure [Anibaldi et al., 2015], which allowed the creation of mashup pages, where users looking for specific topics (e.g. impacts of climate change in a country) can access a publication from the AGRIS database, combined with related resources extracted from other preselected datasets. External resources available in AGRIS mashup pages are not only bibliographic metadata, but also distribution maps, statistics, germplasm accessions, and so on. In this paper we explore a new data source available in AGRIS mashup pages: the web itself.

Nowadays, scientists and researchers publish their results not only in journals or at conferences, but also via web 2.0 tools and other media [Kouper, 2010; Shema et al., 2012] in order to communicate their outcomes efficiently and broadly. This practice also helps scientific research reach the general public, since newspapers, magazines and science blogs are often the quickest way to reach people informally. Blogs and other websites may a (...truncated)


This is a preview of a remote PDF: https://f1000research.com/articles/4-432/v2/pdf
Article home page: https://doaj.org/article/a53f1d6dcd9546c298fe8e2c4102e1b1

Fabrizio Celli, Johannes Keizer, Yves Jaques, Stasinos Konstantopoulos, Dušan Vudragović. Discovering, Indexing and Interlinking Information Resources [version 2; referees: 1 approved, 2 approved with reservations], F1000Research, 2015, Issue 4, DOI: 10.12688/f1000research.6848.2