Discovering, Indexing and Interlinking Information Resources [version 2; referees: 1 approved, 2 approved with reservations]
F1000Research 2015, 4:432 Last updated: 27 OCT 2022
SOFTWARE TOOL ARTICLE
Discovering, Indexing and Interlinking Information
Resources [version 2; peer review: 3 approved]
Fabrizio Celli1, Johannes Keizer1, Yves Jaques1, Stasinos Konstantopoulos2,
Dušan Vudragović3
1Food and Agriculture Organization of the UN, Rome, Italy
2NCSR Demokritos, Athens, Greece
3Institute of Physics Belgrade, University of Belgrade, Belgrade, Serbia
First published: 30 Jul 2015, 4:432
https://doi.org/10.12688/f1000research.6848.1
Latest published: 17 Nov 2015, 4:432
https://doi.org/10.12688/f1000research.6848.2
Abstract
The social media revolution is having a dramatic effect on the world of
scientific publication. Scientists now publish their research interests,
theories and outcomes across numerous channels, including personal
blogs and other thematic web spaces where ideas, activities and
partial results are discussed. Accordingly, information systems that
facilitate access to scientific literature must learn to cope with this
valuable and varied data, evolving to make this research easily
discoverable and available to end users. In this paper we describe the
incremental process of discovering web resources in the domain of
agricultural science and technology. Making use of Linked Open Data
methodologies, we interlink a wide array of custom-crawled resources
with the AGRIS bibliographic database in order to enrich the user
experience of the AGRIS website. We also discuss the SemaGrow
Stack, a query federation and data integration infrastructure used to
estimate the semantic distance between crawled web resources and
AGRIS.
Keywords
Linked Data, Text Categorization, Recommender Systems, Web
Crawling, AGRIS, SemaGrow
Open Peer Review
1. Paolo Missier, Newcastle University,
Newcastle upon Tyne, UK
2. Kei Kurakawa, National Institute of
Informatics, Tokyo, Japan
3. Leonidas Papachristopoulos, Ionian
University, Corfu, Greece
Any reports and responses or comments on the
article can be found at the end of the article.
This article is included in the Agriculture, Food
and Nutrition gateway.
Corresponding author: Fabrizio Celli
Competing interests: No competing interests were disclosed.
Grant information: This work was supported by the European Commission under EU FP7 project SemaGrow (Grant No. 318497), and in
part by the Ministry of Education, Science, and Technological Development of the Republic of Serbia (under project ON171017).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Copyright: © 2015 Celli F et al. This is an open access article distributed under the terms of the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
How to cite this article: Celli F, Keizer J, Jaques Y et al. Discovering, Indexing and Interlinking Information Resources [version 2;
peer review: 3 approved] F1000Research 2015, 4:432 https://doi.org/10.12688/f1000research.6848.2
First published: 30 Jul 2015, 4:432 https://doi.org/10.12688/f1000research.6848.1
REVISED
Amendments from Version 1
We conducted an evaluation study on a benchmark sample
of AGRIS articles to assess the relevance of
crawled web resources to the AGRIS database. We computed
the precision of recommendations considered as “relevant” by
our algorithm, commenting on some possible improvements to
the process used and described in our work. Outcomes of our
evaluation study are presented in the “Analysis of relevance” section,
together with a new picture displaying the cumulative distribution
of AGRIS records over the number of relevant recommendations.
Furthermore, we created a separate section “Analyzing the
algorithm performance” where we compared the execution time of
the recommender system in both the “individual” and “federated”
modes. The section “The output of the recommender system” was
removed, since it contained only a sample RDF/XML fragment that
was not very significant. Lastly, the definition of the custom algorithm
was removed and minor improvements have been made to the text,
as suggested by reviewers.
Introduction
AGRIS (http://agris.fao.org/) is the International System for Agricultural Science and Technology, a collection of nearly 8 million
multilingual bibliographic resources spanning the last forty years
and produced by a network of more than 150 institutions from 65
countries. AGRIS is currently part of the CIARD initiative
(http://www.ciard.net/), a self-described “global movement dedicated
to open agricultural knowledge”. Some AGRIS data sources are
unique (http://aims.fao.org/activity/blog/agris-enriched-data-fao)
to the system and AGRIS is the only way in which they can be
accessed. The system’s goal is to make agricultural research globally discoverable, and as evidenced by Google Analytics it supports both developed and developing countries. Indeed, AGRIS is
accessed from more than 200 countries and territories, reaching
peaks of 250,000 visits per month. AGRIS users belong to two very
different categories: the general public and agriculture professionals. In particular, a survey conducted at the end of 2014 helped to
better describe the AGRIS audience [Celli et al., 2015]: researchers, professors, and graduate students looking for bibliographies;
librarians, cataloguers, and people responsible for managing and
disseminating research outcomes to the community and the rest of
the world (including small and large journal publishers); and government officers asking for reports on specific topics. Since December
2013, AGRIS has used a LOD (Linked Open Data) infrastructure
[Anibaldi et al., 2015], which allows the creation of mashup
pages, where users looking for specific topics (e.g. impacts of climate change in a country) can access a publication from the AGRIS
database, combined with other related resources extracted from
other preselected datasets. External resources available in AGRIS
mashup pages are not only bibliographic metadata, but also distribution maps, statistics, germplasm accessions, and so on. In this
paper we explore a new data source available in AGRIS mashup
pages: the web itself.
Nowadays, scientists and researchers publish their results not only
in journals or at conferences, but also via web 2.0 tools and other
media [Kouper, 2010; Shema et al., 2012] in order to efficiently
and broadly communicate their outcomes; this practice also helps
scientific research reach the general public, since newspapers,
magazines and science blogs are often the quickest way to reach
people informally. Blogs and other websites may a (...truncated)