Data issues at the Euro-Mediterranean Centre for Climate Change
S. Fiore
S. Vadacca
A. Negro
G. Aloisio
Climate change research is increasingly becoming a data-intensive and data-oriented scientific activity. Petabytes of climate data and large collections of datasets are continuously produced, delivered, accessed and processed by scientists and researchers at multiple sites at an international level. This work presents the Euro-Mediterranean Centre for Climate Change (CMCC) initiative, discussing data and metadata issues and addressing both the architectural and infrastructural aspects of the adopted grid-enabled solution. A complete overview of the grid services deployed at the Centre is presented, as well as the client-side support (CMCC data portal and monitoring dashboard).
Climate change represents an important and critical
challenge for many scientists and researchers.
Increasingly complex simulation models and the management of
petabytes of data (already too massive for
current storage devices) are issues that the related
centres must face. The key elements to be taken
into account are closely tied to both data and
metadata management.
In this paper we introduce the Euro-Mediterranean
Centre for Climate Change (CMCC) initiative and the
adopted data grid solution for the management of climate
datasets. Unlike classical approaches,
data-grid-enabled solutions (Berman et al. 2003; Foster 2005; Foster
et al. 2001) address scalability (users, data, queries,
etc.), transparency (access, integration, management,
presentation) and efficiency (performance), allowing the
management of huge and distributed datasets.
The CMCC is a fully distributed environment that
comprises several sites and partners, and it brings together a
harmonious mix of skills in the fields of climate
modeling, economics, impact studies and information
technology. Given the growth rate of climate data,
we believe that a fully decentralized
schema for the management of data and metadata
(addressing data availability, scalability, site autonomy and
efficiency) is the most suitable solution for this
environment.
In this paper we present and discuss in detail the data
grid management solution adopted at the CMCC. Before
presenting the overall architecture designed at
the Centre (a high-level view of the data
and metadata services and components involved), we analyse
the main challenges driving our work
(secure, efficient and transparent distributed data
management, interoperability, metadata search and discovery, etc.).
We then examine in detail
three fundamental pillars: data management, metadata
management and user support, providing the technical
motivations behind our choices and additional information about
how the data- and metadata-related issues have been addressed
at the Centre. Concerning metadata management, we
present the adopted CMCC metadata schema and its
implementation, the CMCC metadata handling architecture
and infrastructure, the distributed metadata search, etc.
For the data management part, we deal with
data transfer, access, replication and management services
and issues. Security is also discussed from several points of
view. Moreover, we describe the available user support,
presenting the CMCC data portal, the command
line interface and the CMCC monitoring dashboard.
Finally, we discuss related work, highlighting differences
and similarities with the proposed solution, and we draw our
conclusions in the last section.
The CMCC initiative
In 2005, the Italian government, through the Ministry of the
Environment and Protection (MATT), the Ministry of
Education, University and Research (MIUR), and the
Ministry of Economy and Finance (MEF), started a
scientific initiative (namely the Euro-Mediterranean Centre
for Climate Change, CMCC) aimed at establishing a
national research centre devoted to climate change research.
The main partners of this initiative are six Italian research
institutes (the National Institute of Geophysics and
Volcanology, the Fondazione Eni Enrico Mattei, the University of
Salento, the Italian Aerospace Research Center, the
University of Sannio, and the Consorzio Venezia Ricerche).
The Centre is by nature distributed
across several sites at a geographical scale (see Fig. 1) and
comprises several research divisions, which provide
support for computing and operations activities, numerical
modeling, impact studies (on health, energy, economy,
coastal zone, Mediterranean sea, agriculture, etc.), training
and dissemination.
This Centre represents the most ambitious initiative
undertaken in Italy, within the framework of the National
Research Plan, and specifically the National Research Plan
on Climate. One of the basic ideas behind CMCC is to create
a unified environment able to concentrate in the same place
numerical models, sim (...truncated)