Data Science and Engineering

http://link.springer.com/journal/41019

List of Papers (Total 65)

Fine-Grained Access Control Within NoSQL Document-Oriented Datastores

Recent years have seen the birth of several NoSQL datastores, which are becoming increasingly popular for their ability to handle high volumes of heterogeneous and unstructured data very efficiently. In several cases, NoSQL databases have outclassed relational database management systems in terms of performance, scalability, and ease of use, meeting the...

Formal Modelling of Data Integration Systems Security Policies

Data Integration Systems (DIS) are concerned with integrating data from multiple data sources to resolve user queries. Typically, organisations providing data sources specify security policies that impose stringent requirements on the collection, processing, and disclosure of personal and sensitive data. If the security policies were not correctly enforced by the integration...

Landmark-Based Route Recommendation with Crowd Intelligence

Route recommendation is one of the most widely used location-based services today, as it is vital for a pleasant driving experience and smooth public traffic. Given a user-specified origin and destination, a route recommendation service aims to provide users with the routes that offer the best travelling experience according to given criteria. However, even the routes recommended by...

Medical Big Data: Neurological Diseases Diagnosis Through Medical Data Analysis

Diagnosis of neurological diseases is a growing concern and one of the most difficult challenges for modern medicine. According to the World Health Organisation’s recent report, neurological disorders, ranging from epilepsy, Alzheimer’s disease and stroke to headache, affect up to one billion people worldwide. An estimated 6.8 million people die every year as a result of neurological...

UniClip: Leveraging Web Search for Universal Clipping of Articles on Mobile

In this paper, we address the difficulty of clipping articles from mobile apps. We propose a service called UniClip that allows a user to save the full content of an article by snapping a screenshot of part of it. UniClip leverages a huge amount of indexed web data to mine the article, starting from the snapped screenshot. We propose approaches to solve three challenges: (1) how to...

A Data-Driven Evaluation for Insider Threats

Insiders are often legitimate users who are authorized to access systems and data. If they misuse their privileges, they pose a great threat to system security. In practice, fraud patterns cannot be known in advance, and most malicious behaviors comply with security rules; thus, it is difficult to predefine regulations for preventing all kinds...

A Storytelling-Driven Framework for Cultural Heritage Dissemination

This paper introduces a new dissemination framework for cultural heritage (CH) that offers small and medium museums affordable ways to collaborate in the creation of exhibitions. The framework also enables new data-based communication strategies that combine content belonging to different cultural archives and accessed through an ontology...

Software-Defined Storage-Based Data Infrastructure Supportive of Hydroclimatology Simulation Containers: A Survey

Hydroclimatic research requires intensive computational and data resources to perform simulations. Setting up complex experiment environments and configurations to submit jobs to computational clusters, as well as managing users’ limited storage space by transferring large data sets to secondary storage, is complicated and time-consuming. As a possible answer...

Big Data Privacy: Challenges to Privacy Principles and Models

This paper explores the challenges raised by big data in privacy-preserving data management. First, we examine the conflicts raised by big data with respect to preexisting concepts of private data management, such as consent, purpose limitation, transparency and individual rights of access, rectification and erasure. Anonymization appears as the best tool to mitigate such...

On the Meaningfulness of “Big Data Quality” (Invited Paper)

In this paper, we discuss the application of the concept of data quality to big data by highlighting how complex it is to define in a general way. Data quality is already a multidimensional concept, difficult to characterize in precise definitions even in the case of well-structured data. Big data add two further dimensions of complexity: (i) being “very” source specific, and...

Clustering Embedded Approaches for Efficient Information Network Inference

Nowadays, the message diffusion links among users or Web sites drive the development of countless innovative applications. In reality, however, it is easier to observe the time stamps at which different nodes in the network react to a message, while the connections that enable the diffusion of the message remain hidden. This motivates recent extensive studies on the network...

POS: A High-Level System to Simplify Real-Time Stream Application Development on Storm

Real-time stream computing is becoming increasingly important due to the sheer amount of content continually generated in various kinds of social networks and e-commerce websites. Many distributed real-time computing systems have been built for different applications, and Storm is one of the most prominent, offering high performance, fault tolerance and low latency...