EPJ Data Science

http://link.springer.com/journal/13688

List of Papers (Total 131)

Understanding predictability and exploration in human mobility

Predictive models for human mobility have important applications in many fields including traffic control, ubiquitous computing, and contextual advertisement. The predictive performance of models in the literature varies quite broadly, from over 90% to under 40%. In this work we study which underlying factors - in terms of modeling approaches and spatio-temporal characteristics of the ...

Understanding the predictability of user demographics from cyber-physical-social behaviours in indoor retail spaces

Understanding the association between customer demographics and behaviour is critical for operators of indoor retail spaces. This study explores such an association based on a combined understanding of customer Cyber (online), Physical, and (some aspects of) Social (CPS) behaviour, at the conjunction of corresponding CPS spaces. We combine the results of a traditional questionnaire ...

Sentiment analysis methods for understanding large-scale texts: a case for using continuum-scored words and word shift graphs

The emergence and global adoption of social media has rendered possible the real-time estimation of population-scale sentiment, an extraordinary capacity which has profound implications for our understanding of human behavior. Given the growing assortment of sentiment-measuring instruments, it is imperative to understand which aspects of sentiment dictionaries contribute to both ...

Are you getting sick? Predicting influenza-like symptoms using human mobility behaviors

Understanding and modeling the mobility of individuals is of paramount importance for public health. In particular, mobility characterization is key to predicting the spatial and temporal diffusion of human-transmitted infections. However, the mobility behavior of a person can also reveal relevant information about her/his health conditions. In this paper, we study the impact of ...

Gaining historical and international relations insights from social media: spatio-temporal real-world news analysis using Twitter

The immense growth of the social Web, which has made a large amount of user data easily and publicly available, has opened a whole new spectrum for research in social behavioral sciences. However, as the volume of social media content increases at a very fast rate, it becomes extremely difficult to systematically obtain high-level information from this data. As a consequence, tasks ...

Estimating local commuting patterns from geolocated Twitter data

The emergence of large stores of transactional data generated by increasing use of digital devices presents a huge opportunity for policymakers to improve their knowledge of the local environment and thus make more informed and better decisions. A research frontier is hence emerging which involves exploring the type of measures that can be drawn from data stores such as mobile ...

The effect of Pokémon Go on the pulse of the city: a natural experiment

Pokémon Go, a location-based game that uses augmented reality techniques, received unprecedented media coverage due to claims that it allowed for greater access to public spaces, increasing the number of people out on the streets, and generally improving health, social, and security indices. However, the true impact of Pokémon Go on people’s mobility patterns in a city is still ...

Data-driven modeling of collaboration networks: a cross-domain analysis

We analyze large-scale data sets about collaborations from two different domains: economics, specifically 22,000 R&D alliances between 14,500 firms, and science, specifically 300,000 co-authorship relations between 95,000 scientists. Considering the different domains of the data sets, we address two questions: (a) to what extent do the collaboration networks reconstructed from the ...

Rapid rise and decay in petition signing

Contemporary collective action, much of which involves social media and other Internet-based platforms, leaves a digital imprint which may be harvested to better understand the dynamics of mobilization. Petition signing is an example of collective action which has gained in popularity with rising use of social media and provides such data for the whole population of petition ...

The shape of collaborations

The structure of scientific collaborations has been the object of intense study both for its importance for innovation and scientific advancement, and as a model system for social group coordination and formation thanks to the availability of authorship data. Over the last years, complex network approaches to this problem have yielded important insights and shaped our understanding ...

Comparison of traffic reliability index with real traffic data

Existing studies have developed different indices based on various approaches including network connectivity, delay time and flow capacity, estimating the traffic reliability states from different angles. However, these indices mainly estimate traffic reliability from a single view and rarely consider the combined effect of city traffic dynamics and underlying network structure. ...

Classification of Westminster Parliamentary constituencies using e-petition data

In a representative democracy it is important that politicians have knowledge of the desires, aspirations and concerns of their constituents. Opportunities to gauge these opinions are however limited and, in the era of novel data, thoughts turn to what alternative, secondary, data sources may be available to keep politicians informed about local concerns. One such source of data ...

A roadmap for the computation of persistent homology

Persistent homology (PH) is a method used in topological data analysis (TDA) to study qualitative features of data that persist across multiple scales. It is robust to perturbations of input data, independent of dimensions and coordinates, and provides a compact representation of the qualitative features of the input. The computation of PH is an open area with numerous important ...

Instagram photos reveal predictive markers of depression

Using Instagram data from 166 individuals, we applied machine learning tools to successfully identify markers of depression. Statistical features were computationally extracted from 43,950 participant Instagram photos, using color analysis, metadata components, and algorithmic face detection. Resulting models outperformed general practitioners’ average unassisted diagnostic success ...

Prediction of employment and unemployment rates from Twitter daily rhythms in the US

By modeling macroeconomic indicators using digital traces of human activities on mobile or social networks, we can provide important insights into processes previously assessed via paper-based surveys or polls only. We collected aggregated workday activity timelines of US counties from the normalized number of messages sent in each hour on the online social network Twitter. In ...

Early detection of promoted campaigns on social media

Social media expose millions of users every day to information campaigns - some emerging organically from grassroots activity, others sustained by advertising or other coordinated efforts. These campaigns contribute to the shaping of collective opinions. While most information campaigns are benign, some may be deployed for nefarious purposes, including terrorist propaganda, ...

An alternative approach to the limits of predictability in human mobility

Next place prediction algorithms are invaluable tools, capable of increasing the efficiency of a wide variety of tasks, ranging from reducing the spreading of diseases to better resource management in areas such as urban planning. In this work we estimate upper and lower limits on the predictability of human mobility to help assess the performance of competing algorithms. We do ...

Inferring social influence in transport mode choice using mobile phone data

Longitudinal mobile phone data that include both location and communication logs are analyzed to infer social influence, in terms of ego-network effects, on commute mode choice. The results show that a person’s strong ties are more important in determining whether driving is the person’s transport mode choice, whereas weak ties are more important in determining whether public transit is the ...

Individual position diversity in dependence socioeconomic networks increases economic output

The availability of big data recorded from massively multiplayer online role-playing games (MMORPGs) allows us to gain a deeper understanding of the potential connection between individuals’ network positions and their economic outputs. We use a statistical filtering method to construct dependence networks from weighted friendship networks of individuals. We investigate the 30 ...

Uncovering the relationships between military community health and affects expressed in social media

Military populations present a small, unique community whose mental and physical health impacts the security of the nation. Recent literature has explored social media’s ability to enhance disease surveillance and characterize distinct communities with encouraging results. We present a novel analysis of the relationships between influenza-like illness (ILI) clinical data and ...

Topological analysis of data

Propelled by a fast evolving landscape of techniques and datasets, data science is growing rapidly. Against this background, topological data analysis (TDA) has carved itself a niche for the analysis of datasets that present complex interactions and rich structures. Its distinctive feature, topology, allows TDA to detect, quantify and compare the mesoscopic structures of data, ...

Contact activity and dynamics of the social core

Humans interact through numerous communication channels to build and maintain social connections: they meet face-to-face, make phone calls or send text messages, and interact via social media. Although it is known that the network of physical contacts, for example, is distinct from the network arising from communication events via phone calls and instant messages, the extent to ...

Gender matters! Analyzing global cultural gender preferences for venues using social sensing

Gender differences are a phenomenon actively researched by social scientists around the world. Traditionally, the data used to support such studies are obtained manually, often through surveys with volunteers. However, because of the high costs inherent in their manual steps, such traditional methods do not scale quickly to large studies. We here investigate a particular aspect ...