EPJ Data Science

http://www.epjdatascience.com/

List of Papers (Total 276)

Segregation in religion networks

Religion is considered a notable origin of interpersonal relations, as well as an effective and efficient tool for organizing a huge number of people towards challenging targets. At the same time, a believer prefers to make friends with other people of the same faith, and thus people of different faiths tend to form relatively isolated communities. The segregation between...

Activism via attention: interpretable spatiotemporal learning to forecast protest activities

The diffusion of new information and communication technologies—social media in particular—has played a key role in social and political activism in recent decades. In this paper, we propose a theory-motivated, spatiotemporal learning approach, ActAttn, that leverages social movement theories and a deep learning framework to examine the relationship between protest events and...

Inside 50,000 living rooms: an assessment of global residential ornamentation using transfer learning

The global community decorates their homes based on personal decisions and contextual influences of their larger cultural and economic surroundings. The extent to which spatial patterns emerge in residential decoration practices has been traditionally difficult to ascertain due to the private nature of interior home spaces. Yet, measuring these patterns can reveal the presence of...

Nowcasting earthquake damages with Twitter

The Modified Mercalli intensity scale (Mercalli scale for short) is a qualitative measure used to express the perceived intensity of an earthquake in terms of damages. Accurate intensity reports are vital to estimate the type of emergency response required for a particular earthquake. In addition, Mercalli scale reports are needed to estimate the possible consequences of strong...

The fragility of decentralised trustless socio-technical systems

Blockchain technology promises to transform finance, money and even governments. However, analyses of blockchain applicability and robustness typically focus on isolated systems whose actors contribute mainly by running the consensus algorithm. Here, we highlight the importance of considering trustless platforms within the broader ecosystem that includes social and...

Tracing patterns and shapes in remittance and migration networks via persistent homology

Pattern detection in network models provides insights into both global structure and local node interactions. In particular, studying patterns embedded within remittance and migration flow networks can be useful in understanding economic and sociological trends and phenomena and their implications in both regional and global settings. We illustrate how topo-algebraic methods can be...

Tampering with Twitter’s Sample API

Social media data is widely analyzed in computational social science. Twitter, one of the largest social media platforms, is used for research, journalism, business, and government to analyze human behavior at scale. Twitter offers data via three different Application Programming Interfaces (APIs). One of these, Twitter’s Sample API, provides a freely available 1% and a costly 10...

Can co-location be used as a proxy for face-to-face contacts?

Technological advances have led to a strong increase in the number of data collection efforts aimed at measuring co-presence of individuals at different spatial resolutions. It is, however, unclear how much co-presence data can tell us about actual face-to-face contacts, which are of particular interest for studying the structure of a population in social groups or for use in data-driven models...

Academic performance and behavioral patterns

Identifying the factors that influence academic performance is an essential part of educational research. Previous studies have documented the importance of personality traits, class attendance, and social network structure. Because most of these analyses were based on a single behavioral aspect and/or small sample sizes, there is currently no quantification of the interplay of...

Network of families in a contemporary population: regional and cultural assortativity

Using a large dataset with individual-level demographic information on almost 60,000 families in contemporary Finland, we analyse regional variation and cultural assortativity by studying the network between families and the network between kin. For the network of families, the largest connected component is found to consist of around 1000 families mostly originating from one...

Methods for quantifying effects of social unrest using credit card transaction data

Societal unrest and similar events are important for societies, but it is often difficult to quantify their effects on individuals, hindering timely and effective policy-making in emergencies and, in particular, in localized social shocks such as protests. Traditionally, effects are assessed through economic indicators or surveys with relatively low temporal and spatial resolutions...

Success in books: a big data approach to bestsellers

Reading remains the preferred leisure activity for most individuals, continuing to offer a unique path to knowledge and learning. As such, books remain an important cultural product, consumed widely. Yet, while over 3 million books are published each year, very few are read widely and fewer than 500 make it to the New York Times bestseller lists. And once there, only a handful of...

Discovering temporal regularities in retail customers’ shopping behavior

In this paper we investigate the regularities characterizing the temporal purchasing behavior of the customers of a retail market chain. Most of the literature studying purchasing behavior focuses on what customers buy while giving little importance to the temporal dimension. As a consequence, the state of the art does not allow capturing the temporal purchasing patterns...

Feature analysis of multidisciplinary scientific collaboration patterns based on PNAS

The features of collaboration patterns are often considered to differ from discipline to discipline. Meanwhile, collaboration among disciplines is an obvious feature that has emerged in modern scientific research and has incubated several interdisciplinary fields. The features of collaborations in and among the disciplines of biological, physical and social sciences are analyzed based on 52...

A scalable method to quantify the relationship between urban form and socio-economic indexes

The world is undergoing a process of fast and unprecedented urbanisation. It is reported that by 2050, 66% of the world's population will live in cities. Although this phenomenon is generally considered beneficial, it is also causing housing crises and more inequality worldwide. In the past, the relationship between design features of cities and socio-economic levels of their...

Collective aspects of privacy in the Twitter social network

Preserving individual control over private information is one of the rising concerns in our digital society. Online social networks exist in application ecosystems that allow them to access data from other services, for example gathering contact lists through mobile phone applications. Such data access might allow social networking sites to create shadow profiles with information...

Understanding predictability and exploration in human mobility

Predictive models for human mobility have important applications in many fields including traffic control, ubiquitous computing, and contextual advertisement. The predictive performance of models in the literature varies quite broadly, from over 90% to under 40%. In this work we study which underlying factors - in terms of modeling approaches and spatio-temporal characteristics of...

Understanding the predictability of user demographics from cyber-physical-social behaviours in indoor retail spaces

Understanding the association between customer demographics and behaviour is critical for operators of indoor retail spaces. This study explores such an association based on a combined understanding of customer Cyber (online), Physical, and (some aspects of) Social (CPS) behaviour, at the conjunction of corresponding CPS spaces. We combine the results of a traditional...

Measuring and monitoring collective attention during shocking events

There has been growing interest in leveraging Web-based social and communication technologies for better crisis response. How might Web platforms be used as an observatory to systematically understand the dynamics of the public’s attention during disaster events? And how could we monitor such attention in a cost-effective way? In this work, we propose an ‘attention shift...

Measuring economic activity in China with mobile big data

Emerging trends in the use of smartphones, online mapping applications, and social media, in addition to the geo-located data they generate, provide opportunities to trace users’ socio-economic activities in an unprecedentedly granular and direct fashion and have triggered a revolution in empirical research. These vast mobile data offer new perspectives and approaches to measure...

Sentiment analysis methods for understanding large-scale texts: a case for using continuum-scored words and word shift graphs

The emergence and global adoption of social media have rendered possible the real-time estimation of population-scale sentiment, an extraordinary capacity which has profound implications for our understanding of human behavior. Given the growing assortment of sentiment-measuring instruments, it is imperative to understand which aspects of sentiment dictionaries contribute to both...

Are you getting sick? Predicting influenza-like symptoms using human mobility behaviors

Understanding and modeling the mobility of individuals is of paramount importance for public health. In particular, mobility characterization is key to predicting the spatial and temporal diffusion of human-transmitted infections. However, the mobility behavior of a person can also reveal relevant information about her/his health conditions. In this paper, we study the impact of...

Novel and topical business news and their impact on stock market activity

We propose an indicator to measure the degree to which a particular news article is novel, as well as an indicator to measure the degree to which a particular news item attracts attention from investors. The novelty measure is obtained by comparing the extent to which a particular news article is similar to earlier news articles, and an article is regarded as novel if there was...

Gaining historical and international relations insights from social media: spatio-temporal real-world news analysis using Twitter

The immense growth of the social Web, which has made a large amount of user data easily and publicly available, has opened a whole new spectrum for research in social behavioral sciences. However, as the volume of social media content increases at a very fast rate, it becomes extremely difficult to systematically obtain high-level information from this data. As a consequence...

Estimating local commuting patterns from geolocated Twitter data

The emergence of large stores of transactional data generated by increasing use of digital devices presents a huge opportunity for policymakers to improve their knowledge of the local environment and thus make more informed and better decisions. A research frontier is hence emerging which involves exploring the type of measures that can be drawn from data stores such as mobile...