# EPJ Data Science

## List of Papers (Total 115)

#### Prediction of employment and unemployment rates from Twitter daily rhythms in the US

By modeling macroeconomic indicators using digital traces of human activities on mobile or social networks, we can provide important insights into processes previously assessed only via paper-based surveys or polls. We collected aggregated workday activity timelines of US counties from the normalized number of messages sent in each hour on the online social network Twitter. In ...

#### Early detection of promoted campaigns on social media

Social media expose millions of users every day to information campaigns - some emerging organically from grassroots activity, others sustained by advertising or other coordinated efforts. These campaigns contribute to the shaping of collective opinions. While most information campaigns are benign, some may be deployed for nefarious purposes, including terrorist propaganda, ...

#### An alternative approach to the limits of predictability in human mobility

Next place prediction algorithms are invaluable tools, capable of increasing the efficiency of a wide variety of tasks, ranging from reducing the spreading of diseases to better resource management in areas such as urban planning. In this work we estimate upper and lower limits on the predictability of human mobility to help assess the performance of competing algorithms. We do ...

#### Inferring social influence in transport mode choice using mobile phone data

Longitudinal mobile phone data that include both location and communication logs are analyzed to infer social influence, in terms of ego-network effects, on commute mode choice. The results show that a person’s strong ties are more important in determining whether driving is the person’s transport mode choice, whereas weak ties are more important in determining whether public transit is the ...

#### Individual position diversity in dependence socioeconomic networks increases economic output

The availability of big data recorded from massively multiplayer online role-playing games (MMORPGs) allows us to gain a deeper understanding of the potential connection between individuals’ network positions and their economic outputs. We use a statistical filtering method to construct dependence networks from weighted friendship networks of individuals. We investigate the 30 ...

#### Uncovering the relationships between military community health and affects expressed in social media

Military populations present a small, unique community whose mental and physical health impacts the security of the nation. Recent literature has explored social media’s ability to enhance disease surveillance and characterize distinct communities with encouraging results. We present a novel analysis of the relationships between influenza-like illnesses (ILI) clinical data and ...

#### Topological analysis of data

Propelled by a fast evolving landscape of techniques and datasets, data science is growing rapidly. Against this background, topological data analysis (TDA) has carved itself a niche for the analysis of datasets that present complex interactions and rich structures. Its distinctive feature, topology, allows TDA to detect, quantify and compare the mesoscopic structures of data, ...

#### Contact activity and dynamics of the social core

Humans interact through numerous communication channels to build and maintain social connections: they meet face-to-face, make phone calls or send text messages, and interact via social media. Although it is known that the network of physical contacts, for example, is distinct from the network arising from communication events via phone calls and instant messages, the extent to ...

#### Gender matters! Analyzing global cultural gender preferences for venues using social sensing

Gender differences are a phenomenon actively researched by social scientists around the world. Traditionally, the data used to support such studies are obtained manually, often through surveys with volunteers. However, because of the high costs inherent in their manual steps, such traditional methods do not scale quickly to large studies. We here investigate a particular aspect ...

Most individuals in social networks experience a so-called Friendship Paradox: they are less popular than their friends on average. This effect may explain recent findings that widespread use of social network media leads to reduced happiness. However, the relation between popularity and happiness is poorly understood. A Friendship Paradox does not necessarily imply a Happiness Paradox ...

#### Improving official statistics in emerging markets using machine learning and mobile phone data

Mobile phones are one of the fastest growing technologies in the developing world, with global penetration rates reaching 90%. Mobile phone data, also called CDR, are generated every time phones are used and are recorded by carriers at scale. CDR have generated groundbreaking insights in public health, official statistics, and logistics. However, the fact that most phones in developing ...

#### BiFold visualization of bipartite datasets

The emerging domain of data-enabled science necessitates development of algorithms and tools for knowledge discovery. Human interaction with data through well-constructed graphical representation can take special advantage of our visual ability to identify patterns. We develop a data visualization framework, called BiFold, for exploratory analysis of bipartite datasets that ...

#### Absence makes the heart grow fonder: social compensation when failure to interact risks weakening a relationship

Social networks require active relationship maintenance if they are to be kept at a constant level of emotional closeness. For primates, including humans, failure to interact leads inexorably to a decline in relationship quality, and a consequent loss of the benefits that derive from individual relationships. As a result, many social species compensate for weakened relationships by ...

#### Applying Hidden Markov Models to Voting Advice Applications

In recent times, low voter turnout has threatened the representative democracy of many developed countries. Voting Advice Applications (VAAs) are used to inform citizens about the political stances of the parties involved in the upcoming elections, in an effort to facilitate their decision-making process and increase their participation in this democratic ...

#### Generic temporal features of performance rankings in sports and games

Many complex phenomena, from trait selection in biological systems to hierarchy formation in social and economic entities, show signs of competition and heterogeneous performance in the temporal evolution of their components, which may eventually lead to stratified structures such as the worldwide wealth distribution. However, it is still unclear whether the road to hierarchical ...

#### Estimating suicide occurrence statistics using Google Trends

Data on the number of people who have committed suicide tends to be reported with a substantial time lag of around two years. We examine whether online activity measured by Google searches can help us improve estimates of the number of suicide occurrences in England before official figures are released. Specifically, we analyse how data on the number of Google searches for the ...

#### The emotional arcs of stories are dominated by six basic shapes

Advances in computing power, natural language processing, and digitization of text now make it possible to study a culture’s evolution through its texts using a ‘big data’ lens. Our ability to communicate relies in part upon a shared emotional experience, with stories often following distinct emotional trajectories and forming patterns that are meaningful to us. Here, by ...

#### Predicting human mobility through the assimilation of social media traces into mobility models

Predicting human mobility flows at different spatial scales is challenged by the heterogeneity of individual trajectories and the multi-scale nature of transportation networks. As vast amounts of digital traces of human behaviour become available, an opportunity arises to improve mobility models by integrating into them proxy data on mobility collected by a variety of digital ...

#### Convergence of economic growth and the Great Recession as seen from a Celestial Observatory

Macroeconomic theories of growth and wealth distribution have an outsized influence on national and international social and economic policies. Yet, due to a relative lack of reliable, system wide data, many such theories remain, at best, unvalidated and, at worst, misleading. In this paper, we introduce a novel economic observatory and framework enabling high resolution ...

#### Ground truth? Concept-based communities versus the external classification of physics manuscripts

Community detection techniques are widely used to infer hidden structures within interconnected systems. Despite demonstrating high accuracy on benchmarks, they reproduce the external classification for many real-world systems with a significant level of discrepancy. A widely accepted reason behind such outcome is the unavoidable loss of non-topological information (such as node ...

#### Quantifying decision making for data science: from data acquisition to modeling

Organizations, irrespective of their size and type, are increasingly becoming data-driven or aspire to become data-driven. There is a rush to quantify the value of their own internal data, or the value of integrating their internal data with external data, and of performing modeling on such data. A question that analytics teams often grapple with is whether to acquire more data or expend ...

#### The classical origin of modern mathematics

This paper introduces a data-driven methodology to study the historical evolution of mathematical thinking and its spatial spreading. To do so, we have collected and integrated data from different online academic datasets. In its final form, the database includes a large number ($N\sim200\mbox{K}$) of advisor-student relationships, with affiliations and keywords on their research ...

#### Countrywide arrhythmia: emergency event detection using mobile phone data

Large scale social events that involve violence may have dramatic political, economic and social consequences. These events may result in higher crime rates, spreading of infectious diseases, economic crises, and even in migration phenomena (e.g., refugees across borders or internally displaced people). Hence, researchers have started using mobile phone data for developing tools to ...

#### A multilayer approach to multiplexity and link prediction in online geo-social networks

Online social systems are multiplex in nature as multiple links may exist between the same two users across different social media. In this work, we study the geo-social properties of multiplex links, spanning more than one social network and apply their structural and interaction features to the problem of link prediction across social networking services. Exploring the ...