Investigating causality in human behavior from smartphone sensor data: a quasi-experimental approach
Tsapeli and Musolesi EPJ Data Science (2015) 4:24
DOI 10.1140/epjds/s13688-015-0061-1
REGULAR ARTICLE
Open Access
Fani Tsapeli1* and Mirco Musolesi1,2
* Correspondence:
1 School of Computer Science, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom
Full list of author information is available at the end of the article
Abstract
Smartphones and wearables have become an indispensable part of our daily life.
Their improved sensing and computing capabilities bring new opportunities for
human behavior monitoring and analysis. Most work so far has been focused on
detecting correlation rather than causation among features extracted from
smartphone data. However, pure correlation analysis does not offer sufficient
understanding of human behavior. Moreover, causation analysis could allow scientists
to identify factors that have a causal effect on health and well-being issues, such as
obesity, stress and depression, and to suggest actions to deal with them. However,
detecting causal relationships in this kind of observational data is challenging since,
in general, subjects cannot be randomly exposed to an event.
In this article, we discuss the design, implementation and evaluation of a generic
quasi-experimental framework for conducting causation studies on human behavior
from smartphone data. We demonstrate the effectiveness of our approach by
investigating the causal impact of several factors such as exercise, social interactions
and work on stress level. Our results indicate that exercising and spending time
outside the home and working environment have a positive effect on participants' stress
level, while reduced working hours only slightly impact stress.
Keywords: smartphone data; causality; human behavior; stress modeling
1 Introduction
Nowadays, people generate vast amounts of data through the devices they interact with
during their daily activities, leaving a rich variety of digital traces. Indeed, our mobile
phones have been transformed into powerful devices with increased computational and
sensing power, capable of capturing any communication activity, including both mediated
and face-to-face interactions. User location can be easily monitored and activities (e.g.,
running, walking, standing, traveling on public transit, etc.) can be inferred from raw accelerometer data captured by our smartphones [, ]. Even more complex information
such as our emotional state or our stress level can be inferred either by processing voice
signals captured by means of smartphones' microphones [, ] or by combining information extracted from several sensors that correlates with our mood [–]. Moreover,
we keep track of our daily schedule by using digital calendars and we use social media to
share our experiences, opinions and emotions with our friends. Wearable devices that are
able to monitor physical indicators with a very high level of accuracy are also increasingly
popular.
© 2015 Tsapeli and Musolesi. This article is distributed under the terms of the Creative Commons Attribution 4.0 International
License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any
medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons
license, and indicate if changes were made.
Leveraging this rich variety of human-generated information could provide new insights
on a variety of open research questions and issues in several scientific domains such as sociology, psychology, behavioral finance and medicine. For example, several works have
demonstrated that online social media could act as crowd sensing platforms; the aggregated opinions posted in online social media have been used to predict movie revenues
[], election results [] or even stock market prices []. Social influence effects in social networks have also been investigated in several projects either using observational
data [, ] or by conducting randomized trials [, ]. Other works also use mobility
traces in order to study social patterns [] or to model the spreading of contagious diseases []. Moreover, smartphones are increasingly used to monitor and better
understand the causes of health problems such as addictions, obesity, stress and depression [, , ]. Smartphones enable continuous and unobtrusive monitoring of human
behavior and, therefore, could allow scientists to conduct large-scale studies using real-life data rather than lab-constrained experiments. In this direction, in [] the authors
attempt to explain sleeping disorders reported by individuals, by investigating the correlations between sociability, mood and sleeping quality, based on data captured by mobile
phone sensors and surveys. Also, in [] the authors study the links between unhealthy
habits, such as poor-quality eating and lack of exercise, and the eating and exercise habits
of the user’s social network. However, both studies are based on correlation analysis and,
consequently, they are not sufficient for deriving valid conclusions about the causal links
between the examined variables. For example, an observed correlation between the eating
and exercising habits of a social group does not necessarily imply that eating and exercise
habits of individuals are influenced by their social group and, therefore, could be modified
by changing someone’s social group. Instead, the observed correlation could be due to the
fact that people tend to have social relationships with people with similar habits.
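The homophily explanation can be illustrated with a minimal simulation sketch. It is not part of the framework presented in this article; the function names, the Gaussian habit scores and the friend-selection rule are all assumptions made purely for illustration. Each simulated person has a fixed exercise habit that never changes, so social influence is absent by construction, yet a strong correlation between a person's habit and that of their friend still emerges when friends are chosen by similarity.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_homophily(n_people=2000, n_candidates=20, seed=0):
    """Return (own_habit, friend_habit) lists under pure homophily.

    Hypothetical model: habits are drawn independently and never updated,
    so nobody is influenced by anybody. Each person befriends the most
    similar of a few randomly met candidates.
    """
    rng = random.Random(seed)
    habits = [rng.gauss(0.0, 1.0) for _ in range(n_people)]
    own, friend = [], []
    for i, habit in enumerate(habits):
        met = rng.sample(range(n_people), n_candidates)
        candidates = [j for j in met if j != i]
        best = min(candidates, key=lambda j: abs(habits[j] - habit))
        own.append(habit)
        friend.append(habits[best])
    return own, friend


own, friend = simulate_homophily()
homophily_corr = pearson(own, friend)

# Baseline: friends drawn at random, so neither homophily nor influence.
baseline_rng = random.Random(1)
random_friend = [baseline_rng.choice(own) for _ in own]
random_corr = pearson(own, random_friend)

print(f"correlation with homophily:      {homophily_corr:.2f}")
print(f"correlation with random friends: {random_corr:.2f}")
```

Under these assumptions, changing someone's friends would leave their habit untouched, even though the observed correlation is high. This is the kind of conclusion that a correlation analysis alone cannot rule in or out.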
The efficient exploitation of human-generated data in order to uncover causal links
among factors of interest remains an open research issue. Some works have proposed
the use of randomized trials [, ]. According to this technique, the causal effects of an
event or treatment are examined by exposing a randomly selected subset of participants
(treatment group) to this event and comparing the result with the corresponding outcome
on a control group (i.e., a subset of participants who have not been exposed to the event).
By randomly assigning participants to treatment and control groups it is ensured that, on
average, there will be no systematic difference in the baseline characteristics of the participants between the two groups. Baseline characteristics are any characteristics of the subjects that could be related to the study (e.g., in a clinical study the
age and the previous health status of the subjects could be considered as baseline characteristics). While randomized trials represent a reliable way to detect causal relationships,
they require the direct intervention of scientists in participants' lives, which is sometimes
unethical or just not feasible. Moreover, such experimental studies cannot exploit the vast
am (...truncated)