Investigating causality in human behavior from smartphone sensor data: a quasi-experimental approach

EPJ Data Science, Dec 2015

Smartphones and wearables have become an indispensable part of our daily life. Their improved sensing and computing capabilities bring new opportunities for human behavior monitoring and analysis. Most work so far has focused on detecting correlation rather than causation among features extracted from smartphone data. However, pure correlation analysis does not offer sufficient understanding of human behavior. Moreover, causation analysis could allow scientists to identify factors that have a causal effect on health and well-being issues, such as obesity, stress and depression, and to suggest actions to deal with them. Finally, detecting causal relationships in this kind of observational data is challenging since, in general, subjects cannot be randomly exposed to an event. In this article, we discuss the design, implementation and evaluation of a generic quasi-experimental framework for conducting causation studies on human behavior from smartphone data. We demonstrate the effectiveness of our approach by investigating the causal impact of several factors, such as exercise, social interactions and work, on stress level. Our results indicate that exercising and spending time outside the home and working environment have a positive effect on participants' stress level, while reduced working hours only slightly impact stress.



Tsapeli and Musolesi, EPJ Data Science (2015) 4:24
DOI 10.1140/epjds/s13688-015-0061-1
Regular Article, Open Access

Fani Tsapeli (1, *) and Mirco Musolesi (1, 2)

* Correspondence: (1) School of Computer Science, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom. Full list of author information is available at the end of the article.
Keywords: smartphone data; causality; human behavior; stress modeling

© 2015 Tsapeli and Musolesi. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

1 Introduction

Nowadays, people generate vast amounts of data through the devices they interact with during their daily activities, leaving a rich variety of digital traces. Indeed, our mobile phones have been transformed into powerful devices with increased computational and sensing power, capable of capturing any communication activity, including both mediated and face-to-face interactions. User location can be easily monitored, and activities (e.g., running, walking, standing, traveling on public transit, etc.) can be inferred from raw accelerometer data captured by our smartphones [, ]. Even more complex information, such as our emotional state or our stress level, can be inferred either by processing voice signals captured by means of smartphone microphones [, ] or by combining information, extracted from several sensors, which correlates with our mood [–]. Moreover, we keep track of our daily schedule by using digital calendars, and we use social media to share our experiences, opinions and emotions with our friends. Wearable devices that are able to monitor physical indicators with a very high level of accuracy are also increasingly popular. Leveraging this rich variety of human-generated information could provide new insights into a variety of open research questions and issues in several scientific domains such as sociology, psychology, behavioral finance and medicine.
For example, several works have demonstrated that online social media could act as crowd-sensing platforms; the aggregated opinions posted in online social media have been used to predict movie revenues [], election results [] or even stock market prices []. Social influence effects in social networks have also been investigated in several projects, either using observational data [, ] or by conducting randomized trials [, ]. Other works use mobility traces in order to study social patterns [] or to model the spreading of contagious diseases []. Moreover, smartphones are increasingly used to monitor and better understand the causes of health problems such as addictions, obesity, stress and depression [, , ]. Smartphones enable continuous and unobtrusive monitoring of human behavior and, therefore, could allow scientists to conduct large-scale studies using real-life data rather than lab-constrained experiments. In this direction, in [] the authors attempt to explain sleeping disorders reported by individuals by investigating the correlations between sociability, mood and sleeping quality, based on data captured by mobile phone sensors and surveys. Also, in [] the authors study the links between unhealthy habits, such as poor-quality eating and lack of exercise, and the eating and exercise habits of the user's social network. However, both studies are based on correlation analysis and, consequently, they are not sufficient for deriving valid conclusions about the causal links between the examined variables. For example, an observed correlation between the eating and exercising habits of a social group does not necessarily imply that the eating and exercise habits of individuals are influenced by their social group and, therefore, could be modified by changing someone's social group. Instead, the observed correlation could be due to the fact that people tend to have social relationships with people with similar habits.
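The last point can be illustrated with a small simulation. The following is a minimal sketch, not taken from the article: it assumes a hypothetical population in which each person's exercise habit depends only on a latent personal preference plus noise, so friends never influence each other. The function name `friend_habit_correlation`, the pairing rule and all numeric values are illustrative assumptions. When friendships are formed between people with similar preferences, the habits of friends are still strongly correlated.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def friend_habit_correlation(n_people=2000, homophily=True, seed=0):
    """Correlation between a person's habit and their friend's habit.

    Assumption: habits depend only on a person's own latent preference,
    so there is no social influence anywhere in this model.
    """
    rng = random.Random(seed)
    # Hypothetical latent preference for exercising.
    preference = [rng.gauss(0.0, 1.0) for _ in range(n_people)]
    if homophily:
        # Friends are people with similar preferences (adjacent after sorting).
        order = sorted(range(n_people), key=lambda i: preference[i])
    else:
        # Friends are paired uniformly at random.
        order = rng.sample(range(n_people), n_people)
    pairs = [(order[k], order[k + 1]) for k in range(0, n_people - 1, 2)]
    # Observed habit = own preference + independent noise.
    habit = [p + rng.gauss(0.0, 0.5) for p in preference]
    return pearson([habit[a] for a, _ in pairs], [habit[b] for _, b in pairs])


if __name__ == "__main__":
    print("homophilous friendships:", round(friend_habit_correlation(homophily=True), 2))
    print("random friendships:     ", round(friend_habit_correlation(homophily=False), 2))
```

In this toy model the correlation under homophilous pairing is large even though changing someone's friend would not change their habit, while random pairing gives a correlation close to zero. Correlation analysis alone cannot tell the two explanations apart.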
The efficient exploitation of human-generated data in order to uncover causal links among factors of interest remains an open research issue. Some works have proposed the use of randomized trials [, ]. According to this technique, the causal effects of an event or treatment are examined by exposing a randomly selected subset of participants (treatment group) to this event and comparing the result with the corresponding outcome in a control group (i.e., a subset of participants who have not been exposed to the event). Randomly assigning participants to treatment and control groups ensures that, on average, there is no systematic difference in the baseline characteristics of the participants between the two groups. Baseline characteristics are any characteristics of the subjects that could be related to the study (e.g., in a clinical study, the age and the previous health status of the subjects could be considered baseline characteristics). While randomized trials represent a reliable way to detect causal relationships, they require the direct intervention of scientists in participants' lives, which is sometimes unethical or just not feasible. Moreover, such experimental studies cannot exploit the vast am (...truncated)
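The balancing property of random assignment described above can be sketched in a few lines of Python. This is an illustrative simulation under stated assumptions, not the authors' framework: the baseline stress scale, the true effect of -1.0, the opt-in probabilities and the function name `simulate_study` are all hypothetical. It compares a randomized trial with a study in which low-stress subjects are more likely to choose the treatment themselves.

```python
import random
from statistics import mean


def simulate_study(n=4000, true_effect=-1.0, randomized=True, seed=1):
    """Return (baseline gap, naive effect estimate) for a simulated study.

    Both values are treated-minus-control differences in means.
    Assumption: outcome = baseline stress + treatment effect + noise.
    """
    rng = random.Random(seed)
    base = {True: [], False: []}
    outcome = {True: [], False: []}
    for _ in range(n):
        baseline = rng.gauss(5.0, 1.0)  # hypothetical baseline stress score
        if randomized:
            # Coin flip: assignment is independent of the baseline.
            treated = rng.random() < 0.5
        else:
            # Self-selection: low-stress subjects opt in more often.
            treated = rng.random() < (0.8 if baseline < 5.0 else 0.2)
        effect = true_effect if treated else 0.0
        base[treated].append(baseline)
        outcome[treated].append(baseline + effect + rng.gauss(0.0, 0.5))
    baseline_gap = mean(base[True]) - mean(base[False])
    effect_estimate = mean(outcome[True]) - mean(outcome[False])
    return baseline_gap, effect_estimate


if __name__ == "__main__":
    for label, flag in (("randomized", True), ("self-selected", False)):
        gap, estimate = simulate_study(randomized=flag)
        print(f"{label}: baseline gap = {gap:.2f}, estimated effect = {estimate:.2f}")
```

Under random assignment the baseline gap is close to zero and the difference in mean outcomes is close to the assumed true effect. Under self-selection the groups already differ at baseline, so the same naive comparison overstates the effect.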


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1140%2Fepjds%2Fs13688-015-0061-1.pdf
Article home page: https://link.springer.com/article/10.1140/epjds/s13688-015-0061-1

Fani Tsapeli, Mirco Musolesi. Investigating causality in human behavior from smartphone sensor data: a quasi-experimental approach, EPJ Data Science, 2015, pp. 24, Volume 4, Issue 1, DOI: 10.1140/epjds/s13688-015-0061-1