Understanding predictability and exploration in human mobility

EPJ Data Science, Jan 2018

Predictive models for human mobility have important applications in many fields including traffic control, ubiquitous computing, and contextual advertisement. The predictive performance of models in the literature varies quite broadly, from over 90% to under 40%. In this work we study which underlying factors - in terms of modeling approaches and spatio-temporal characteristics of the data sources - have resulted in this remarkably broad span of performance reported in the literature. Specifically, we investigate which factors influence the accuracy of next-place prediction, using a high-precision location dataset of more than 400 users observed for periods between 3 months and one year. We show that it is much easier to achieve high accuracy when predicting the time-bin location than when predicting the next place. Moreover, we demonstrate how the temporal and spatial resolution of the data have a strong influence on the accuracy of prediction. Finally, we reveal that the exploration of new locations is an important factor in human mobility, and we measure that on average 20-25% of transitions are to new places, and approximately 70% of locations are visited only once. We discuss how these mechanisms are important factors limiting our ability to predict human mobility.
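
The two exploration measures quoted in the abstract can be made concrete with a short sketch. It assumes one plausible reading of the definitions: a transition counts as an exploration when it reaches a place the user has never visited before, and a location counts as "visited only once" when it appears exactly once in the user's sequence. The function name `exploration_stats` and the toy sequence are illustrative assumptions, not taken from the paper or its dataset.

```python
from collections import Counter


def exploration_stats(visits):
    """Return (fraction of transitions that reach a never-before-seen place,
    fraction of distinct places that are visited exactly once).

    `visits` is a chronological list of place identifiers for one user.
    Assumption of this sketch: consecutive duplicates are already merged,
    so every adjacent pair is a genuine transition.
    """
    if len(visits) < 2:
        raise ValueError("need at least two visits to form a transition")

    seen = {visits[0]}          # places known before the first transition
    new_transitions = 0
    for place in visits[1:]:
        if place not in seen:   # destination was never visited before
            new_transitions += 1
            seen.add(place)

    n_transitions = len(visits) - 1
    counts = Counter(visits)
    visited_once = sum(1 for c in counts.values() if c == 1)
    return new_transitions / n_transitions, visited_once / len(counts)


# Toy sequence, made up for illustration only.
visits = ["home", "work", "home", "gym", "home", "work", "cafe", "home", "work"]
explore_frac, once_frac = exploration_stats(visits)
print(explore_frac, once_frac)
```

In this toy sequence three of the eight transitions lead to a new place, and two of the four distinct places are visited once. How the paper treats the very first observed place or merges nearby coordinates may differ from this sketch.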

Cuttone et al. EPJ Data Science (2018) 7:2
https://doi.org/10.1140/epjds/s13688-017-0129-1

REGULAR ARTICLE (Open Access)

Andrea Cuttone (1,3,*), Sune Lehmann (1,2) and Marta C. González (3,4,5)

* Correspondence:
1 DTU Compute, Technical University of Denmark, Richard Petersens Plads Building 324, Kgs. Lyngby, Denmark
3 Department of Civil and Environmental Engineering and Engineering Systems, Massachusetts Institute of Technology, 77 Massachusetts Avenue, 1-290, Cambridge, 02139, United States
Full list of author information is available at the end of the article
Keywords: human mobility; next-location prediction; predictability

© The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

1 Introduction

Billions of personal devices, ranging from in-car GPS to mobile phones and fitness bracelets, connect us to the cloud. These ubiquitous interconnections between the physical and the digital world open up a host of new opportunities for predictive mobility models. Each user of a device produces rich information that allows us to measure their daily mobility routine. This type of knowledge, especially when arising from large numbers of individuals, is expected to impact a wide range of areas such as health monitoring [1], ubiquitous computing [2, 3], disaster response [4], or smart traffic management [5].

Recent contributions to mobility modeling come from computer science [6–8], transportation engineering [9, 10], geographic information sciences [11, 12], and complexity sciences [13–15]. The state of the art in mobility modeling has developed rapidly over the past decade, but further work is needed, especially to tackle the problem of individual predictability.

In the literature, human mobility has been studied using a multitude of proxies (for example call detail records (CDR), GPS, WiFi, travel surveys), and a variety of techniques have been suggested for predictive models, such as Markov chains, Naive Bayes, artificial neural networks, and time series analysis. Studies report varying results for the predictive power of these models, with accuracy from over 90% to under 40%.
In this paper we set out to uncover the reasons behind these surprisingly large differences in performance via a systematic investigation of the factors that may influence estimates of mobility predictability. The key contributions of this paper are:

1. We investigate which factors influence performance in the reported cases of predictability, showing that the differences in results are driven by the following questions:
   (a) Does the analysis concern an upper bound of predictability, or actual next-place prediction?
   (b) How is the prediction problem formulated? E.g. is the goal to predict the next location, or to identify the location in the next time-bin?
   (c) What is the spatial resolution of the data source? E.g. is the analysis based on GPS or CDR data?
   (d) What is the temporal resolution of the data source, e.g. minutes or hours?
2. We quantify the amount of explorations and locations visited only once, and show that these are key limiting factors in the accuracy of predictions for individual mobility.
3. We measure the predictive power of a number of contextual features (e.g. social proximity, time, call/SMS).
4. We study the problem of predictability of human mobility using a novel, longitudinal, high-precision location dataset for more than 400 users.

The rest of the paper is organized as follows. We first provide an overview of related work in the field of predictions of human mobility. Next, we introduce the dataset and describe the preprocessing steps. In the subsequent section we describe the baseline models, and compare their performances. Finally, we introduce the exploration prediction problem and report the performance of the predictive models.

2 Related work

In a seminal paper, Song et al. [13] investigate the limits of predictability of human mobility, using Call Detail Records (CDR) as a proxy for human movement.
In their analysis, the authors discretize location into a sequence of places, and estimate an upper limit for the predictive performance using Fano’s inequality on the temporal entropy of visits. Their results show that for a majority of users, this upper bound is surprisingly high (93%). This framework has been further explored to refine the estimate of the upper limit. Specifically, Lin et al. [16] study the effects of spatial and temporal resolution on the predictability limit; Smith et al. [17] consider spatial reachability constraints when selecting the next place to visit, and obtain a tighter upper bound of 81-85%; and Lu et al. [4] analyze the predictability of the population of Haiti after the earthquake in 2010, and find an upper limit of predictability of around 85%.

The work described above focuses on estimating an upper limit of predictability for an individual based on an estimate of the entropy of their trajectory. When the topic is actual prediction performance, the most studied models are Markov chains, where the probability of the next location is assumed to depend only on the current location. Markov chains have been applied to a variety of data sets. Lu et al. [18] applied Markov chain models to CDR-based locations in Côte d'Ivoire, with a prediction (...truncated)
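
To illustrate the Markov chain assumption described above (the next location depends only on the current one), here is a minimal sketch of a first-order next-place predictor. It is not the implementation used in any of the cited studies: the function names `fit_markov`, `predict_next` and `accuracy`, the train/test split, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def fit_markov(sequence):
    """Count observed transitions current -> next in a place sequence."""
    transitions = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, current):
    """Return the most frequent successor of `current`,
    or None if `current` was never seen as an origin during fitting."""
    if current not in transitions:
        return None
    return transitions[current].most_common(1)[0][0]


def accuracy(train, test):
    """Fit on `train`, then score next-place predictions over `test`."""
    if len(test) < 2:
        raise ValueError("test sequence needs at least one transition")
    model = fit_markov(train)
    pairs = list(zip(test, test[1:]))
    hits = sum(predict_next(model, cur) == nxt for cur, nxt in pairs)
    return hits / len(pairs)


# Toy sequences, made up for illustration only.
train = ["home", "work", "home", "work", "gym", "home", "work", "home"]
test = ["home", "work", "home", "cafe", "home"]
print(accuracy(train, test))
```

On this toy data the two routine transitions are predicted correctly, while the transition to the unseen place "cafe" and the transition out of it are both missed. This is a small-scale picture of why visits to new places cap the accuracy of such a model.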


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1140%2Fepjds%2Fs13688-017-0129-1.pdf
Article home page: https://link.springer.com/article/10.1140/epjds/s13688-017-0129-1

Andrea Cuttone, Sune Lehmann, Marta C. González. Understanding predictability and exploration in human mobility, EPJ Data Science, 2018, pp. 2, Volume 7, Issue 1, DOI: 10.1140/epjds/s13688-017-0129-1