Understanding predictability and exploration in human mobility
Cuttone et al. EPJ Data Science (2018) 7:2
https://doi.org/10.1140/epjds/s13688-017-0129-1
REGULAR ARTICLE
Open Access
Andrea Cuttone1,3*, Sune Lehmann1,2 and Marta C. González3,4,5
*Correspondence:
1 DTU Compute, Technical University of Denmark, Richard Petersens Plads Building 324, Kgs. Lyngby, Denmark
3 Department of Civil and Environmental Engineering and Engineering Systems, Massachusetts Institute of Technology, 77 Massachusetts Avenue, 1-290, Cambridge, 02139, United States
Full list of author information is available at the end of the article
Abstract
Predictive models for human mobility have important applications in many fields
including traffic control, ubiquitous computing, and contextual advertisement. The
predictive performance of models in the literature varies quite broadly, from over 90% to
under 40%. In this work we study which underlying factors, in terms of modeling
approaches and spatio-temporal characteristics of the data sources, have resulted in
this remarkably broad span of reported performance. Specifically, we
investigate which factors influence the accuracy of next-place prediction, using a
high-precision location dataset of more than 400 users observed for periods between
3 months and one year. We show that it is much easier to achieve high accuracy
when predicting the time-bin location than when predicting the next place.
Moreover, we demonstrate how the temporal and spatial resolution of the data have
a strong influence on the accuracy of prediction. Finally, we reveal that the exploration
of new locations is an important factor in human mobility: we find that on
average 20-25% of transitions are to new places, and approximately 70% of locations are
visited only once. We discuss how these mechanisms are important factors limiting
our ability to predict human mobility.
Keywords: human mobility; next-location prediction; predictability
1 Introduction
Billions of personal devices, ranging from in-car GPS to mobile phones and fitness
bracelets, connect us to the cloud. These ubiquitous interconnections between the physical and the digital world open up a host of new opportunities for predictive mobility models. Each user of a device produces rich information that allows us to measure their daily
mobility routine. This type of knowledge, especially when arising from large numbers of
individuals, is expected to impact a wide range of areas such as health monitoring [1],
ubiquitous computing [2, 3], disaster response [4], or smart traffic management [5]. Recent contributions to mobility modeling come from computer science [6–8], transportation engineering [9, 10], geographic information sciences [11, 12], and complexity sciences
[13–15]. The state-of-the-art for mobility modeling has developed rapidly over the past
decade, but further work is needed, especially to tackle the problem of individual predictability.
In the literature, human mobility has been studied using a multitude of proxies (for example call detail records (CDR), GPS, WiFi, and travel surveys), and a variety of techniques have been suggested for predictive models, such as Markov chains, Naive Bayes, artificial neural networks, and time series analysis. Studies report varying results for the predictive
power of these models, with accuracy from over 90% to under 40%.
© The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and
indicate if changes were made.
In this paper we set out to uncover the reasons behind these surprisingly large differences in performance via a systematic investigation of the factors that may influence estimates of mobility predictability. The key contributions of this paper are:
1. We investigate which factors influence performance in the reported cases of
predictability, showing that the differences in results are driven by the following
questions: (a) Does the analysis concern an upper bound of predictability, or actual
next-place prediction? (b) How is the prediction problem formulated? E.g. is the
goal to predict the next location, or to identify the location in the next
time-bin? (c) What is the spatial resolution of the data source? E.g. is the analysis
based on GPS or CDR data? (d) What is the temporal resolution of the data source?
E.g. minutes or hours?
2. We quantify the amount of explorations and locations visited only once, and show
that these are key limiting factors in the accuracy of predictions for individual
mobility.
3. We measure the predictive power of a number of contextual features (e.g. social
proximity, time, call/SMS).
4. We study the problem of predictability of human mobility using a novel,
longitudinal, high-precision location dataset for more than 400 users.
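The difference between the two problem formulations in contribution 1(b), and the exploration quantities in contribution 2, can be illustrated with a minimal sketch. It assumes a hypothetical input format (a list of stays given as place, start hour, end hour, with integer hours); the function names and the toy data are illustrative assumptions and are not taken from the paper's dataset or code.

```python
from collections import Counter

# Hypothetical toy input: one user's stays as (place, start_hour, end_hour),
# with end_hour exclusive. This format is an assumption for illustration.
stays = [
    ("home", 0, 8), ("work", 8, 17), ("gym", 17, 18), ("home", 18, 32),
    ("work", 32, 41), ("cafe", 41, 42), ("home", 42, 48),
]

def next_place_sequence(stays):
    """Next-place formulation: one symbol per stay."""
    return [place for place, _, _ in stays]

def time_bin_sequence(stays, bin_hours=1):
    """Time-bin formulation: one symbol per bin, so long stays repeat.

    Assumes every stay length is a multiple of bin_hours.
    """
    seq = []
    for place, start, end in stays:
        seq.extend([place] * ((end - start) // bin_hours))
    return seq

def stay_put_accuracy(seq):
    """Accuracy of the trivial guess 'next symbol equals current symbol'."""
    hits = sum(a == b for a, b in zip(seq, seq[1:]))
    return hits / (len(seq) - 1)

def exploration_fraction(seq):
    """Share of transitions whose destination has not been seen before."""
    seen = {seq[0]}
    new = 0
    for place in seq[1:]:
        if place not in seen:
            new += 1
            seen.add(place)
    return new / (len(seq) - 1)

def visited_once_fraction(seq):
    """Share of distinct places that appear exactly once in the sequence."""
    counts = Counter(seq)
    return sum(1 for c in counts.values() if c == 1) / len(counts)

places = next_place_sequence(stays)
bins = time_bin_sequence(stays)
print(stay_put_accuracy(places))     # trivial guess on the next-place sequence
print(stay_put_accuracy(bins))       # same guess on the time-bin sequence
print(exploration_fraction(places))  # transitions to previously unseen places
print(visited_once_fraction(places)) # places visited only once
```

On this toy input the trivial "stay put" guess is never right for the next-place sequence but is right for most hourly bins, which is the intuition behind treating the two formulations separately.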
The rest of the paper is organized as follows. We first provide an overview of related work
in the field of predictions of human mobility. Next, we introduce the dataset and describe
the preprocessing steps. In the subsequent section we describe the baseline models, and
compare their performance. Finally, we introduce the exploration prediction problem and
report the performance of the predictive models.
2 Related work
In a seminal paper, Song et al. [13] investigate the limits of predictability of human mobility,
using Call Detail Records (CDR) as a proxy for human movement. In their analysis, the
authors discretize location into a sequence of places, and estimate an upper limit for the
predictive performance using Fano’s inequality on the temporal entropy of visits. Their
results show that for a majority of users, this upper bound is surprisingly high (93%). This
framework has been further explored to refine the estimate of the upper limit. Specifically,
Lin et al. [16] study the effects of spatial and temporal resolution on the predictability
limit, Smith et al. [17] consider spatial reachability constraints when selecting the next
place to visit, and obtain a tighter upper bound of 81-85%, and Lu et al. [4] analyze the
predictability of the population of Haiti after the earthquake in 2010, and find an upper
limit of predictability of around 85%. The work described above focuses on estimating an
upper limit of predictability for an individual based on an estimate of the entropy of their
trajectory.
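As a rough illustration of how an entropy estimate translates into an upper limit, the sketch below solves the commonly used form of Fano's inequality, S = H(Π) + (1 − Π) log2(N − 1), for the maximum predictability Π given an entropy S (in bits) and N distinct places, where H is the binary entropy. This is a minimal sketch under stated assumptions: the function names are hypothetical, the entropy value is taken as given (the entropy estimation step itself is not shown), and the bisection solver is an illustrative choice rather than the procedure used in the cited works.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def fano_rhs(p, n_places):
    """Right-hand side of the Fano relation for predictability p."""
    return binary_entropy(p) + (1 - p) * math.log2(n_places - 1)

def max_predictability(entropy_bits, n_places, tol=1e-9):
    """Solve fano_rhs(p) = entropy_bits for p in [1/n_places, 1] by bisection.

    On that interval fano_rhs decreases from log2(n_places) to 0,
    so the solution is unique.
    """
    if n_places < 2:
        return 1.0  # a single place is trivially predictable
    if not 0.0 <= entropy_bits <= math.log2(n_places):
        raise ValueError("entropy must lie in [0, log2(n_places)]")
    lo, hi = 1.0 / n_places, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano_rhs(mid, n_places) > entropy_bits:
            lo = mid  # right-hand side too large: p must be larger
        else:
            hi = mid
    return (lo + hi) / 2

# Example call with made-up inputs (0.8 bits of entropy over 50 places).
print(max_predictability(0.8, 50))
```

Lower entropy yields a higher bound, and an entropy of log2(N), which corresponds to uniformly random visits, yields the chance level 1/N.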
When the topic is actual prediction performance, the most studied models are Markov
chains, where the probability of the next location is assumed to depend only on the current
location. Markov chains have been applied to a variety of data sets. Lu et al. [18] applied
Markov chain models to CDR-based locations in Côte d'Ivoire, with a prediction (...truncated)