Spatio-temporal techniques for user identification by means of GPS mobility data
Rossi et al. EPJ Data Science (2015) 4:11
DOI 10.1140/epjds/s13688-015-0049-x
REGULAR ARTICLE
Open Access
Luca Rossi1*, James Walker1 and Mirco Musolesi2,1
*Correspondence:
1 School of Computer Science, University of Birmingham, Birmingham, B15 2TT, UK
Full list of author information is available at the end of the article
Abstract
One of the greatest concerns related to the popularity of GPS-enabled devices and
applications is the increasing availability of the personal location information
generated by them and shared with application and service providers. Moreover,
people tend to have regular routines and be characterized by a set of “significant
places”, thus making it possible to identify a user from his/her mobility data.
In this paper we present a series of techniques for identifying individuals from their
GPS movements. More specifically, we study the uniqueness of GPS information for
three popular datasets, and we provide a detailed analysis of the discriminatory
power of speed, direction and distance of travel. Most importantly, we present a
simple yet effective technique for the identification of users from location information
that is not included in the original dataset used for training, thus raising important
privacy concerns for the management of location datasets.
Keywords: GPS; privacy; identification
1 Introduction
Current and past location information can be considered the most sensitive data
for an individual [, ]. This is particularly true when entire trajectories of individuals
are collected and stored by applications and service providers. Indeed, companies, such
as telecommunication operators and service providers, and governmental organizations
have access to large collections of personal and communication data, which may be used
for maintaining and managing communications services, security and surveillance: these
include personal location data, which can be collected from GPS devices, cellular phone
usage and WiFi hotspots.
In particular, with the increasing availability and popularity of GPS receivers embedded
in personal devices and the ability to locate cellular phone users from their interactions with network antennas [], new opportunities arise for gaining knowledge about
people's movement behavior. An increasing number of researchers have been investigating
new ways to mine this wealth of location-based data. Examples include the prediction of
the future location of a person [], their mode of transport [] and the identification of
individuals from a sample of their location data []. In [] it was shown that there is a high
degree of temporal and spatial regularity in human trajectories: people are more likely to
visit an area if they have frequently visited it in the past. Moreover, the time a person returns to a location is very likely to be close to that of his/her previous visits. Thus,
given a geographic trajectory, i.e., a collection of chronologically ordered visited locations,
a potential attacker can discover a considerable amount of information about that person,
such as their home, place of work, interactions with other people and visits to sensitive
locations.
© 2015 Rossi et al. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and
indicate if changes were made.
The focus of this work is on location-based fingerprinting: the aim is to identify individuals from their movement behavior. As with identifying individuals by the ridges on their
fingers, the ability to identify them by their mobility traces depends on the uniqueness of
the mobility data associated with them. By uniqueness here we mean the extent to which
a recorded location in a dataset is shared among different individuals, i.e., the less shared
a location is, the more unique it is. Also, as with traditional fingerprinting, some information about the person to be identified needs to have been previously recorded. A recent
contribution in this sense is represented by the work of de Montjoye et al. [], where the
authors are able to identify users from a small subset of their location records taken from
mobile phone service antennas. We would like to underline a major difference between
this work and that by de Montjoye et al., as in theirs the training set also includes the
points used for the testing and the mobility traces are extracted from mobile operators’
call data records, instead of exact GPS points.
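The notion of uniqueness described above can be illustrated with a minimal sketch. It is not the evaluation procedure used in this paper; it only assumes that each user's trace is a set of (latitude, longitude) pairs rounded to some fixed precision, and that a sample of p points "identifies" a user when no other trace contains all of them. The function names, the toy traces and the sampling scheme are hypothetical.

```python
import random

def matching_users(traces, sample):
    """Return the users whose trace contains every point in `sample`."""
    return [user for user, points in traces.items() if sample <= points]

def uniqueness(traces, p, trials=100, seed=0):
    """Estimate the fraction of random p-point samples that match one user only."""
    rng = random.Random(seed)
    unique = total = 0
    for user, points in traces.items():
        if len(points) < p:
            continue  # not enough points to draw a sample from this user
        ordered = sorted(points)  # fixed order so sampling is reproducible
        for _ in range(trials):
            sample = set(rng.sample(ordered, p))
            unique += len(matching_users(traces, sample)) == 1
            total += 1
    return unique / total if total else 0.0

# Toy traces (hypothetical coordinates rounded to two decimal places).
traces = {
    "alice": {(52.45, -1.93), (52.48, -1.90), (52.47, -1.89)},
    "bob": {(52.45, -1.93), (52.41, -1.95)},
    "carol": {(52.45, -1.93), (52.48, -1.90)},
}

print(matching_users(traces, {(52.45, -1.93)}))  # a point shared by all users
print(matching_users(traces, {(52.41, -1.95)}))  # a point visited by one user
print(uniqueness(traces, p=2))
```

In this toy example the point shared by all three users carries no identifying power, whereas the point visited only by one user identifies that user on its own.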
In this paper, to the best of our knowledge, we present the first evaluation of the uniqueness of GPS data traces and we show that, with the high spatial and temporal precision of
GPS, a small number of mobility points, even ones not present in the given mobility databases
used for classification, is sufficient to accurately identify individuals. More specifically, the
contribution of our work is threefold. Firstly, we show that it is possible to identify individuals with great accuracy using various types of movement data such as speed, direction
and distance of travel recorded by means of GPS devices. This suggests that additional care
is necessary when anonymized data, even data not containing exact geographic coordinates,
are released to the public. Secondly, we provide an extensive evaluation of the uniqueness of GPS mobility traces by means of three real-world datasets, namely CabSpotting
[], CenceMe [] and GeoLife []. We consider both spatial as well as spatio-temporal
information, and we show that, in the datasets being investigated, as few as two points
are sufficient to uniquely identify nearly all the users. We also evaluate the impact of the
dataset size and the precision of the GPS coordinates on the uniqueness of the data. Our
findings show that, in some datasets, it is possible to reduce the average uniqueness by
means of spatio-temporal coarsening and achieve a given k-anonymity [, ]. Finally, we
introduce a simple yet effective technique for the identification of users from location information that is not included in the original dataset used for extracting the user mobility
signatures. We also propose a way to measure the extent to which a dataset can resist
an identification attack based on the techniques proposed in this paper.
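The effect of spatio-temporal coarsening on anonymity can be sketched as follows. This is an illustrative simplification rather than the method evaluated in this paper: it assumes records of the form (user, latitude, longitude, timestamp in seconds), snaps them to a grid of `cell_deg` degrees and time windows of `window_s` seconds, and reports the size of the smallest group of users sharing a coarsened point. All names and parameter values are hypothetical.

```python
import math
from collections import defaultdict

def coarsen(lat, lon, t, cell_deg=0.01, window_s=3600):
    """Map a GPS fix to a (grid row, grid column, time window) cell."""
    return (
        math.floor(lat / cell_deg),
        math.floor(lon / cell_deg),
        math.floor(t / window_s),
    )

def anonymity_sets(records, cell_deg=0.01, window_s=3600):
    """Group the users observed in each coarsened spatio-temporal cell."""
    users_per_cell = defaultdict(set)
    for user, lat, lon, t in records:
        users_per_cell[coarsen(lat, lon, t, cell_deg, window_s)].add(user)
    return users_per_cell

def smallest_anonymity_set(records, cell_deg=0.01, window_s=3600):
    """Return the minimum number of users sharing any coarsened point."""
    sets = anonymity_sets(records, cell_deg, window_s)
    return min((len(users) for users in sets.values()), default=0)

# Toy records (hypothetical): (user, latitude, longitude, seconds since start).
records = [
    ("alice", 52.4512, -1.9305, 1000),
    ("bob", 52.4538, -1.9342, 2500),
    ("carol", 52.4871, -1.9012, 1800),
]

print(smallest_anonymity_set(records))                # fine grid
print(smallest_anonymity_set(records, cell_deg=0.1))  # coarser grid
```

With the finer grid at least one coarsened point belongs to a single user, while the coarser grid places all three users in the same cell, which is the trade-off between data precision and anonymity referred to above.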
The remainder of this article is organized as follows: Section describes the datasets
used in this study. Section introduces our framework for the evaluation of the uniqueness of mobility data and the identification of users by means of previously unseen points.
Section presents an extensive experimental evaluation on real-world datasets, and we
summarize our main findings in Section . (...truncated)