Spatio-temporal techniques for user identification by means of GPS mobility data

EPJ Data Science, Aug 2015

One of the greatest concerns related to the popularity of GPS-enabled devices and applications is the increasing availability of the personal location information generated by them and shared with application and service providers. Moreover, people tend to have regular routines and be characterized by a set of “significant places”, thus making it possible to identify a user from his/her mobility data. In this paper we present a series of techniques for identifying individuals from their GPS movements. More specifically, we study the uniqueness of GPS information for three popular datasets, and we provide a detailed analysis of the discriminatory power of speed, direction and distance of travel. Most importantly, we present a simple yet effective technique for the identification of users from location information that is not included in the original dataset used for training, thus raising important privacy concerns for the management of location datasets.



Rossi et al. EPJ Data Science (2015) 4:11
DOI 10.1140/epjds/s13688-015-0049-x
REGULAR ARTICLE, Open Access

Luca Rossi1*, James Walker1 and Mirco Musolesi2,1

*Correspondence: 1 School of Computer Science, University of Birmingham, Birmingham, B15 2TT, UK. Full list of author information is available at the end of the article.

© 2015 Rossi et al. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Keywords: GPS; privacy; identification

1 Introduction

Current and past location information can be considered the most sensitive data for an individual [, ]. This is particularly true when entire trajectories of individuals are collected and stored by applications and service providers.

Indeed, companies, such as telecommunication operators and service providers, and governmental organizations have access to large collections of person and communication data, which may be used for maintaining and managing communications services, security and surveillance: these include person location data, which can be collected from GPS devices, cellular phone usage and WiFi hotspots. In particular, with the increasing availability and popularity of GPS receivers embedded in personal devices and the ability to locate cellular phone users from their interactions with network antennas [], new opportunities arise for gaining knowledge about person movement behavior.

An increasing number of researchers have been investigating new ways to mine this wealth of location-based data. Examples include the prediction of the future location of a person [], their mode of transport [] and the identification of individuals from a sample of their location data []. In [] it was shown that there is a high degree of temporal and spatial regularity in human trajectories: people are more likely to visit an area if they have frequently visited it in the past. Moreover, the time a person returns to a location is very likely to be close to that of his/her previous visits. Thus, given a geographic trajectory, i.e., a collection of chronologically ordered visited locations, a potential attacker can discover a considerable amount of information about that person, such as their home, place of work, interactions with other people and visits to sensitive locations.

The focus of this work is on location-based fingerprinting: the aim is to identify individuals from their movement behavior. As with identifying individuals by the ridges on their fingers, the ability to identify them by their mobility traces depends on the uniqueness of the mobility data associated with them. By uniqueness here we mean the extent to which a recorded location in a dataset is shared among different individuals, i.e., the less shared a location is, the more unique it is. Also, as with traditional fingerprinting, some information about the person to be identified needs to have been previously recorded.

A recent contribution in this sense is represented by the work of de Montjoye et al. [], where the authors are able to identify users from a small subset of their location records taken from mobile phone service antennas. We would like to underline two major differences between this work and that by de Montjoye et al.: in theirs, the training set also includes the points used for testing, and the mobility traces are extracted from mobile operators’ call data records instead of exact GPS points. In this paper, to the best of our knowledge, we present the first evaluation of the uniqueness of GPS data traces and we show that, with the high spatial and temporal precision of GPS, a small number of mobility points, even ones not present in the mobility databases used for classification, is sufficient to accurately identify individuals.
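The notion of uniqueness described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the representation of a trace as a set of discretized points, the function names `matching_users` and `uniqueness`, and the toy data are all assumptions made for illustration. The sketch draws p points from one user's trace and checks whether any other user's trace also contains all of them.

```python
import random


def matching_users(traces, points):
    """Return the users whose trace contains every point in `points`."""
    return {user for user, trace in traces.items() if points <= trace}


def uniqueness(traces, p, trials=200, seed=0):
    """Estimate the fraction of random p-point samples that match one user only.

    `traces` maps a user id to a set of hashable points (hypothetical format,
    e.g. grid cells or (cell, time slot) pairs).
    """
    rng = random.Random(seed)
    eligible = sorted(u for u, t in traces.items() if len(t) >= p)
    if not eligible:
        raise ValueError("no trace has at least p points")
    unique = 0
    for _ in range(trials):
        user = rng.choice(eligible)
        # Sort first so the sample is drawn from a sequence, reproducibly.
        sample = set(rng.sample(sorted(traces[user]), p))
        if matching_users(traces, sample) == {user}:
            unique += 1
    return unique / trials


# Toy data: three users who all share the point (0, 0).
traces = {
    "a": {(0, 0), (0, 1), (5, 5)},
    "b": {(0, 0), (0, 1), (7, 7)},
    "c": {(9, 9), (8, 8), (0, 0)},
}
```

In this toy example, a single shared point such as `(0, 0)` matches every user, while the full three-point trace of any user matches that user alone. The less a point is shared, the more it contributes to identification.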
More specifically, the contribution of our work is threefold.

Firstly, we show that it is possible to identify individuals with great accuracy using various types of movement data such as speed, direction and distance of travel recorded by means of GPS devices. This suggests that additional care is necessary when anonymized data, even data not containing exact geographic coordinates, are released to the public.

Secondly, we provide an extensive evaluation of the uniqueness of GPS mobility traces by means of three real-world datasets, namely CabSpotting [], CenceMe [] and GeoLife []. We consider both spatial and spatio-temporal information, and we show that, in the datasets being investigated, as few as two points are sufficient to uniquely identify nearly all the users. We also evaluate the impact of the dataset size and the precision of the GPS coordinates on the uniqueness of the data. Our findings show that, in some datasets, it is possible to reduce the average uniqueness by means of spatio-temporal coarsening and achieve a given k-anonymity [, ].

Finally, we introduce a simple yet effective technique for the identification of users from location information that is not included in the original dataset used for extracting the user mobility signatures. We also propose a way to measure the extent to which a dataset can resist an identification attack based on the techniques proposed in this paper.

The remainder of this article is organized as follows: Section  describes the datasets used in this study. Section  introduces our framework for the evaluation of the uniqueness of mobility data and the identification of users by means of previously unseen points. Section  presents an extensive experimental evaluation on real-world datasets, and we summarize our main findings in Section . (...truncated)
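The idea of spatio-temporal coarsening mentioned among the contributions can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure used in the article: the record format `(user, lat, lon, t)`, the grid parameters `cell_deg` and `slot_s`, and the single-point notion of k (the smallest number of users sharing any coarsened point) are all hypothetical choices.

```python
from collections import defaultdict


def coarsen(lat, lon, t, cell_deg, slot_s):
    """Snap a GPS fix to a grid cell of `cell_deg` degrees and a time slot of `slot_s` seconds."""
    return (int(lat // cell_deg), int(lon // cell_deg), int(t // slot_s))


def anonymity_sets(records, cell_deg, slot_s):
    """Map each coarsened (cell, slot) to the set of users observed there."""
    sets = defaultdict(set)
    for user, lat, lon, t in records:
        sets[coarsen(lat, lon, t, cell_deg, slot_s)].add(user)
    return dict(sets)


def min_k(records, cell_deg, slot_s):
    """Smallest number of distinct users sharing any coarsened point."""
    return min(len(users) for users in anonymity_sets(records, cell_deg, slot_s).values())


# Toy records: (user, latitude, longitude, seconds since start); values are made up.
records = [
    ("a", 39.10, 116.20, 30),
    ("b", 39.40, 116.30, 90),
    ("a", 40.60, 117.10, 4000),
    ("b", 40.90, 117.80, 4100),
]

fine_k = min_k(records, cell_deg=0.25, slot_s=60)      # every point belongs to one user
coarse_k = min_k(records, cell_deg=1.0, slot_s=3600)   # every point is shared by two users
```

With the fine grid each coarsened point is seen for a single user, whereas the coarser grid merges the two users' points into shared cells. The price of the larger anonymity sets is a loss of spatial and temporal precision.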


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1140%2Fepjds%2Fs13688-015-0049-x.pdf
Article home page: https://link.springer.com/article/10.1140/epjds/s13688-015-0049-x

Luca Rossi, James Walker, Mirco Musolesi. Spatio-temporal techniques for user identification by means of GPS mobility data, EPJ Data Science, 2015, pp. 11, Volume 4, Issue 1, DOI: 10.1140/epjds/s13688-015-0049-x