Location Tracing and Potential Risks in Interaction Data Sets
Oliver Duke-Williams
Department of Information Studies, UCL, London WC1E 6BT, UK
Location-aware mobile phone handsets have become increasingly common in recent years, giving rise to a wide variety of location based services that rely on a person's mobile phone reporting its current location to a remote service provider. Previous research has demonstrated that services that geo-code status updates may permit the estimation of the approximate locations of both users' homes and their workplaces. The paper investigates the disclosure risks of a priori knowledge of a person's home and workplace locations, or of their current and previous home locations. Detailed interaction data sets published from censuses or other sources are characterised by the sparsity of the contained data, such that unique combinations of two locations may often be observed. In the most detailed 2011 migration data, 37% of migrants had a unique combination of origin and destination, whilst in the most detailed journey to work data, 58% of workers had a unique combination of home and workplace. The amount of additional attribute data that might be disclosed is limited. When coarser geographies are used, there still remains a non-trivial number of persons with unique location combinations, with considerably more attributes potentially disclosable.
UK; Census; Interaction data; Disclosure
Introduction
Amongst the outputs from recent UK censuses have been sets of interaction data (also
known as ‘flow data’ or ‘origin-destination data’). In contrast to aggregate census data
which provide information about a defined area (from an entire nation to a small zone)
and microdata which provide individual level observations, census interaction data
provide information about people moving between one location and another. The most
common interaction data relating to people are migration data sets and commuting data
sets; where migration data typically report moves between a present residential location
and a former usual residence, and commuting data report on daily journeys between a
residence and a place of work. This paper uses UK examples, although data with the
same structure are available in a number of countries.
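As a rough illustration of this structure, the sketch below holds interaction data as counts keyed by an (origin, destination) pair and derives area-level totals from them. The zone codes, counts, and the `margins` helper are invented for illustration and do not come from any census output.

```python
from collections import Counter

# Hypothetical interaction data: (origin, destination) -> number of persons.
# Zone codes and counts are invented purely for illustration.
flows = {
    ("A", "B"): 12,
    ("A", "C"): 1,
    ("B", "A"): 3,
    ("C", "B"): 1,
}


def margins(flows):
    """Collapse pairwise flows into per-zone outflow and inflow totals."""
    out_totals, in_totals = Counter(), Counter()
    for (origin, destination), count in flows.items():
        out_totals[origin] += count
        in_totals[destination] += count
    return out_totals, in_totals


out_totals, in_totals = margins(flows)
```

Collapsing the pairs in this way gives totals for each area, which resemble aggregate data, but the link between the two locations cannot be recovered from those totals. That link is what distinguishes interaction data.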
Previous research (Krumm 2007) has demonstrated that it is possible to estimate the
location of a person’s usual residence by examining anonymously logged data in GPS
units, whilst Golle and Partridge (2009) have argued that it is also possible to estimate
workplace location for some people, and that this would pose a risk for some
previously released data sets. These risk assessments rely on individual level location
trace data. The use of smart phones and other portable devices which can determine – to
varying degrees of accuracy – their current location (and by implication, that of an
owner or user) has become widespread. Such devices allow a wide variety of location
based services to be offered, some running as software on the device itself, and others
running as a remote service. The term location based services has a number of
definitions that are not necessarily consistent (Küpper 2005), and also includes many
applications not related to portable devices. Data produced by location based services
may permit service owners or third parties to estimate home or workplace locations of
users. This paper examines the potential disclosure risks to individuals arising from the
publication of UK interaction data sets, in an analysis analogous to Golle and Partridge’s
work on US data sets. It investigates whether the level of risk in the UK data is similar to
that suggested for US data, and thus whether UK interaction data are potentially ‘unsafe’.
The extent to which there may be a risk of disclosure is affected by the disclosure control
procedures used in conjunction with release of the data. The paper contrasts a number of
sets of interaction data released with different approaches to disclosure control in order to
explore this issue further. The general risks of interaction data are considered, together
with possible mitigation strategies in the form of disclosure control arrangements or access
restrictions.
The paper starts by reviewing general observations about the role of confidentiality
and privacy in data released by national statistical agencies. The specific area of UK
interaction data is considered, as these data have particular characteristics that may
increase the risk of disclosure. The methods used by Golle and Partridge to analyse data
from the Longitudinal Employer Household Dynamics (LEHD) program are reviewed, and
then applied to a number of data sets produced as outputs from UK Censuses.
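The kind of uniqueness measure quoted in the abstract can be sketched as the share of persons who are the only individual with their particular pair of locations. The sketch assumes a table of pair counts. The `unique_share` function and the example counts are hypothetical, and the code is not a reproduction of Golle and Partridge's procedure or of the analysis reported in this paper.

```python
def unique_share(flows):
    """Share of persons whose (origin, destination) pair has a count of exactly 1.

    `flows` maps (origin, destination) -> number of persons. Each flow with
    a count of 1 contributes one person who is unique on that pair.
    """
    total_persons = sum(flows.values())
    if total_persons == 0:
        return 0.0
    unique_persons = sum(1 for count in flows.values() if count == 1)
    return unique_persons / total_persons


# Invented example: 17 persons in total, 2 of them in flows of size one.
example_flows = {
    ("A", "B"): 12,
    ("A", "C"): 1,
    ("B", "A"): 3,
    ("C", "B"): 1,
}
share = unique_share(example_flows)
```

Running the same function on the same records after recoding them to a coarser geography would show how the share changes as zones are merged. The sketch assumes unperturbed counts, so any disclosure control applied before release would affect how a count of 1 should be interpreted.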
Confidentiality
For any statistical agency that intends to release some data, an important consideration
is the preservation of confidentiality relating to those data. The term ‘confidentiality’
refers to preventing disclosure of information to unauthorised parties.
Fellegi (1972 (...truncated)