Location Tracing and Potential Risks in Interaction Data Sets

Applied Spatial Analysis and Policy, Dec 2017

Location-aware mobile phone handsets have become increasingly common in recent years, giving rise to a wide variety of location-based services that rely on a person’s mobile phone reporting its current location to a remote service provider. Previous research has demonstrated that services that geo-code status updates may permit the estimation of the approximate locations of both users’ homes and their workplaces. The paper investigates the disclosure risks of a priori knowledge of a person’s home and workplace locations, or of their current and previous home locations. Detailed interaction data sets published from censuses or other sources are characterised by the sparsity of the contained data, such that unique combinations of two locations may often be observed. In the most detailed 2011 migration data, 37% of migrants had a unique combination of origin and destination, whilst in the most detailed journey to work data, 58% of workers had a unique combination of home and workplace. The amount of additional attribute data that might be disclosed is limited. When coarser geographies are used, there still remain a non-trivial number of persons with unique location combinations, with considerably more attributes potentially disclosable.
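The uniqueness measure quoted in the abstract can be illustrated with a minimal sketch. It assumes one (origin, destination) record per person, which is not how the published census tables are structured, and every zone code, lookup, and function name below is hypothetical. It counts the share of persons whose pair of locations is shared with nobody else, then repeats the count after zones are merged into coarser regions.

```python
from collections import Counter

def unique_pair_share(person_pairs):
    """Fraction of persons whose (origin, destination) pair occurs exactly once."""
    if not person_pairs:
        return 0.0
    counts = Counter(person_pairs)
    # A pair with a count of 1 corresponds to exactly one person.
    unique_persons = sum(1 for n in counts.values() if n == 1)
    return unique_persons / len(person_pairs)

def coarsen(person_pairs, zone_to_region):
    """Map each fine zone code to its (hypothetical) parent region."""
    return [(zone_to_region[o], zone_to_region[d]) for o, d in person_pairs]

# Hypothetical toy data: five persons, four fine zones, two regions.
pairs = [("A1", "B1"), ("A1", "B1"), ("A2", "B1"), ("A2", "B2"), ("B2", "A1")]
lookup = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}

fine_share = unique_pair_share(pairs)                     # 3 of 5 persons are unique
coarse_share = unique_pair_share(coarsen(pairs, lookup))  # 1 of 5 persons is unique
print(fine_share, coarse_share)
```

In this toy case coarsening reduces the share of unique persons but does not remove it, which mirrors the qualitative point made in the abstract. The figures carry no empirical meaning.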

Oliver Duke-Williams
Department of Information Studies, UCL, London WC1E 6BT, UK

Keywords: UK; Census; Interaction data; Disclosure

Introduction

Amongst the outputs from recent UK censuses have been sets of interaction data (also known as ‘flow data’ or ‘origin-destination data’). In contrast to aggregate census data, which provide information about a defined area (from an entire nation to a small zone), and microdata, which provide individual-level observations, census interaction data provide information about people moving between one location and another.
The most common interaction data relating to people are migration data sets and commuting data sets. Migration data typically report moves between a present residential location and a former usual residence, and commuting data report on daily journeys between a residence and a place of work. This paper uses UK examples, although data with the same structure are available in a number of countries.

Previous research (Krumm 2007) has demonstrated that it is possible to estimate the location of a person’s usual residence by examining anonymously logged data in GPS units, whilst Golle and Partridge (2009) have argued that it is also possible to estimate workplace location for some people, and that this would pose a risk for some previously released data sets. These risk assessments rely on individual-level location trace data. The use of smart phones and other portable devices which can determine, to varying degrees of accuracy, their current location (and by implication, that of an owner or user) has become widespread. Such devices allow a wide variety of location-based services to be offered, some running as software on the device itself, and others running as a remote service. The term ‘location-based services’ has a number of definitions that are not necessarily consistent (Küpper 2005), and also includes many applications not related to portable devices. Data produced by location-based services may permit service owners or third parties to estimate home or workplace locations of users.

This paper examines the potential disclosure risks to individuals through publication of UK interaction data sets, analogous to Golle and Partridge’s work on US data sets, and investigates whether the level of risk in the UK data is similar to that suggested for US data, and thus whether UK interaction data are potentially ‘unsafe’. The extent to which there may be a risk of disclosure is affected by disclosure control procedures used in conjunction with release of the data.
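To make concrete how a location trace might yield home and workplace estimates of the kind described above, the following is a deliberately simplified sketch. It is not the method of Krumm (2007) or of Golle and Partridge (2009). It assumes that each location fix has already been assigned to a zone, and it takes the most frequent zone during night hours as 'home' and the most frequent zone during office hours as 'work'. The hour ranges, zone identifiers, and function names are all hypothetical.

```python
from collections import Counter
from datetime import datetime

# Assumed hour windows; a real analysis would need to justify these choices.
NIGHT_HOURS = set(range(0, 6)) | {22, 23}
WORK_HOURS = set(range(9, 17))

def most_common_zone(fixes, hours):
    """Return the zone seen most often among fixes whose hour falls in `hours`."""
    counts = Counter(zone for ts, zone in fixes if ts.hour in hours)
    return counts.most_common(1)[0][0] if counts else None

def estimate_home_work(fixes):
    """Guess (home_zone, work_zone) from (timestamp, zone) fixes."""
    fixes = list(fixes)
    return most_common_zone(fixes, NIGHT_HOURS), most_common_zone(fixes, WORK_HOURS)

# Hypothetical trace for one device.
trace = [
    (datetime(2017, 12, 4, 23, 30), "zone_17"),
    (datetime(2017, 12, 5, 2, 0), "zone_17"),
    (datetime(2017, 12, 5, 10, 15), "zone_42"),
    (datetime(2017, 12, 5, 13, 0), "zone_08"),
    (datetime(2017, 12, 5, 14, 40), "zone_42"),
    (datetime(2017, 12, 5, 22, 10), "zone_17"),
]

home, work = estimate_home_work(trace)
print(home, work)
```

The sketch ignores weekends, shift work, and positional error, so it should be read only as an indication of why a pair of estimated locations can be matched against an origin-destination table.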
The paper contrasts a number of sets of interaction data released with different approaches to disclosure control in order to further explore this issue. The general risks of interaction data are considered, along with possible mitigation strategies in the form of disclosure control arrangements or access restrictions.

The paper starts by reviewing general observations about the role of confidentiality and privacy in data released by national statistical agencies. The specific area of UK interaction data is considered, as these data have particular characteristics that may increase the risk of disclosure. The methods used by Golle and Partridge to analyse data from the Longitudinal Employer-Household Dynamics (LEHD) program are reviewed, and then applied to a number of data sets produced as outputs from UK Censuses.

Confidentiality

For any statistical agency that intends to release some data, an important consideration is the preservation of confidentiality relating to those data. The term ‘confidentiality’ refers to preventing disclosure of information to unauthorised parties. Fellegi (1972 (...truncated)


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1007%2Fs12061-017-9247-1.pdf

Oliver Duke-Williams. Location Tracing and Potential Risks in Interaction Data Sets, Applied Spatial Analysis and Policy, 2017, pp. 1-18, DOI: 10.1007/s12061-017-9247-1