Understanding Human Mobility from Twitter

RESEARCH ARTICLE

Raja Jurdak1*, Kun Zhao1, Jiajun Liu1, Maurice AbouJaoude2, Mark Cameron3, David Newth3

1 CSIRO, Brisbane, Australia, 2 American University of Beirut, Beirut, Lebanon, 3 CSIRO, Canberra, Australia

Abstract

Understanding human mobility is crucial for a broad range of applications from disease prediction to communication networks. Most efforts on studying human mobility have so far used private and low resolution data, such as call data records. Here, we propose Twitter as a proxy for human mobility, as it relies on publicly available data and provides high resolution positioning when users opt to geotag their tweets with their current location. We analyse a Twitter dataset with more than six million geotagged tweets posted in Australia, and we demonstrate that Twitter can be a reliable source for studying human mobility patterns. Our analysis shows that geotagged tweets can capture rich features of human mobility, such as the diversity of movement orbits among individuals and of movements within and between cities. We also find that short- and long-distance movers both spend most of their time in large metropolitan areas, in contrast with intermediate-distance movers’ movements, reflecting the impact of different modes of travel. Our study provides solid evidence that Twitter can indeed be a useful proxy for tracking and predicting human movement.

OPEN ACCESS

Citation: Jurdak R, Zhao K, Liu J, AbouJaoude M, Cameron M, Newth D (2015) Understanding Human Mobility from Twitter. PLoS ONE 10(7): e0131469. doi:10.1371/journal.pone.0131469

Editor: Ye Wu, Beijing University of Posts and Telecommunications, CHINA

Received: January 19, 2015

Accepted: June 2, 2015

Published: July 8, 2015

Copyright: © 2015 Jurdak et al.
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: Data have been collected through the public Twitter Stream API (https://dev.twitter.com/overview/api) and are stored at an internal CSIRO repository. Access to the data is restricted due to possible ethical/privacy considerations that enable identification of Geotagged Twitter data. The data used for this analysis cannot be included in the manuscript, supplemental files, or a public repository in adherence with Australia’s National Statement on Ethical Conduct in Human Research. Queries regarding data availability should be directed to Ms. Cathy Pitkin, Manager, Social Responsibility and Ethics, Cathy.Pitkin@csiro.au.

Introduction

Understanding individual human mobility is of fundamental importance for many applications from urban planning [1] to human and electronic virus prediction [2–4] and traffic and population forecasting [5, 6]. Recent efforts have focused on the study of human mobility using new tracking technologies such as mobile phones [7–10], GPS [11–13], Wi-Fi [14, 15], and RFID devices [16, 17]. While these technologies have provided deep insights into human mobility dynamics, their ongoing use for tracking human mobility involves privacy concerns and data access restrictions. Additionally, the use of call data records from cellular phones to track mobility provides low resolution data, typically on the order of kilometres, dictated mainly by the distances between cellular towers. Recently, large online systems have been proposed as proxies for providing valuable information on human dynamics [18–23].
For example, the online social networking and microblogging system Twitter, which allows registered users to send and read short text messages called tweets, has more than 500 million users who post 340 million tweets per day. Users can opt to geotag their tweets with their current location, thus providing an ideal data source to study human mobility. Geotagged tweets provide high position resolution, down to 10 metres, together with a large sample of the population, representing a unique opportunity for studying human mobility dynamics both with high position resolution and at large spatial scales.

Funding: The authors have no support or funding to report.

Competing Interests: The authors have declared that no competing interests exist.

PLOS ONE | DOI:10.1371/journal.pone.0131469 | July 8, 2015

Despite the data being publicly available and having a large population of users, its representativeness of the underlying mobility dynamics remains an open question. Specifically, there are three open issues with using geotagged tweets for understanding mobility patterns: (1) potential sampling bias; (2) communication modality; and (3) location biases for sending tweets. Because Twitter is a social networking service, its users form a specific sample of the population: they must have an Internet connection and be relatively tech savvy, and thus typically represent a younger demographic group. While sampling bias is likely to be prevalent for any technology that captures mobility dynamics [24, 25], it is unclear how Twitter’s potential sampling bias affects the mobility patterns of geotagged tweets. Another challenge is that Twitter, unlike previous technologies [26], strictly limits the length of a single message. To use Twitter as a proxy for studying human mobility, it is important to understand whether this hard limit on tweet content can impact the spatiotemporal patterns of geotagged tweets.
Finally, it is currently unclear whether Twitter users send messages from specific types of locations (such as the home or workplace), and how such preferences for tweeting from certain locations can impact the mobility patterns observed from geotagged tweets.

This paper analyses a large dataset with 7,811,004 tweets from 156,607 Twitter users from September 2013 to April 2014 in Australia to determine how representative Twitter-based mobility patterns are of population- and individual-level movement. We compare the mobility patterns observed through Twitter with the patterns observed through other technologies, such as call data records. Our analysis uses universal indicators for characterising mobility patterns from geotagged tweets, namely the displacement distribution and the gyration radius distribution, which measures how far individuals typically move (their spatial orbit). We find that the higher-resolution Twitter data reveal multiple modes of human mobility [2], from intra-site to metropolitan and inter-city movements. Our analysis of the time and likelihood of returning to previously visited locations shows that the strict content limit on tweets does not affect the returning patterns, although Twitter users exhibit a higher preference than mobile phone users for returning to their most popular location. We also observe that an individual’s spatial orbi (...truncated)