Understanding Human Mobility from Twitter
RESEARCH ARTICLE
Raja Jurdak1*, Kun Zhao1, Jiajun Liu1, Maurice AbouJaoude2, Mark Cameron3,
David Newth3
1 CSIRO, Brisbane, Australia, 2 American University of Beirut, Beirut, Lebanon, 3 CSIRO, Canberra,
Australia
Abstract
OPEN ACCESS
Citation: Jurdak R, Zhao K, Liu J, AbouJaoude M,
Cameron M, Newth D (2015) Understanding Human
Mobility from Twitter. PLoS ONE 10(7): e0131469.
doi:10.1371/journal.pone.0131469
Editor: Ye Wu, Beijing University of Posts and
Telecommunications, CHINA
Received: January 19, 2015
Understanding human mobility is crucial for a broad range of applications, from disease prediction to communication networks. Most efforts to study human mobility have so far
used private and low-resolution data, such as call data records. Here, we propose Twitter as
a proxy for human mobility, as it relies on publicly available data and provides high-resolution positioning when users opt to geotag their tweets with their current location. We analyse
a Twitter dataset with more than six million geotagged tweets posted in Australia, and we
demonstrate that Twitter can be a reliable source for studying human mobility patterns. Our
analysis shows that geotagged tweets can capture rich features of human mobility, such as
the diversity of movement orbits among individuals and of movements within and between
cities. We also find that short- and long-distance movers both spend most of their time in
large metropolitan areas, in contrast with intermediate-distance movers,
reflecting the impact of different modes of travel. Our study provides solid evidence that
Twitter can indeed be a useful proxy for tracking and predicting human movement.
Accepted: June 2, 2015
Published: July 8, 2015
Copyright: © 2015 Jurdak et al. This is an open
access article distributed under the terms of the
Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are
credited.
Data Availability Statement: Data have been
collected through the public Twitter Stream API
(https://dev.twitter.com/overview/api) and are stored
at an internal CSIRO repository. Access to the data is
restricted due to possible ethical/privacy
considerations, as geotagged Twitter data may enable identification of individuals. The data used for this analysis
cannot be included in the manuscript, supplemental
files, or a public repository in adherence with
Australia’s National Statement on Ethical Conduct in
Human Research. Queries regarding data availability
should be directed to Ms. Cathy Pitkin, Manager,
Social Responsibility and Ethics, Cathy.Pitkin@csiro.au.
Introduction
Understanding individual human mobility is of fundamental importance for many applications, from urban planning [1] to human and electronic virus prediction [2–4] and traffic and
population forecasting [5, 6]. Recent efforts have focused on the study of human mobility using
new tracking technologies such as mobile phones [7–10], GPS [11–13], Wi-Fi [14, 15], and
RFID devices [16, 17]. While these technologies have provided deep insights into human
mobility dynamics, their ongoing use for tracking human mobility involves privacy concerns
and data access restrictions. Additionally, the use of call data records from cellular phones to
track mobility provides low-resolution data, typically on the order of kilometres, dictated mainly
by the distances between cellular towers.
Recently, large online systems have been proposed as proxies for providing valuable information on human dynamics [18–23]. For example, the online social networking and microblogging system Twitter, which allows registered users to send and read short text messages
called tweets, has more than 500 million users posting 340 million tweets per day. Users
can opt to geotag their tweets with their current location, thus providing an ideal data source to
study human mobility. Geotagged tweets provide high position resolution down to 10 metres
PLOS ONE | DOI:10.1371/journal.pone.0131469 July 8, 2015
Funding: The authors have no support or funding to
report.
Competing Interests: The authors have declared
that no competing interests exist.
together with a large sample of the population, representing a unique opportunity for studying
human mobility dynamics both with high position resolution and at large spatial scales.
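As a rough illustration of how a user's geotagged tweets can be turned into movement measurements, the following minimal sketch computes the great-circle distance between consecutive tweet locations using the haversine formula. It is not the authors' processing pipeline: the function names, the spherical Earth radius, and the sample coordinates are all assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def displacements_km(points):
    """Distances between consecutive (lat, lon) tweet locations of one user.

    `points` is assumed to be sorted by tweet time.
    """
    return [haversine_km(*p, *q) for p, q in zip(points, points[1:])]


# Hypothetical tweet locations (lat, lon): two points in central Brisbane,
# followed by one in central Sydney.
tweets = [(-27.4698, 153.0251), (-27.4710, 153.0235), (-33.8688, 151.2093)]
steps = displacements_km(tweets)
```

In this hypothetical trace the first step is a few hundred metres and the second spans several hundred kilometres, which is the kind of range (from movements within a site to movements between cities) that metre-level geotags make observable in a single dataset.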
Although the data are publicly available and cover a large population of users, their representativeness of the underlying mobility dynamics remains an open question. Specifically, there
are three open issues with using geotagged tweets for understanding mobility patterns: (1)
potential sampling bias; (2) communication modality; and (3) location biases for sending
tweets. Because Twitter is a social networking service, its users form a specific sample of the population: they must have an Internet connection and be relatively tech savvy,
and thus typically represent a younger demographic group. While sampling bias is likely to be
prevalent for any technology that captures mobility dynamics [24, 25], it is unclear how Twitter’s potential sampling bias affects the mobility patterns observed from geotagged tweets. Another challenge is that Twitter, unlike previous technologies [26], strictly limits the content length of a single
message. To use Twitter as a proxy for studying human mobility, it is important to understand
whether this hard limit on tweet content can impact the spatiotemporal patterns of geotagged
tweets. Finally, it is currently unclear whether Twitter users send messages from specific types
of locations (such as the home or workplace), and how such preferences for tweeting
from certain locations can impact the mobility patterns observed from geotagged tweets.
This paper analyses a large dataset of 7,811,004 tweets posted by 156,607 Twitter users in Australia between
September 2013 and April 2014 to determine how representative Twitter-based
mobility patterns are of population-level and individual-level movement. We compare the mobility patterns observed through Twitter with the patterns observed through other technologies, such as
call data records. Our analysis uses universal indicators for characterising mobility patterns
from geotagged tweets, namely the displacement distribution and the distribution of the radius of gyration,
which measures how far an individual typically moves (their spatial orbit). We find that the
higher-resolution Twitter data reveal multiple modes of human mobility [2] from intra-site to metropolitan and inter-city movements. Our analysis of the time and likelihood of returning to previously visited locations shows that the strict content limit on tweets does not affect the
returning patterns, although Twitter users exhibit a higher preference than mobile phone users
for returning to their most popular location. We also observe that an individual’s spatial orbi (...truncated)