Can Twitter Be a Source of Information on Allergy? Correlation of Pollen Counts with Tweets Reporting Symptoms of Allergic Rhinoconjunctivitis and Names of Antihistamine Drugs
RESEARCH ARTICLE
Francesco Gesualdo1*, Giovanni Stilo2, Angelo D’Ambrosio1, Emanuela Carloni1,
Elisabetta Pandolfi1, Paola Velardi2, Alessandro Fiocchi1, Alberto E. Tozzi1
1 Multifactorial Disease and Complex Phenotype Research Area, Bambino Gesù Children’s Hospital IRCCS,
Rome, Italy, 2 Department of Informatics, “Sapienza” University of Rome, Rome, Italy
Abstract
Pollen forecasts are widely used to inform therapeutic decisions for patients with
allergic rhinoconjunctivitis (ARC). We used data derived from Twitter to identify
tweets reporting a combination of symptoms consistent with a case definition of ARC and
tweets reporting the name of an antihistamine drug. To increase the sensitivity of the
system, we applied an algorithm that automatically identifies jargon expressions
related to medical terms. We compared weekly Twitter trends with National Allergy Bureau
weekly pollen counts derived from US stations, and found a high correlation of the sum of
the total pollen counts from each station with tweets reporting ARC symptoms (Pearson’s
correlation coefficient: 0.95) and with tweets reporting antihistamine drug names (Pearson’s
correlation coefficient: 0.93). The longitude and latitude of the pollen stations affected the
strength of the correlation. Twitter and other social networks may play a role in allergic
disease surveillance and in signaling drug consumption trends.
OPEN ACCESS
Citation: Gesualdo F, Stilo G, D’Ambrosio A, Carloni
E, Pandolfi E, Velardi P, et al. (2015) Can Twitter Be a
Source of Information on Allergy? Correlation of
Pollen Counts with Tweets Reporting Symptoms of
Allergic Rhinoconjunctivitis and Names of
Antihistamine Drugs. PLoS ONE 10(7): e0133706.
doi:10.1371/journal.pone.0133706
Editor: Tobias Preis, University of Warwick, UNITED
KINGDOM
Received: October 21, 2014
Accepted: June 30, 2015
Published: July 21, 2015
Copyright: © 2015 Gesualdo et al. This is an open
access article distributed under the terms of the
Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are
credited.
Data Availability Statement: All relevant data are
within the paper and its Supporting Information files.
Funding: The authors have no support or funding to
report.
Competing Interests: The authors have declared
that no competing interests exist.
Introduction
The Internet is increasingly exploited as a source of information on the population’s health.
Analysis of media reports [1], search engine queries [2], Wikipedia usage [3] and social networks provides data that may make it possible to assess and monitor the health status of a population in
real time.
Twitter is a popular social network based on the sharing of short messages of up to 140
characters. The potential of this medium for public health is intrinsic to its nature: people tweet about their personal lives, and sometimes include information on
their health status in their messages. The large number of Twitter users (271 million as of June 30th, 2014,
generating over 500 million daily tweets, https://investor.twitterinc.com/releasedetail.cfm?releaseid=862505)
makes it possible to aggregate large amounts of data and identify trends in disease
prevalence.
On the basis of this observation, Twitter has mainly been used as a source of data on infectious diseases [4]. In particular, a number of studies showed a correlation between recurrence
of influenza-related terms on Twitter and figures reported by traditional influenza surveillance
systems [5–7].
Social networks and other Internet-based means of communication (e.g., email) have previously been investigated as potentially useful media for improving the care of patients
with allergic diseases [8]. A study by Imonikhe et al. assessed the information-seeking behaviors of
patients with allergic conjunctivitis [9]. Moreover, it has recently been shown that
temporal variation in regional pollen counts correlates with Google searches for terms related
to pollen allergy [10]. To our knowledge, Twitter has never been studied for allergic disease
surveillance.
We conducted the present study to investigate the potential of Twitter as a
source of information on allergic disease prevalence, on the basis of the observation that
Twitter users affected by allergic rhinoconjunctivitis (ARC) may write tweets including combinations of specific symptoms or names of drugs commonly used to treat this condition. Our
objective was to test the reliability of such tweet trends as a proxy for trends of ARC.
Because no official surveillance data for ARC or other allergic diseases are available for
comparison, we relied on the correlation between clinical symptoms in allergic patients
and aeroallergen levels. We therefore investigated the correlation between US pollen counts
obtained from the American Academy of Allergy, Asthma & Immunology (AAAAI) and
trends of tweets, geolocalized in the US, reporting names of antihistamine drugs and symptoms
of ARC.
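As an illustration of the comparison described above, the following minimal Python sketch computes Pearson's correlation coefficient between two weekly series. The function name `pearson` and the two short series are hypothetical placeholders chosen for illustration only; they are not the study's data, and the sketch does not claim to reproduce the authors' statistical workflow.

```python
from math import sqrt


def pearson(x, y):
    """Return Pearson's correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly values, invented for this example (not study data).
weekly_pollen = [120, 340, 900, 1500, 800, 200]
weekly_tweets = [15, 30, 80, 140, 90, 25]

r = pearson(weekly_pollen, weekly_tweets)
print(f"Pearson r = {r:.2f}")
```

A coefficient close to 1 indicates that the two weekly series rise and fall together; it does not by itself establish that one series drives the other.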
Materials and Methods
Twitter data
From February 1st, 2013 to September 30th, 2013, we used the available application programming interfaces (APIs, https://dev.twitter.com/docs/streaming-apis) to acquire
a sample of the worldwide Twitter traffic, consisting of tweets that included at least one of 82 symptom-related terms.
These terms were selected as follows. First, we identified 17 singleton terms composing 4 queries
based on the case definitions (influenza-like illness, cold, gastroenteritis, allergy) adopted by the
Influenzanet system (https://www.influenzanet.eu/en/results/?page=help#casedef). Second, we applied to each of these terms an algorithm that automatically detects naive English
words related to specific medical concepts, as described elsewhere [11]. This algorithm allowed
us to retrieve 65 additional jargon keywords. We therefore used a total of 82 keywords (17 technical and 65 jargon) to acquire a sample of tweets that likely corresponds to almost
100% of the Twitter traffic containing those terms (being up to 1% of the total Twitter traffic). Due to
network and hardware problems, it was not possible to collect data on May 12th or
between June 13th and June 21st, 2013.
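The keyword-based selection described above can be sketched as follows. This is a minimal illustration, assuming tweets are available as plain strings; the keyword list and the helper names `build_matcher` and `matches_any` are hypothetical, and the actual 82 terms used in the study are not reproduced here.

```python
import re

# Hypothetical placeholder keywords; the study's 82 terms are not listed here.
KEYWORDS = {"sneezing", "runny nose", "itchy eyes"}


def build_matcher(keywords):
    """Compile one case-insensitive pattern matching any keyword as a whole term."""
    # Longer alternatives first, so multi-word terms are tried before shorter ones.
    alternatives = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def matches_any(text, pattern):
    """Return True if the text contains at least one of the keywords."""
    return pattern.search(text) is not None


pattern = build_matcher(KEYWORDS)
sample = ["Can't stop SNEEZING today", "Lovely sunny day"]
selected = [t for t in sample if matches_any(t, pattern)]
print(selected)
```

Word boundaries (`\b`) keep a keyword from matching inside a longer word; whether the original acquisition used whole-term or substring matching is not stated in the text, so this is an assumption of the sketch.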
In order to remove duplicates, avoid spam, and focus only on tweets from common users,
we processed the data applying the following filters: remove all the copies of tweets appearing
more than once in the collection; remove all those tweets that contained hyperlinks. Subsequently, we built a system that allows monitoring collected tweets and producing tim (...truncated)