Can Twitter Be a Source of Information on Allergy? Correlation of Pollen Counts with Tweets Reporting Symptoms of Allergic Rhinoconjunctivitis and Names of Antihistamine Drugs

RESEARCH ARTICLE

Francesco Gesualdo1*, Giovanni Stilo2, Angelo D’Ambrosio1, Emanuela Carloni1, Elisabetta Pandolfi1, Paola Velardi2, Alessandro Fiocchi1, Alberto E. Tozzi1

1 Multifactorial Disease and Complex Phenotype Research Area, Bambino Gesù Children’s Hospital IRCCS, Rome, Italy
2 Department of Informatics, “Sapienza” University of Rome, Rome, Italy

OPEN ACCESS

Citation: Gesualdo F, Stilo G, D’Ambrosio A, Carloni E, Pandolfi E, Velardi P, et al. (2015) Can Twitter Be a Source of Information on Allergy? Correlation of Pollen Counts with Tweets Reporting Symptoms of Allergic Rhinoconjunctivitis and Names of Antihistamine Drugs. PLoS ONE 10(7): e0133706. doi:10.1371/journal.pone.0133706
Editor: Tobias Preis, University of Warwick, United Kingdom
Received: October 21, 2014
Accepted: June 30, 2015

Abstract

Pollen forecasts are widely used to inform therapeutic decisions for patients with allergic rhinoconjunctivitis (ARC). We used data derived from Twitter to identify tweets reporting a combination of symptoms consistent with a case definition of ARC and tweets reporting the name of an antihistamine drug. To increase the sensitivity of the system, we applied an algorithm that automatically identifies jargon expressions related to medical terms. We compared weekly Twitter trends with National Allergy Bureau weekly pollen counts from US stations and found a high correlation of the sum of the total pollen counts from each station with tweets reporting ARC symptoms (Pearson’s correlation coefficient: 0.95) and with tweets reporting antihistamine drug names (Pearson’s correlation coefficient: 0.93). The longitude and latitude of the pollen stations affected the strength of the correlation.
Twitter and other social networks may play a role in allergic disease surveillance and in signaling drug consumption trends.

Published: July 21, 2015
Copyright: © 2015 Gesualdo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: All relevant data are within the paper and its Supporting Information files.
Funding: The authors have no support or funding to report.
Competing Interests: The authors have declared that no competing interests exist.

Introduction

The Internet is increasingly exploited as a source of information on the population’s health. The analysis of media reports [1], search engine queries [2], Wikipedia usage [3] and social networks provides data that may make it possible to assess and monitor the health status of a population in real time.

Twitter is a popular social network based on the sharing of short messages of up to 140 characters. The potential power of this medium for public health is intrinsic to its nature: people tweet about their personal lives and sometimes include information on their health status in their messages. The large number of Twitter users (271 million as of June 30th, 2014, generating over 500 million daily tweets; https://investor.twitterinc.com/releasedetail.cfm?releaseid=862505) makes it possible to aggregate large amounts of data and identify trends in disease prevalence. On the basis of this observation, Twitter has mainly been used as a source of data on infectious diseases [4]. In particular, a number of studies showed a correlation between the recurrence of influenza-related terms on Twitter and figures reported by traditional influenza surveillance systems [5–7].
Social networks and other Internet-based means of communication (e.g., emails) have previously been investigated as potentially useful media for improving the care of patients affected with allergic diseases [8]. A study by Imonikhe et al. assessed the information-seeking behaviors of patients affected with allergic conjunctivitis [9]. Moreover, it has recently been shown that temporal variation in regional pollen counts correlates with Google searches for terms related to pollen allergy [10]. To our knowledge, Twitter has never been studied for allergic disease surveillance.

We conducted the present study to investigate the potential of Twitter as a source of information on allergic disease prevalence, on the basis of the observation that Twitter users affected with allergic rhinoconjunctivitis (ARC) may write tweets including combinations of specific symptoms or names of drugs commonly used to treat this condition. Our objective was to test the reliability of such tweet trends as a proxy for trends of ARC. Because no official surveillance data for ARC or other allergic diseases are available for comparison, we relied on the correlation between clinical symptoms in allergic patients and aeroallergen levels. We therefore investigated the correlation between US pollen counts obtained from the American Academy of Allergy, Asthma & Immunology (AAAAI) and trends of tweets, geolocated in the US, reporting names of antihistamine drugs and symptoms of ARC.

Materials and Methods

Twitter data

From February 1st to September 30th, 2013, we used the available application programming interfaces (APIs, https://dev.twitter.com/docs/streaming-apis) to acquire a sample of the worldwide Twitter traffic consisting of tweets that included at least one of 82 symptom-related terms.
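The selection criterion, a tweet is retained if it contains at least one term from a fixed keyword list, can be illustrated with a minimal Python sketch. This is not the authors' acquisition code: the three terms below are hypothetical placeholders (the study's 82 terms are not reproduced here), the function names are invented for illustration, and matching on whole words, case-insensitively, is an assumption rather than a documented property of the streaming API.

```python
import re

# Hypothetical placeholder terms; NOT the 82 terms used in the study.
SYMPTOM_TERMS = ["sneezing", "runny nose", "itchy eyes"]


def build_matcher(terms):
    """Compile one case-insensitive pattern that matches any term.

    Terms are escaped so they are treated literally, and wrapped in word
    boundaries so that "sneezing" does not match inside "sneezingly".
    Longer terms come first so multi-word phrases are preferred.
    """
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


MATCHER = build_matcher(SYMPTOM_TERMS)


def matches_any_term(text, matcher=MATCHER):
    """Return True if the text contains at least one keyword."""
    return matcher.search(text) is not None


def filter_tweets(tweets, matcher=MATCHER):
    """Keep only the tweet texts that contain at least one keyword."""
    return [t for t in tweets if matches_any_term(t, matcher)]


if __name__ == "__main__":
    sample = ["Sneezing all day", "Lovely weather today", "my ITCHY EYES again"]
    for tweet in filter_tweets(sample):
        print(tweet)
```

In the study the filtering was performed through the streaming API at acquisition time; the sketch only shows the same "at least one term" criterion applied to text that has already been collected.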
Such terms were selected as follows. First, we identified 17 singleton terms composing 4 queries based on the case definitions (influenza-like illness, cold, gastroenteritis, allergy) adopted by the Influenzanet system (https://www.influenzanet.eu/en/results/?page=help#casedef). Second, we applied to each of these terms an algorithm that automatically detects naive English words related to specific medical concepts, as described elsewhere [11]. This algorithm allowed us to retrieve 65 additional jargon keywords. We therefore used a total of 82 keywords (17 technical and 65 jargon) to acquire a sample of tweets that likely corresponds to almost 100% of the Twitter traffic with those terms (being up to 1% of the total Twitter traffic). Due to network and hardware problems, no data could be collected on May 12th or between June 13th and June 21st, 2013. To remove duplicates, avoid spam, and focus only on tweets from common users, we processed the data by applying two filters: we removed all copies of tweets appearing more than once in the collection, and we removed all tweets that contained hyperlinks. Subsequently, we built a system that allows monitoring collected tweets and producing tim (...truncated)