Forecasting influenza-like illness dynamics for military populations using neural networks and social media

PLOS ONE, Dec 2017

This work is the first to use recurrent neural networks to predict influenza-like illness (ILI) dynamics from a variety of linguistic signals extracted from social media data. Unlike other approaches that rely on time series analysis of historical ILI data and on state-of-the-art machine learning models, we build neural network architectures based on Long Short-Term Memory (LSTM) units and evaluate their ability to nowcast (predict in real time) and forecast (predict future) ILI dynamics during the 2011–2014 influenza seasons. Our models integrate several kinds of information people post on social media, including topics, embeddings, word n-grams, stylistic patterns, and communication behavior expressed through hashtags and mentions. We then quantitatively evaluate the predictive power of these social media signals and compare state-of-the-art regression models with neural networks using a diverse set of evaluation metrics. Finally, we combine ILI and social media signals to build a joint neural network model for ILI dynamics prediction. Unlike most existing work, we focus on local rather than national ILI surveillance, and on military rather than general populations, in 26 U.S. and six international locations; we also analyze how model performance depends on the amount of social media data available per location. Our approach demonstrates several advantages: (a) Neural network architectures built on LSTM units and trained on social media data outperform previously used regression models. (b) Previously under-explored language and communication behavior features are more predictive of ILI dynamics than the stylistic and topic signals expressed in social media. (c) Neural network models learned exclusively from social media signals perform comparably to or better than models learned from historical ILI data; social media signals could therefore potentially be used to forecast ILI dynamics accurately in regions where historical ILI data are not available. (d) Neural network models learned from combined ILI and social media signals significantly outperform models that rely solely on historical ILI data, underscoring the potential of alternative public data sources for ILI dynamics prediction. (e) Location-specific models outperform previously used location-independent models (e.g., U.S.-only models). (f) Prediction results vary significantly across locations depending on the amount of social media data available and on ILI activity patterns. (g) Model performance improves as more tweets become available per location: error decreases and Pearson correlation increases for locations with more tweets.
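The abstract distinguishes nowcasting from forecasting and reports results in terms of error and Pearson correlation. A minimal sketch of one way to frame both tasks as supervised learning is shown below. Each sample pairs a window of weekly social media feature vectors (the kind of sequence an LSTM would consume) with the ILI value either in the window's last week (nowcast) or a few weeks later (forecast). The lookback length, the horizon convention, RMSE as the error measure, and the helper names `make_windows`, `pearson`, and `rmse` are illustrative assumptions, not the paper's implementation.

```python
"""Hypothetical sketch: nowcasting vs. forecasting as supervised learning.

Assumptions (not from the paper): the lookback length, the horizon
convention (0 = nowcast the window's last week, h > 0 = forecast h weeks
ahead), RMSE as the error measure, and the toy data in the demo.
"""
from math import sqrt
from typing import List, Sequence, Tuple

Window = List[List[float]]


def make_windows(features: Sequence[Sequence[float]], ili: Sequence[float],
                 lookback: int, horizon: int) -> List[Tuple[Window, float]]:
    """Pair each run of `lookback` weekly feature vectors with the ILI value
    `horizon` weeks after the run's last week."""
    if len(features) != len(ili):
        raise ValueError("features and ili must cover the same weeks")
    samples = []
    for end in range(lookback - 1, len(ili) - horizon):
        window = [list(features[t]) for t in range(end - lookback + 1, end + 1)]
        samples.append((window, ili[end + horizon]))
    return samples


def pearson(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation between predicted and observed ILI values."""
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = sqrt(sum((p - mp) ** 2 for p in pred))
    so = sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean squared error between predicted and observed ILI values."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


if __name__ == "__main__":
    weekly = [[0.1 * t] for t in range(6)]   # one toy social media feature per week
    ili = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]     # toy weekly ILI values
    nowcast = make_windows(weekly, ili, lookback=3, horizon=0)
    forecast = make_windows(weekly, ili, lookback=3, horizon=2)
    print(len(nowcast), len(forecast))       # 4 2
    print(nowcast[0][1], forecast[0][1])     # 2.0 3.0
```

Longer horizons leave fewer usable samples from the same series, because the last `horizon` weeks have no future target to pair with.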


RESEARCH ARTICLE

Svitlana Volkova*, Ellyn Ayton, Katherine Porterfield, Courtney D. Corley

Data Sciences and Analytics Group, Computing and Analytics Division, National Security Directorate, Pacific Northwest National Laboratory, Richland, WA, United States of America

Citation: Volkova S, Ayton E, Porterfield K, Corley CD (2017) Forecasting influenza-like illness dynamics for military populations using neural networks and social media. PLoS ONE 12(12): e0188941. https://doi.org/10.1371/journal.pone.0188941

Editor: Gerardo Chowell, Georgia State University, United States

Received: February 10, 2017; Accepted: November 15, 2017; Published: December 15, 2017

Copyright: © 2017 Volkova et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: All relevant data are available from the figshare repository at the following DOI: 10.6084/m9.figshare.5632222.

Funding: This research was supported by a contract from the Defense Threat Reduction Agency to the Pacific Northwest National Laboratory under contract CB10082, and supported in part by the Deep Learning for Scientific Discovery Initiative at the Pacific Northwest National Laboratory.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Every year, 500,000 deaths worldwide are attributed to influenza, including 30,000–50,000 deaths in the US [1]. The Centers for Disease Control and Prevention (CDC) publishes weekly reports on the levels of confirmed influenza and influenza-like illness (ILI) seen year-round in hospitals and doctor visits; these reports are used to monitor the spread and impact of influenza. However, by the time the CDC data are released, the information is already several weeks old. To overcome this delay, researchers have explored alternative data sources for monitoring influenza and ILI dynamics in real time, including web queries [2], Wikipedia logs [3, 4], microblogs [5], and social media platforms such as Twitter [6–9], as a way to improve health officials' ability to predict influenza infection rates. Researchers have argued that the most valuable contribution of alternative data sources [10, 11] such as Twitter is to reduce the error in influenza predictions during the weeks when the CDC's influenza infection rates are still under revision [7].
Indeed, using basic linear autoregressive models, they showed that a combined model of Twitter and ILI data outperforms a similar model that uses ILI data alone. These promising results motivate introducing richer models into this prediction task. The authors of [8] experiment with several ensemble machine learning methods to forecast ILI dynamics and are able to accurately predict ILI activity up to two weeks ahead. However, they explored only basic bag-of-words features extracted from tweets. Like [9], we argue that to use social media data effectively, existing natural language processing (NLP) techniques need to be improved, or new methods developed, to extract richer meaning from tweets. Furthermore, researchers advocate using Twitter to supplement conventional influenza monitoring systems and make accurate predictions [6, 12]. Following prior advances in infectious disease surveillance using social media data [7, 8], we used large amounts of public Twitter data: 171M tweets collected between 2011 and 2014. We treated these data as a real-time source of information for forecasting ILI activity estimates, i.e., the total number of people seeking medical attention with ILI symptoms. We specifically focused on military populations and collected ILI activity d (...truncated)
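The autoregressive comparison described above can be illustrated with a hedged sketch: an ordinary least squares AR(p) model on lagged ILI values, optionally augmented with the same week's Twitter signal, which is available in real time while that week's ILI value is not. The lag order, fitting with `numpy.linalg.lstsq`, the hypothetical helpers `lagged_design` and `ar_rmse`, and the synthetic series in the demo are assumptions for illustration only; they do not reproduce the models in [7].

```python
"""Hypothetical sketch: ILI-only AR(p) vs. AR(p) plus a same-week Twitter signal.

Assumptions (not from the paper or from [7]): ordinary least squares, the lag
order, the current week's Twitter value as the only exogenous input, and the
synthetic series used in the demo.
"""
import numpy as np


def lagged_design(ili, lags, exog=None):
    """Return (X, y): rows [1, ili[t-1], ..., ili[t-lags](, exog[t])], targets ili[t]."""
    ili = np.asarray(ili, dtype=float)
    rows = []
    for t in range(lags, len(ili)):
        row = [1.0] + [ili[t - k] for k in range(1, lags + 1)]
        if exog is not None:
            # Same-week Twitter value: observable now, unlike this week's ILI.
            row.append(float(exog[t]))
        rows.append(row)
    return np.array(rows), ili[lags:]


def ar_rmse(ili, lags, exog=None, n_train=None):
    """Fit OLS on the first n_train rows; return RMSE on the remaining rows,
    or in-sample RMSE if n_train is None (or covers every row)."""
    X, y = lagged_design(ili, lags, exog)
    n_train = len(y) if n_train is None else n_train
    coef, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
    if n_train < len(y):
        X, y = X[n_train:], y[n_train:]
    resid = X @ coef - y
    return float(np.sqrt(np.mean(resid ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weeks = np.arange(156)                           # three synthetic 52-week seasons
    twitter = 1.5 + np.sin(2 * np.pi * weeks / 52)   # synthetic weekly tweet rate
    ili = np.empty(len(weeks))
    ili[0] = 2.0
    for t in range(1, len(weeks)):
        ili[t] = 0.5 * ili[t - 1] + 0.8 * twitter[t] + rng.normal(scale=0.05)
    # Fit on the first 104 design rows (about two seasons), score on the rest.
    print("ILI only:     ", ar_rmse(ili, lags=2, n_train=104))
    print("ILI + Twitter:", ar_rmse(ili, lags=2, exog=twitter, n_train=104))
```

Because the ILI-only design is nested inside the ILI + Twitter design, the augmented model can never fit the training weeks worse; a fair comparison therefore scores held-out weeks, which is what `n_train` does in the demo.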

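The contrast between basic bag-of-words features and richer signals such as word n-grams, hashtags, and mentions can be made concrete with a small, hypothetical feature extractor that turns one week of tweets from one location into a feature dictionary. The regex tokenizer, lower-casing, the n-gram order, the name `weekly_features`, and the per-tweet hashtag and mention rates are assumptions chosen for illustration; the topic, embedding, and stylistic features listed in the abstract are not covered.

```python
"""Hypothetical sketch: weekly n-gram and communication-behavior features.

Assumptions (not the paper's pipeline): regex tokenization, lower-casing,
unigram + bigram counts over all tokens, and per-tweet hashtag/mention rates.
"""
import re
from collections import Counter
from typing import Dict, List

# A token is a run of word characters, optionally prefixed by '#' or '@'.
TOKEN = re.compile(r"[#@]?\w+")


def tokenize(tweet: str) -> List[str]:
    return TOKEN.findall(tweet.lower())


def weekly_features(tweets: List[str], n: int = 2) -> Dict[str, float]:
    """Aggregate one week of tweets into n-gram counts plus behavior rates."""
    ngrams: Counter = Counter()
    hashtags = mentions = 0
    for tweet in tweets:
        tokens = tokenize(tweet)
        hashtags += sum(tok.startswith("#") for tok in tokens)
        mentions += sum(tok.startswith("@") for tok in tokens)
        # Count every n-gram of length 1..n within this tweet.
        for size in range(1, n + 1):
            for i in range(len(tokens) - size + 1):
                ngrams[" ".join(tokens[i:i + size])] += 1
    feats: Dict[str, float] = {f"ngram={g}": float(c) for g, c in ngrams.items()}
    total = max(len(tweets), 1)  # avoid division by zero for an empty week
    feats["hashtags_per_tweet"] = hashtags / total
    feats["mentions_per_tweet"] = mentions / total
    return feats


if __name__ == "__main__":
    week = ["Home with the #flu", "@amy got the flu too"]
    feats = weekly_features(week)
    print(feats["ngram=the"], feats["hashtags_per_tweet"])  # 2.0 0.5
```

Unlike a plain bag of words, the bigrams keep some local word order, and the hashtag and mention rates capture how people communicate rather than only what they say.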

This is a preview of a remote PDF: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0188941&type=printable
Article home page: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188941

Svitlana Volkova, Ellyn Ayton, Katherine Porterfield, Courtney D. Corley. Forecasting influenza-like illness dynamics for military populations using neural networks and social media, PLOS ONE, 2017, Volume 12, Issue 12, DOI: 10.1371/journal.pone.0188941