Digital Pharmacovigilance and Disease Surveillance: Combining Traditional and Big-Data Systems for Better Public Health

Journal of Infectious Diseases, Nov 2016

The digital revolution has contributed to very large data sets (ie, big data) relevant for public health. The two major data sources are electronic health records from traditional health systems and patient-generated data. As the two data sources have complementary strengths—high veracity in the data from traditional sources and high velocity and variety in patient-generated data—they can be combined to build more-robust public health systems. However, they also have unique challenges. Patient-generated data in particular are often completely unstructured and highly context dependent, posing essentially a machine-learning challenge. Some recent examples from infectious disease surveillance and adverse drug event monitoring demonstrate that the technical challenges can be solved. Despite these advances, the problem of verification remains, and unless traditional and digital epidemiologic approaches are combined, these data sources will be constrained by their intrinsic limits.

SUPPLEMENT ARTICLE

Marcel Salathé

Digital Epidemiology Laboratory, School of Life Sciences and School of Computer and Communication Sciences, EPFL, Geneva, Switzerland

Traditional disease surveillance has been a key ingredient in any public health portfolio for many decades. Disease surveillance is widely recognized as one of the most important tools to assess, predict, and mitigate infectious disease outbreaks. Traditional disease surveillance is based on data collected by health institutions, and the data typically consist of information such as morbidity and mortality data, laboratory reports, individual case reports, field investigations, surveys, and demographic data. They are generally collected by physicians, public health laboratories, hospitals, and other health providers and institutions.

The computer revolution that began in the 1970s has affected traditional disease surveillance systems by improving the accessibility of data and by increasing the speed at which data are transmitted between institutions. However, the ongoing Internet and mobile phone revolution has a qualitatively distinct effect: in addition to making epidemiologic data available faster and more broadly, new data are generated directly by the public, often on platforms not primarily designed for health purposes. These data streams of user-generated data are almost always bypassing traditional public health channels. They are the data streams on which digital epidemiology is generally based [1, 2].

Correspondence: M. Salathé, Digital Epidemiology Lab, School of Life Sciences and School of Computer and Communication Sciences, EPFL, Geneva, Switzerland.

The Journal of Infectious Diseases® 2016;214(S4):S399–403

© The Author 2016. Published by Oxford University Press for the Infectious Diseases Society of America.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, contact . DOI: 10.1093/infdis/jiw281

One of the first and certainly the most prominent examples of digital disease surveillance was Google Flu Trends [3]. Google Flu Trends was essentially an analytical estimate of the level of weekly influenza activity based on the search queries that Google received. The analytical estimate was derived by a model selected by generating the best fit to the Centers for Disease Control and Prevention’s (CDC’s) influenza-like illness (ILI) data from a number of different US regions. The original model results obtained a mean correlation of 0.9 with the CDC data.

A few years later, in summer 2015, Google decided to shut down the public website of Google Flu Trends and instead opted to give select academic and public health institutions access to the data. This announcement followed numerous reports [4–6] that systematically assessed Google Flu Trends’ overestimation of influenza activity, attributing it to a combination of a phenomenon termed “big-data hubris” and algorithm dynamics. The first refers to the assumption that the novel big-data streams are a substitute, rather than a supplement, to traditional data collection efforts. The second refers to the observation that, while the Google search algorithm receives updates on a weekly or even daily basis, the Google Flu Trends model received updates only rarely. This led to a situation where the model did not keep in sync with the changing nature of the data from which it was supposed to generate predictions.
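The idea of selecting a model by its fit to reference ILI data across regions can be illustrated with a small sketch. It is not the Google Flu Trends procedure itself: the region names, the candidate models, and all weekly values below are invented for illustration, and the helper names `pearson` and `mean_regional_correlation` are hypothetical. The sketch only shows how a mean per-region correlation could be computed and used to rank candidate estimates.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vx * vy)


def mean_regional_correlation(estimates, reference):
    """Average the per-region correlation between estimates and reference data."""
    regions = sorted(reference)
    return sum(pearson(estimates[r], reference[r]) for r in regions) / len(regions)


# Invented weekly ILI-like values for two hypothetical regions
# (assumption: these are NOT real CDC or Google data).
reference_ili = {
    "region_a": [1.0, 1.5, 2.5, 4.0, 3.0, 2.0],
    "region_b": [0.8, 1.0, 1.9, 3.1, 3.5, 2.2],
}

# Two hypothetical candidate models: one tracks the reference, one is nearly flat.
candidates = {
    "model_tracking": {
        "region_a": [1.1, 1.4, 2.7, 3.8, 3.2, 1.9],
        "region_b": [0.9, 1.1, 1.7, 3.3, 3.4, 2.0],
    },
    "model_flat": {
        "region_a": [2.0, 2.1, 1.9, 2.0, 2.2, 2.1],
        "region_b": [1.9, 2.0, 2.1, 1.8, 2.0, 2.1],
    },
}

# Score each candidate by its mean correlation with the reference, keep the best.
scores = {
    name: mean_regional_correlation(est, reference_ili)
    for name, est in candidates.items()
}
best_model = max(scores, key=scores.get)
print(best_model, round(scores[best_model], 3))
```

A selection of this kind is made once, against historical reference data. If the input data later change in character while the selected model stays fixed, the correlation can degrade without any change to the model, which is the "algorithm dynamics" problem described above.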
Despite the problems of Google Flu Trends, the system was an important example of the promises of digital epidemiology: to use novel data streams, often generated for purposes quite distinct from public health, to extract additional public health signals, such as those relevant for disease surveillance. But while Google makes some search pattern data available through an interface called Google Trends, the raw search-query data [...]

Keywords. digital epidemiology; disease surveillance; pharmacovigilance; Twitter.

[...] undervaccinated populations. Later work on the same data set investigated how negative and positive sentiments about vaccination spread across the social network, suggesting that negative sentiments are more susceptible to social contagion than positive sentiments [15].
Last but not least, data from most of these services are increasingly generated on mobile phones and other devices, increasing the probability that high-resolution geographic information is associated with the data, a phenomenon that will become increasingly important, given the spatial dynamics of disease spread.

DIGITAL PHARMACOVIGILANCE

The widespread use of the Internet and of social media in particular has had a dramatic effect not only on infectious disease surveillance, but also on the surveillance of drug use and related events. Perhaps even more so than traditional infectious disease surveillance, traditional surveillance of adverse drug reactions (ADRs) after drug use is slow and patchy. When reported by patients or healthcare professionals, ADRs are typically assessed by drug experts and pharmaceutical companies, and the results are then passed on to government agencies. This leads to substantial data loss and delays. A recent study in the United States showed that hospital staff did not report 86% of ADRs among patients [16]. The rate of underreporting in nonclinical settings is arguably even higher. Once government agencies receive the reports, they often release them with a d (...truncated)


This is a preview of a remote PDF: https://jid.oxfordjournals.org/content/214/suppl_4/S399.full.pdf
Article home page: http://jid.oxfordjournals.org/content/214/suppl_4/S399.abstract

Marcel Salathé. Digital Pharmacovigilance and Disease Surveillance: Combining Traditional and Big-Data Systems for Better Public Health, Journal of Infectious Diseases, 2016, pp. S399-S403, 214/suppl 4, DOI: 10.1093/infdis/jiw281