Computational Approaches for Pharmacovigilance Signal Detection: Toward Integrated and Semantically-Enriched Frameworks

https://link.springer.com/content/pdf/10.1007%2Fs40264-015-0278-8.pdf

Vassilis G. Koutkias (corresponding author), Marie-Christine Jaulent

Author affiliations (both authors):
Université Paris 13, Sorbonne Paris Cité, UMR_S 1142, LIMICS, 93430 Villetaneuse, France
Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1142, LIMICS, 75006 Paris, France
INSERM, U1142, LIMICS, Campus des Cordeliers, 15 rue de l'École de Médecine, 75006 Paris, France

Computational signal detection constitutes a key element of postmarketing drug monitoring and surveillance. Pharmacovigilance scientists consider diverse data sources within their 'search space' and apply corresponding data analysis methods, each with its own strengths and shortcomings, with the aim of more timely and accurate signal detection. Recent systematic comparative studies have highlighted not only that the performance of methods differs by event and by data source, but also that the methods are complementary. These findings reinforce the arguments for exploiting all possible information sources for drug safety and for using multiple signal detection methods in parallel. Combinatorial signal detection has been pursued in only a few studies to date, which employed a rather limited number of methods and data sources but reported promising outcomes. However, the large-scale realization of this approach requires systematic frameworks to address the challenges of the concurrent analysis setting.
In this paper, we argue that semantic technologies provide the means to address some of these challenges, and we particularly highlight their contribution in (a) annotating data sources and analysis methods with quality attributes to facilitate their selection given the analysis scope; (b) consistently defining study parameters such as health outcomes and drugs of interest, and providing guidance for study setup; (c) expressing analysis outcomes in a common format enabling data sharing and systematic comparisons; and (d) assessing/supporting the novelty of the aggregated outcomes through access to reference knowledge sources related to drug safety. A semantically-enriched framework can facilitate seamless access to, and use of, different data sources and computational methods in an integrated fashion, bringing a new perspective for large-scale, knowledge-intensive signal detection.

1 Introduction

One of the most important aspects of marketed-drug safety monitoring is the identification and analysis of new, medically important findings (so-called signals) that might influence the use of a medicine [1]. According to the CIOMS VIII Working Group, a signal constitutes information that arises from one or multiple sources (including observations and experiments), which suggests a new potentially causal association, or a new aspect of a known association, between an intervention and an event or set of related events, either adverse or beneficial, that is judged to be of sufficient likelihood to justify verificatory action [2]. Computational analysis methods constitute an important tool for signal detection [3, 4]. Lately, the field of signal detection has been very active, with various large-scale collaborative initiatives and projects, such as EU-ADR (http://euadr-project.org/), Mini-Sentinel (http://www.mini-sentinel.org/), OMOP (http://omop.org/), and PROTECT (http://www.imi-protect.eu/). While various advances have been illustrated, e.g.
common data models [5], reference datasets for evaluation [6], as well as new analysis methods and systematic empirical assessments [7–12], the challenge of accurate, timely and evidence-based signal detection still remains [13].

In this paper, we first present a brief overview of postmarketing data sources and computational analysis methods, and highlight their strengths and limitations for signal detection, taking into account recent comparative studies. From this perspective, we indicate the need for combinatorial signal detection, relying on the concurrent exploitation of diverse data sources and detection methods, and refer to early successful examples. We argue that, in order to explore combinatorial signal detection to its full potential, semantically-enriched detection frameworks are required to overcome existing barriers. We also illustrate how such a framework can be incorporated within the signal detection workflow, refer to example applications of semantic technologies in drug safety and, finally, discuss this perspective in the scope of large-scale, knowledge-intensive signal detection.

2 Data Sources and Signal Detection Methods: The Need for Combinatorial Exploitation

The types of data sources employed for signal detection vary [4]. Based on the computational methods adopted or required for their analysis, the main sources can be distinguished as follows:

1. Spontaneous reporting systems (SRSs). These constitute the dominant signal source, through which cases of suspected adverse drug reactions (ADRs) are reported by healthcare professionals or citizens to regulatory authorities or other bodies. Typically, methods for the analysis of SRS data rely on the statistical investigation of disproportionality (DP) [14], or are based on multivariate modeling [3, 4]. A comprehensive review of SRS-based signal detection methods has been presented by Hauben and Bate [15].
Although SRSs have been analyzed quite extensively, advances in detection methods are still being demonstrated, such as the vigiRank algorithm [16], which combines multiple strength-of-evidence prediction indicators to improve accuracy compared with DP analysis alone.

2. Structured longitudinal observational healthcare databases. These are primarily obtained from Electronic Health Record (EHR) and administrative claim systems, and offer the potential to enable active and real-time surveillance [5]. Signal detection methods applied to this type of data typically involve data-mining techniques that originate in statistical epidemiology [7, 17], e.g. case-control methods [8], cohort methods [11], self-controlled case-series methods [10], and self-controlled cohort design methods [7, 9]. Notably, DP-based methods, originally proposed for the analysis of SRS data, have also been applied to observational data [12], following appropriate extensions and data transformations [18]. A comprehensive review of signal detection methods exploiting observational data has been presented by Suling and Pigeot [19].

3. Unstructured/free-text sources. Typical examples include clinical narratives, scientific literature and patient-generated content, e.g. in social media. Extracting information that associates drugs with adverse events from unstructured text requires text-mining techniques [20]. Clinical narratives are a major part of many clinical information systems and, despite the complexity and barriers in processing clinic (...truncated)