Computational Approaches for Pharmacovigilance Signal Detection: Toward Integrated and Semantically-Enriched Frameworks
Vassilis G. Koutkias (corresponding author)
Marie-Christine Jaulent
Université Paris 13, Sorbonne Paris Cité, UMR_S 1142, LIMICS, 93430 Villetaneuse, France
Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1142, LIMICS, 75006 Paris, France
INSERM, U1142, LIMICS, Campus des Cordeliers, 15 rue de l'École de Médecine, 75006 Paris, France
Computational signal detection constitutes a key element of postmarketing drug monitoring and surveillance. Diverse data sources fall within the 'search space' of pharmacovigilance scientists, and corresponding data analysis methods, each with its own strengths and shortcomings, are employed in pursuit of more timely and accurate signal detection. Recent systematic comparative studies have highlighted not only that method performance differs by event and by data source, but also that the methods are complementary. These findings reinforce the arguments for exploiting all possible information sources for drug safety and for using multiple signal detection methods in parallel. Combinatorial signal detection has been pursued in only a few studies to date, employing a rather limited number of methods and data sources but yielding promising outcomes. However, realizing this approach on a large scale requires systematic frameworks that address the challenges of concurrent analysis. In this paper, we argue that semantic technologies provide the means to address some of these challenges, and we particularly highlight their contribution in (a) annotating data sources and analysis methods with quality attributes to facilitate their selection given the analysis scope; (b) consistently defining study parameters such as health outcomes and drugs of interest, and providing guidance for study setup; (c) expressing analysis outcomes in a common format that enables data sharing and systematic comparisons; and (d) assessing and supporting the novelty of the aggregated outcomes through access to reference knowledge sources related to drug safety. A semantically-enriched framework can facilitate seamless access to, and use of, different data sources and computational methods in an integrated fashion, bringing a new perspective to large-scale, knowledge-intensive signal detection.
1 Introduction
One of the most important aspects of marketed-drug safety
monitoring is the identification and analysis of new,
medically important findings (so-called signals) that
might influence the use of a medicine [1]. According to the
CIOMS VIII Working Group, a signal constitutes
information that arises from one or multiple sources (including
observations and experiments), which suggests a new
potentially causal association, or a new aspect of a known
association, between an intervention and an event or set of
related events, either adverse or beneficial, that is judged to
be of sufficient likelihood to justify verificatory action [2].
Computational analysis methods constitute an important
tool for signal detection [3, 4]. Recently, the field of signal
detection has been very active, with various large-scale
collaborative initiatives and projects, such as EU-ADR
(http://euadr-project.org/), Mini-Sentinel
(http://www.mini-sentinel.org/), OMOP (http://omop.org/), and
PROTECT (http://www.imi-protect.eu/). While various
advances have been demonstrated, e.g. common data models [5],
reference datasets for evaluation [6], as well as new
analysis methods and systematic empirical assessments [7-12],
the challenge of accurate, timely and evidence-based
signal detection remains [13].
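To make the role of reference datasets concrete: a method is typically run on every drug-outcome pair in a reference set, and its scores are compared with the known positive and negative controls. The minimal sketch below assumes the area under the ROC curve (AUC) as the performance metric, which is one common choice rather than the only one; the function name, drug and outcome names, and all scores are hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (drug, outcome)

def auc(scores: Dict[Pair, float], reference: Dict[Pair, bool]) -> float:
    """AUC computed as the probability that a randomly chosen positive control
    scores higher than a randomly chosen negative control (ties count 0.5).
    Reference pairs that the method did not score are ignored."""
    pos = [scores[p] for p, is_pos in reference.items() if is_pos and p in scores]
    neg = [scores[p] for p, is_pos in reference.items() if not is_pos and p in scores]
    if not pos or not neg:
        raise ValueError("need at least one scored positive and one scored negative control")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical reference set: True = positive control, False = negative control.
reference = {
    ("drugA", "acute liver injury"): True,
    ("drugB", "acute kidney injury"): True,
    ("drugC", "acute liver injury"): False,
    ("drugD", "upper GI bleeding"): False,
}
# Hypothetical scores produced by some signal detection method.
scores = {
    ("drugA", "acute liver injury"): 2.4,
    ("drugB", "acute kidney injury"): 0.9,
    ("drugC", "acute liver injury"): 1.1,
    ("drugD", "upper GI bleeding"): 0.3,
}
print(auc(scores, reference))  # 0.75
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation; in this toy example one negative control (drugC) outscores one positive control (drugB), giving 0.75.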
In this paper, we first present a brief overview of
postmarketing data sources and computational analysis
methods, and highlight their strengths and limitations for signal
detection, taking into account recent comparative studies.
From this perspective, we point out the need for
combinatorial signal detection, relying on the concurrent
exploitation of diverse data sources and detection methods,
and refer to early successful paradigms. We argue that in
order to explore combinatorial signal detection in its full
potential, semantically-enriched detection frameworks are
required to overcome existing barriers. We also illustrate
how such a framework can be incorporated within the
signal detection workflow, refer to example applications of
semantic technologies in drug safety and, finally, discuss
this perspective in the scope of large-scale,
knowledge-intensive signal detection.
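The idea of combinatorial signal detection can be illustrated with a minimal sketch, assuming (hypothetically) that every method reports its output in a shared record format, here called `SignalResult` (an illustrative name, not an existing standard), and that outputs are combined by averaging normalized ranks. Mean-rank aggregation is only one simple option, chosen because scores from different methods (e.g. a disproportionality ratio versus an incidence rate ratio) are not on comparable scales; it is not presented as the approach of any study cited here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class SignalResult:
    """Hypothetical common record for one method's score on one drug-event pair."""
    drug: str
    event: str
    method: str       # analysis method label, e.g. "DP" or "SCCS" (self-controlled case series)
    data_source: str  # data source label, e.g. "SRS" or "EHR"
    score: float      # higher = stronger evidence of association

def mean_rank_aggregate(results: List[SignalResult]) -> List[Tuple[Tuple[str, str], float]]:
    """Convert each (method, data source) run's scores to normalized ranks in
    [0, 1] (1 = highest score) and average them per drug-event pair.
    A pair missing from a run simply does not contribute to that run."""
    runs: Dict[Tuple[str, str], List[SignalResult]] = defaultdict(list)
    for r in results:
        runs[(r.method, r.data_source)].append(r)

    ranks: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for run in runs.values():
        # Stable ascending sort: tied scores keep their input order (a simplification).
        ordered = sorted(run, key=lambda r: r.score)
        n = len(ordered)
        for i, r in enumerate(ordered):
            ranks[(r.drug, r.event)].append(i / (n - 1) if n > 1 else 1.0)

    combined = {pair: sum(v) / len(v) for pair, v in ranks.items()}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical outputs of two methods run on two different data sources.
results = [
    SignalResult("drugA", "liver injury", "DP", "SRS", 3.2),
    SignalResult("drugB", "liver injury", "DP", "SRS", 1.1),
    SignalResult("drugC", "liver injury", "DP", "SRS", 0.7),
    SignalResult("drugA", "liver injury", "SCCS", "EHR", 1.4),
    SignalResult("drugB", "liver injury", "SCCS", "EHR", 0.8),
    SignalResult("drugC", "liver injury", "SCCS", "EHR", 2.0),
]
for (drug, event), score in mean_rank_aggregate(results):
    print(drug, event, round(score, 2))
```

In this toy example drugA (mean rank 0.75) precedes drugC (0.5), which only one of the two runs ranks highly, and drugB (0.25).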
2 Data Sources and Signal Detection Methods: The Need for Combinatorial Exploitation
The types of data sources employed for signal detection
vary [4]. According to the computational methods adopted
or required for their analysis, the main sources can be
classified as follows:
1. Spontaneous reporting systems (SRSs). These
constitute the dominant signal source, through which cases of
suspected adverse drug reactions (ADRs) are reported
by healthcare professionals or citizens to regulatory
authorities or other bodies. Typically, methods for
the analysis of SRS data rely on the statistical
investigation of disproportionality (DP) [14], or are
based on multivariate modeling [3, 4]. A
comprehensive review of SRS-based signal detection methods has
been presented by Hauben and Bate [15]. Although
SRSs have already been analyzed quite extensively,
advances in detection methods continue to be
reported, such as the vigiRank algorithm [16], which
combines multiple strength-of-evidence prediction
indicators to improve accuracy compared with DP
analysis alone.
2. Structured longitudinal observational healthcare
databases. These are primarily obtained from
Electronic Health Record (EHR) and administrative claims
systems, and offer the potential to enable active and
real-time surveillance [5]. Signal detection methods
applied to this type of data typically involve
data-mining techniques that originate in
statistical epidemiology [7, 17], e.g. case-control methods
[8], cohort methods [11], self-controlled case-series
methods [10], and self-controlled cohort design
methods [7, 9]. Notably, DP-based methods, originally
proposed for the analysis of SRS data, have also been
applied to observational data [12], following
appropriate extensions and data transformations [18]. A
comprehensive review of signal detection methods
exploiting observational data has been presented by
Suling and Pigeot [19].
3. Unstructured/free-text sources. Typical examples
include clinical narratives, scientific literature and
patient-generated content, e.g. in social media.
Extracting information that associates drugs with adverse
events from unstructured text requires text-mining
techniques [20]. Clinical narratives are
a major part of many clinical information systems and,
despite the complexity and barriers in processing
clinic (...truncated)