Pretrained transformer models for predicting the withdrawal of drugs from the market.
Bioinformatics, 2023, 39(8), btad519
https://doi.org/10.1093/bioinformatics/btad519
Advance Access Publication Date: 23 August 2023
Original Paper
Data and text mining
Eyal Mazuz1,*, Guy Shtar1, Nir Kutsky1, Lior Rokach1, Bracha Shapira1
Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, P.O.B. 653, Beer-Sheva, 8410501, Israel
*Corresponding author. Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
E-mail: (E.M.)
Associate Editor: Jonathan Wren
Abstract
Motivation: The process of drug discovery is notoriously complex, costing an average of 2.6 billion dollars and taking 13 years to bring a new
drug to the market. The success rate for new drugs is alarmingly low (around 0.0001%), and severe adverse drug reactions (ADRs) frequently
occur, some of which may even result in death. Early identification of potential ADRs is critical to improve the efficiency and safety of the drug
development process.
Results: In this study, we employed pretrained large language models (LLMs) to predict the likelihood of a drug being withdrawn from the market due to safety concerns. Our method achieved an area under the curve (AUC) of over 0.75 in cross-database validation, outperforming classical machine learning models and graph-based models. Notably, when predictions were made on a subset of drugs with inconsistent labeling between the training and test sets, our pretrained LLMs successfully identified over 50% of the drugs that were subsequently withdrawn.
Availability and implementation: The code and datasets are available at https://github.com/eyalmazuz/DrugWithdrawn.
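The abstract reports performance as an area under the curve (AUC). Assuming this refers to the area under the receiver operating characteristic (ROC) curve, the usual metric for binary classifiers, it equals the probability that a randomly chosen withdrawn drug receives a higher score than a randomly chosen non-withdrawn one. The minimal sketch below computes the AUC directly from that pairwise definition; the function name, labels, and scores are hypothetical and do not come from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive example
    (label 1, e.g. withdrawn) is scored above a randomly chosen negative one
    (label 0, e.g. still approved); tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:  # compare every positive/negative pair: O(|pos| * |neg|)
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions: 1 = withdrawn, 0 = approved.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.5, 0.1]
print(round(roc_auc(labels, scores), 3))  # 5 of 6 pairs ranked correctly -> 0.833
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; for real evaluations, scikit-learn's `roc_auc_score` computes the same quantity efficiently.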
1 Introduction
Today, the cost of developing a single drug exceeds 2.6 billion dollars. The investment is not solely financial: it takes an average of around 13 years to bring a drug to market (DiMasi et al. 2016). Identifying a new drug requires screening over 100 000 candidate compounds, followed by in vitro and in vivo studies and three-phase clinical trials on thousands of subjects. Approximately 1 in 10 compounds that enter clinical trials succeeds; measured against all candidate compounds, the success rate is around 0.0001% (Kapetanovic 2008).
In some cases, a drug that has shown effectiveness in clinical trials and has been approved by the US Food and Drug Administration (FDA) benefits some patients but causes severe unwanted side effects in others, and in the worst case even death (Lazarou et al. 1998, Wysowski and Swartz 2005, Ma and Lu 2011). Severe adverse drug reactions (ADRs) occur in 6.2%–6.7% of hospitalized patients, with over two million cases occurring annually in the general population of the USA alone. These ADRs result in 100 000 deaths per year in the US (Lazarou et al. 1998, Wilkinson 2005). Consequently, many drugs that cause unexpectedly severe ADRs are eventually withdrawn from the market, with considerable impact on the manufacturer, including lost revenue and reputational damage. Of
the 548 drugs approved by the FDA between the years 1975
and 1999, 56 (10.2%) received a boxed warning or were
eventually withdrawn from the market, and 20 (3.8%) of the
528 drugs approved between 1990 and 2009 in Canada were
withdrawn for safety reasons (Lasser et al. 2002, Ninan and
Wertheimer 2012).
Identifying safety issues in drugs is a challenging task. Each clinical trial phase may include only up to a few hundred patients, too few to reveal rare ADRs.
In the early stages of drug discovery, computational
approaches are used to identify potential drug molecules, reducing costs and time. Virtual screening (VS) is a powerful
computational approach in which comprehensive libraries of
small molecules are screened for new hits with desired properties that can be further investigated (Shoichet 2004).
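Virtual screening comes in many flavors, and the text does not commit to one. One simple ligand-based variant ranks library compounds by fingerprint similarity to a known active molecule and keeps those above a cutoff as hits. The minimal sketch below assumes each molecule is already encoded as a set of "on" fingerprint bit indices and uses Tanimoto similarity; the compound names, bit sets, and 0.5 default threshold are hypothetical, and this is not the method proposed in this work.

```python
from __future__ import annotations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of 'on' bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def screen(query: set[int], library: dict[str, set[int]],
           threshold: float = 0.5) -> list[tuple[str, float]]:
    """Score every library compound against a known active; keep hits at or above threshold, best first."""
    scored = ((name, tanimoto(query, bits)) for name, bits in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints (bit indices) for a known active and a tiny library.
query = {1, 4, 7, 9}
library = {
    "cmpd_A": {1, 4, 7, 9},
    "cmpd_B": {1, 4, 8},
    "cmpd_C": {2, 3, 5},
}
print(screen(query, library, threshold=0.3))  # [('cmpd_A', 1.0), ('cmpd_B', 0.4)]
```

In practice the fingerprints would come from a cheminformatics toolkit, and the threshold trades hit-list size against the chance of missing structurally novel actives.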
Among the various VS approaches, quantitative structure-activity relationship (QSAR) analysis is a ligand-based drug design method. It attempts to find a statistically significant correlation between chemical structure and biological and toxicological properties using regression and classification techniques (Cherkasov et al. 2014). Despite their efficacy in identifying potential drug candidates, these techniques often overlook potential safety issues of the drugs. QSAR models are also applied outside the pharmaceutical field, for example in industrial chemistry to assess ecotoxicity and in materials science (Leszczynski 2006, Puzyn et al. 2010).
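As a concrete illustration of the classification side of QSAR, the minimal sketch below fits a logistic regression that maps binary structural descriptors to a toxicity label, trained by batch gradient descent on the mean log-loss. The three descriptors, the six training molecules, their labels, the learning rate, and the epoch count are all hypothetical; real QSAR work uses curated descriptors, proper validation, and library implementations such as scikit-learn's `LogisticRegression`. This is not the model used in this work.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns weights w and bias b
    such that sigmoid(w . x + b) estimates P(label = 1 | x)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log-loss with respect to the linear score z.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical descriptors: presence (1) or absence (0) of three substructures.
# Hypothetical labels: 1 = toxic, 0 = non-toxic (toxicity tracks the first substructure).
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [0, 1, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
print([round(predict_proba(w, b, x)) for x in X])  # recovers the training labels
```

The learned weight on each descriptor plays the role of a structure-activity coefficient: a large positive weight flags a substructure associated with the toxic label.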
Received: 20 March 2023; Revised: 24 July 2023; Editorial Decision: 20 August 2023; Accepted: 22 August 2023
© The Author(s) 2023. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
ADR prediction has received much attention in recent years. Prior studies used diverse information, such as biological pathways (Scheiber et al. 2009), chemical-protein interactions (LaBute et al. 2014), and post-market surveillance data (Tatonetti et al. 2012), to predict ADRs. However, most of these data types are derived from experimental results or post-market reports, which are costly and time-consuming to gather and are unavailable in the early stages of the drug lifecycle (Whitebread et al. 2005, Shtar 2021). Therefore, to predict ADRs for a drug candidate early in development, predictions must be made using the chemical structure alone (Pauwels et al. 2011, Liu et al. 2014).
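A structure-only model still needs a machine-readable encoding of the molecule, and this passage does not say which one is used. One common choice for transformer-based chemical models is a SMILES string split into atom-level tokens; the minimal sketch below assumes that representation. The regular expression follows the style of tokenizers used in chemical language modeling, but it is an illustrative assumption, not the tokenizer used in this work, and `tokenize_smiles` is a hypothetical name.

```python
from __future__ import annotations

import re

# Assumed atom-level SMILES pattern: bracket atoms such as [NH4+], two-letter
# halogens Br/Cl, organic-subset and aromatic atoms, bonds, branches,
# ring-closure digits, and %nn ring labels.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; fail loudly if any character is not covered."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"SMILES contains characters the pattern does not cover: {smiles!r}")
    return tokens


# Aspirin, written as a SMILES string.
print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
```

Listing `Br?` and `Cl?` before the single-letter atoms keeps two-letter halogens as one token, and the round-trip check ensures no character is silently dropped before the tokens are mapped to vocabulary IDs.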
ADRs can lead to the withdrawal of a drug from the market, but they are not the only reason for withdrawal; post-market drug withdrawals may be caused by a variety of factors, ranging from safety concerns such as reported deaths to non-safety concerns including the product’s inefficacy and manufacturing, regulatory, or business issues (Magro et al. 2012). Onakpoya et al. (2016) reported that
the average time it takes for a drug to be withdrawn from the
market has decreased; however, the method of identifying these
drugs after a serious ADR has not improved in the past 60 years.
In addition, the methodology used to identify previously unknown risks, as well as the delay between the introduction of a
drug and its withdrawal for safety reasons, remain sources of
concern. No gold standard for predicting drug withdrawal has
been described in the literature.
A few in silico methods for predicting drug withdrawal
have been proposed. Onay et al. (2017), who focused on
drugs associated with nervous system diseases, were the first
to distinguish between approved drugs and withdrawn ones.
Their study integrated the (...truncated)