Pretrained transformer models for predicting the withdrawal of drugs from the market.

Bioinformatics, Aug 2023

Bioinformatics, 2023, 39(8), btad519
https://doi.org/10.1093/bioinformatics/btad519
Advance Access Publication Date: 23 August 2023
Original Paper

Data and text mining

Eyal Mazuz1,*, Guy Shtar1, Nir Kutsky1, Lior Rokach1, Bracha Shapira1

1 Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, P.O.B. 653, Beer-Sheva, 8410501, Israel
*Corresponding author. Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel. E-mail: (E.M.)
Associate Editor: Jonathan Wren

Abstract

Motivation: The process of drug discovery is notoriously complex, costing an average of 2.6 billion dollars and taking 13 years to bring a new drug to the market. The success rate for new drugs is alarmingly low (around 0.0001%), and severe adverse drug reactions (ADRs) frequently occur, some of which may even result in death. Early identification of potential ADRs is critical to improve the efficiency and safety of the drug development process.

Results: In this study, we employed pretrained large language models (LLMs) to predict the likelihood of a drug being withdrawn from the market due to safety concerns. Our method achieved an area under the curve (AUC) of over 0.75 through cross-database validation, outperforming classical machine learning models and graph-based models. Notably, our pretrained LLMs successfully identified over 50% of the drugs that were subsequently withdrawn, when predictions were made on a subset of drugs with inconsistent labeling between the training and test sets.

Availability and implementation: The code and datasets are available at https://github.com/eyalmazuz/DrugWithdrawn.

1 Introduction

Today, the cost of developing a single drug exceeds 2.6 billion dollars.
The investment required is not solely financial, as it also takes an average of around 13 years to bring a drug to market (DiMasi et al. 2016). Identifying a new drug requires over 100 000 candidate compounds, as well as in vitro, in vivo, and three-phase clinical trials on thousands of subjects. Approximately 1 out of 10 compounds succeeds in clinical trials, which means the success rate is around 0.0001% (Kapetanovic 2008).

In some cases, a drug that has shown effectiveness in clinical trials and has been approved by the US Food and Drug Administration (FDA) will have positive results in some patients, while in others it may cause severe unwanted side effects and, in the worst case, even death (Lazarou et al. 1998, Wysowski and Swartz 2005, Ma and Lu 2011). Severe adverse drug reactions (ADRs) occur in 6.2%–6.7% of hospitalized patients, with over two million cases occurring annually in the general population in the USA alone. These ADRs result in 100 000 deaths per year in the USA (Lazarou et al. 1998, Wilkinson 2005). As a result, many drugs that cause unexpectedly severe ADRs are eventually withdrawn from the market, with considerable impact on the producer, including loss of revenue and reputational damage. Of the 548 drugs approved by the FDA between 1975 and 1999, 56 (10.2%) received a boxed warning or were eventually withdrawn from the market, and 20 (3.8%) of the 528 drugs approved in Canada between 1990 and 2009 were withdrawn for safety reasons (Lasser et al. 2002, Ninan and Wertheimer 2012).

Identifying safety issues in drugs is a challenging task. Each clinical trial phase can include up to a few hundred patients. In the early stages of drug discovery, computational approaches are used to identify potential drug molecules, reducing costs and time.
Received: 20 March 2023; Revised: 24 July 2023; Editorial Decision: 20 August 2023; Accepted: 22 August 2023
© The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Virtual screening (VS) is a powerful computational approach in which comprehensive libraries of small molecules are screened for new hits with desired properties that can be further investigated (Shoichet 2004). Among the various VS approaches, quantitative structure-activity relationship (QSAR) analysis is a ligand-based drug design method. It attempts to find a statistically significant correlation between chemical structure and biological and toxicological properties, using regression and classification techniques (Cherkasov et al. 2014). Despite their efficacy in identifying potential drug candidates, these techniques often overlook potential safety issues of the drugs. QSAR models are also applied outside the pharmaceutical field: in industrial chemistry, where they assess ecotoxicity, and in materials science (Leszczynski 2006, Puzyn et al. 2010).

ADR prediction has received much attention in recent years. Prior studies utilized diverse information such as biological pathways (Scheiber et al. 2009), chemical-protein interactions (LaBute et al. 2014), and post-market surveillance data (Tatonetti et al. 2012) to predict ADRs. However, most of these types of data are based on experimental results or post-market reports, which require significant time and expense to gather and are unavailable in the early stage of the drug lifecycle (Whitebread et al. 2005, Shtar 2021).
Consequently, to predict ADRs for a drug candidate at an early stage of drug development, predictions need to be made using just the chemical structure (Pauwels et al. 2011, Liu et al. 2014).

ADR events can lead to the withdrawal of a drug from the market, but they are not the only reason for withdrawal. Post-market drug withdrawals may be caused by a variety of factors, ranging from safety concerns such as reported deaths to a wide range of non-safety concerns, including the product's inefficacy and a variety of manufacturing, regulatory, or business issues (Magro et al. 2012). Onakpoya et al. (2016) reported that the average time it takes for a drug to be withdrawn from the market has decreased; however, the method of identifying these drugs after a serious ADR has not improved in the past 60 years. In addition, the methodology used to identify previously unknown risks, as well as the delay between the introduction of a drug and its withdrawal for safety reasons, remain sources of concern.

No gold standard for predicting drug withdrawal has been described in the literature. A few in silico methods for predicting drug withdrawal have been proposed. Onay et al. (2017), who focused on drugs associated with nervous system diseases, were the first to distinguish between approved drugs and withdrawn ones. Their study integrated the (...truncated)
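
The abstract reports performance as an area under the ROC curve (AUC) for separating withdrawn from approved drugs. As a rough illustration of what that number measures, the sketch below computes AUC from scratch using the rank-sum (Mann-Whitney U) identity. It is not the authors' evaluation code: the function name `roc_auc`, the label encoding (1 = withdrawn, 0 = approved), and the six scores are assumptions made up for this example.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 = withdrawn, 0 = approved (hypothetical encoding).
    scores: model outputs, where a higher score means "more likely withdrawn".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one withdrawn and one approved drug")

    # Sort indices by score and assign 1-based ranks, averaging ranks over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U counts the (withdrawn, approved) pairs ranked in the right order,
    # with tied pairs counted as one half.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Made-up scores for six hypothetical drugs from a held-out database.
test_labels = [1, 0, 1, 0, 0, 1]
test_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
print(f"AUC = {roc_auc(test_labels, test_scores):.3f}")  # AUC = 0.889
```

In this toy case, 8 of the 9 withdrawn/approved pairs are ordered correctly, giving 8/9 ≈ 0.889. An AUC of 0.5 corresponds to random ranking, so a value above 0.75 means that a randomly chosen withdrawn drug outscores a randomly chosen approved drug more than three times out of four.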


Article home page: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10469107

E. Mazuz, G. Shtar, N. Kutsky, L. Rokach, B. Shapira. Pretrained transformer models for predicting the withdrawal of drugs from the market. Bioinformatics, 2023, Volume 39, Issue 8, DOI: 10.1093/bioinformatics/btad519