Structure-inclusive similarity based directed GNN: a method that can control information flow to predict drug–target binding affinity


Bioinformatics, 2024, 40(10), btae563
https://doi.org/10.1093/bioinformatics/btae563
Advance Access Publication Date: 18 September 2024
Original Paper

Data and text mining

Jipeng Huang1,2,3,†, Chang Sun1,2,3,†, Minglei Li1,2,3, Rong Tang1,2,3, Bin Xie4, Shuqin Wang5,*, Jin-Mao Wei1,2,*

1 Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
2 College of Computer Science, Nankai University, Tianjin 300071, China
3 Tianjin Key Laboratory of Network and Data Security, Tianjin 300350, China
4 College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
5 College of Computer and Information Engineering, Tianjin Normal University, Tianjin, Xi Qing District 300387, China

*Corresponding authors. College of Computer and Information Engineering, Tianjin Normal University, No.393, Extension of Bin Shui West Road, Tianjin, Xi Qing District 300387, China. E-mail: (S.W.); Centre for Bioinformatics and Intelligent Medicine, Nankai University, No. 38 Tongyan Road, Tianjin 300071, China. E-mail: (J.M.W.)
† Equal contribution.
Associate Editor: Jonathan Wren

Abstract

Motivation: Exploring the association between drugs and targets is essential for drug discovery and repurposing. Compared with traditional methods that treat this exploration as a binary classification task, predicting the drug–target binding affinity can provide more specific information. Many studies rely on the assumption that similar drugs may interact with the same target. These methods construct a symmetric graph from undirected drug similarity or target similarity. Although these similarities can measure the difference between two molecules, they cannot capture the inclusion relationship between their substructures.
For example, if drug A contains all the substructures of drug B, then in the message-passing mechanism of the graph neural network, drug A should acquire all the properties of drug B, while drug B should obtain only some of the properties of drug A.

Results: To this end, we propose a structure-inclusive similarity (SIS), which measures the similarity of two drugs by considering the inclusion relationship of their substructures. Based on SIS, we construct a drug graph and a target graph and predict the binding affinities between drugs and targets with a graph convolutional network-based model. Experimental results show that considering the inclusion relationship between the substructures of two molecules can effectively improve the accuracy of the prediction model. Our SIS-based prediction method outperforms several state-of-the-art methods for drug–target binding affinity prediction. The case studies demonstrate that our model is a practical tool for predicting the binding affinity between drugs and targets.

Availability and implementation: Source codes and data are available at https://github.com/HuangStomach/SISDTA.

1 Introduction

Drug development usually involves steps such as molecular design, preclinical studies, and clinical trials, which are time-consuming and costly. Studies have shown that the average success rate for developing a new molecular entity is only 2.01% and that the clinical development cycle takes an average of 13.9 years (Yeu et al. 2015). To reduce the cost of drug development, biologists try to find new indications for approved drugs (i.e. drug repurposing). Nearly 70 existing FDA-approved drugs are currently being investigated to see if they can be repurposed to treat COVID-19 (Gordon et al. 2020). The short cycle time and low cost of drug repurposing shorten the review process and have significant implications for drug development.
One of the key steps in drug repurposing is the identification of drug–target relationships. Computational methods are being considered for predicting reliable drug–protein interactions in order to reduce the workload of subsequent experiments for drug–target relationship identification. With the continuous development of biological big data, machine learning methods have been applied to various practical tasks in bioinformatics. Determining new drug–target relationships with computational methods has become a research hotspot in drug development (Eisenstein 2022).

Drug–target relationship prediction algorithms can be broadly classified into two categories. One class focuses on drug–target interaction (DTI) prediction, which treats the prediction task as a binary classification task (Li et al. 2022, Sun et al. 2022). For example, AEFS (Sun et al. 2021) tried to predict DTIs by maintaining the consistency between drug properties and their functions with a multilayer encoder. Luo

Received: 19 March 2024; Revised: 21 May 2024; Editorial Decision: 11 September 2024; Accepted: 17 September 2024
© The Author(s) 2024. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

can represent the respective drug nodes. Substructures that belong to one drug but not to the other drug node carry information that should be regarded as noise in the graph model. However, on undirected graphs constructed from traditional similarity, this information is still passed in both directions with the same weight.
A graph model constructed in this way would then pass a significant amount of noise along its edges, which may affect prediction performance.

To explore the difficulty of intermolecular binding in more detail, we focus on predicting the more informative DTA. The biological activity of a molecule, or any of its other properties, can be explained by its substructures (Nicolotti et al. 2011). Certain privileged structures can even determine whether or not a molecule will interact with certain targets (DeSimone et al. 2004). Considering the inclusion and compatibility relationships between molecular structures, we propose an SIS-based directed GNN model. Unlike other works, the similarity scores used to construct the graph model differ between the two directions. The scores for two drugs are calculated from the weight of the intersection of their substructures relative to all of each drug's own substructures. A drug with more substructures has a smaller proportion of its substructures in the intersection and thus receives a lower score. The more substructures a drug has beyond the intersection, the more likely it is to contain information about biological activity that the other drug does not have. This information becomes noise when such a drug is linked to other drugs through edges in the graph model. The transmission of this noise is suppressed (...truncated)
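The direction-dependent scoring described above can be illustrated with a minimal sketch. It is not the authors' SIS formula, which this section does not state. It assumes each drug is represented as a set of substructure labels, that each substructure has unit weight unless a weight is supplied, and that a drug's score is the weight of the shared substructures divided by the weight of all of its own substructures. The function name `directional_scores`, the `weights` parameter, and the substructure labels are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def directional_scores(
    subs_a: Set[str],
    subs_b: Set[str],
    weights: Optional[Dict[str, float]] = None,
) -> Tuple[float, float]:
    """Return (score_a, score_b): shared weight as a share of each drug's own weight.

    Illustrative assumption only; the paper's exact SIS definition may differ.
    """

    def total(subs: Set[str]) -> float:
        # Hypothetical weighting: each substructure counts 1.0 unless a weight is given.
        if weights is None:
            return float(len(subs))
        return sum(weights.get(s, 1.0) for s in subs)

    shared = total(subs_a & subs_b)
    total_a, total_b = total(subs_a), total(subs_b)
    # A drug with no substructures gets score 0.0 to avoid division by zero.
    score_a = shared / total_a if total_a else 0.0
    score_b = shared / total_b if total_b else 0.0
    return score_a, score_b


# Drug A contains every substructure of drug B, plus two more.
drug_a = {"benzene", "amide", "hydroxyl", "pyridine"}
drug_b = {"benzene", "amide"}
score_a, score_b = directional_scores(drug_a, drug_b)
print(score_a, score_b)  # 0.5 1.0
```

In this toy case the drug with more substructures (A) receives the lower score, in line with the description above, whereas a symmetric measure would assign both directions the same value. How each score is attached to a directed edge of the graph is not specified in this section and is left open here.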