Structure-inclusive similarity based directed GNN: a method that can control information flow to predict drug–target binding affinity
Bioinformatics, 2024, 40(10), btae563
https://doi.org/10.1093/bioinformatics/btae563
Advance Access Publication Date: 18 September 2024
Original Paper
Data and text mining
Jipeng Huang1,2,3,†, Chang Sun1,2,3,†, Minglei Li1,2,3, Rong Tang1,2,3, Bin Xie4, Shuqin Wang5,*, Jin-Mao Wei1,2,*
1Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
2College of Computer Science, Nankai University, Tianjin 300071, China
3Tianjin Key Laboratory of Network and Data Security, Tianjin 300350, China
4College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
5College of Computer and Information Engineering, Tianjin Normal University, Tianjin, Xi Qing District 300387, China
*Corresponding authors. College of Computer and Information Engineering, Tianjin Normal University, No. 393, Extension of Bin Shui West Road, Tianjin, Xi Qing District 300387, China. E-mail: (S.W.); Centre for Bioinformatics and Intelligent Medicine, Nankai University, No. 38 Tongyan Road, Tianjin 300071, China. E-mail: (J.M.W.)
†Equal contribution.
Associate Editor: Jonathan Wren
Abstract
Motivation: Exploring the association between drugs and targets is essential for drug discovery and repurposing. Compared with traditional methods that treat this exploration as a binary classification task, predicting drug–target binding affinity provides more specific information. Many studies rest on the assumption that similar drugs may interact with the same target. These methods construct a symmetric graph from undirected drug similarity or target similarity. Although such similarities can measure the difference between two molecules, they cannot capture the inclusion relationship between their substructures. For example, if drug A contains all the substructures of drug B, then in the message-passing mechanism of a graph neural network, drug A should acquire all the properties of drug B, whereas drug B should obtain only some of the properties of A.
Results: To this end, we propose a structure-inclusive similarity (SIS), which measures the similarity of two drugs by considering the inclusion relationship of their substructures. Based on SIS, we constructed a drug graph and a target graph and predicted drug–target binding affinities with a graph convolutional network-based model. Experimental results show that considering the inclusion relationship between the substructures of two molecules effectively improves the accuracy of the prediction model. Our SIS-based prediction method outperforms several state-of-the-art methods for drug–target binding affinity prediction. Case studies demonstrate that our model is a practical tool for predicting the binding affinity between drugs and targets.
Availability and implementation: Source codes and data are available at https://github.com/HuangStomach/SISDTA.
1 Introduction
Drug development usually involves steps such as molecular design, preclinical studies, and clinical trials, which are time-consuming and costly. Studies have shown that the average success rate for developing a new molecular entity is only 2.01% and that the clinical development cycle takes an average of 13.9 years (Yeu et al. 2015). To reduce the cost of drug development, biologists try to find new indications for approved drugs (i.e. drug repurposing). Nearly 70 existing FDA-approved drugs are currently being investigated for repurposing to treat COVID-19 (Gordon et al. 2020). Because of its short cycle time and low cost, drug repurposing shortens the review process and has significant implications for drug development. One of the key steps in drug repurposing is the identification of drug–target relationships.
Computational methods have been used to predict reliable drug–protein interactions, reducing the workload of the subsequent experiments needed to identify drug–target relationships. With the continuous growth of biological big data, machine learning methods have been applied to various practical tasks in bioinformatics, and identifying new drug–target relationships computationally has become a research hotspot in drug development (Eisenstein 2022).
Drug–target relationship prediction algorithms can be broadly classified into two categories. One class focuses on drug–target interaction (DTI) prediction, which treats the task as binary classification (Li et al. 2022, Sun et al. 2022). For example, AEFS (Sun et al. 2021) predicts DTIs by using a multilayer encoder to maintain consistency between drug properties and their functions. Luo
Received: 19 March 2024; Revised: 21 May 2024; Editorial Decision: 11 September 2024; Accepted: 17 September 2024
© The Author(s) 2024. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
can represent the respective drug nodes. The substructures that belong to one drug but not to the other carry information that should be regarded as noise in the graph model. However, on undirected graphs constructed from traditional similarity, this information is passed between the two nodes with the same weight in both directions. A graph model constructed in this way would therefore propagate a significant amount of noise along its edges, which may degrade prediction performance.
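To make the symmetry problem concrete, the sketch below uses Jaccard (Tanimoto) similarity on substructure sets as a stand-in for a "traditional" undirected similarity. The choice of Jaccard and the substructure names are illustrative assumptions, not the similarity measures used by any specific method cited here. Because one score serves both directions, the substructures that only drug A has reach drug B with exactly the weight that B's information reaches A.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard/Tanimoto similarity of two substructure sets (symmetric by construction)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical substructure sets: drug A contains every substructure of drug B.
drug_a = {"benzene", "hydroxyl", "carboxyl", "amine"}
drug_b = {"benzene", "hydroxyl"}

# On an undirected graph a single score is the edge weight in both directions.
w_a_to_b = jaccard(drug_a, drug_b)
w_b_to_a = jaccard(drug_b, drug_a)

# Substructures that B receives from A even though B does not contain them.
foreign_to_b = drug_a - drug_b

print(f"A->B weight: {w_a_to_b:.2f}, B->A weight: {w_b_to_a:.2f}")
print("passed to B:", sorted(foreign_to_b))
```

Both weights print as 0.50, and `amine` and `carboxyl`, which B lacks, are passed to B at that same weight.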
To explore the difficulty of intermolecular binding in more detail, we focus on predicting the more informative drug–target binding affinity (DTA). The biological activity of a molecule, or any of its other properties, can be explained by its substructures (Nicolotti et al. 2011). Certain privileged structures can even determine whether a molecule will interact with certain targets (DeSimone et al. 2004). Considering the inclusion and compatibility relationships between molecular structures, we propose an SIS-based directed GNN model. Unlike other works, the similarity scores used to construct our graph model differ between the two directions of an edge. The score for each drug in a pair is calculated as the weight of the intersection of the two drugs' substructures relative to all of that drug's own substructures. A drug with more substructures therefore has a smaller proportion of shared substructures and receives a lower score. The more substructures a drug has beyond the intersection, the more likely it is to carry biological-activity information that the other drug lacks, and this information becomes noise when the drug is linked to other drugs through edges in the graph model. The transmission of this noise is suppressed (...truncated)
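The direction-dependent score described above can be sketched as follows. This is one plausible reading of the prose, not the authors' SIS definition (their implementation is in the SISDTA repository). Here the weight of the edge from drug i to drug j is the weighted share of i's substructures that j also contains. The per-substructure weights (defaulting to 1.0), the helper names `inclusion_score`, `directed_adjacency`, and `aggregate`, and the plain weighted-sum aggregation standing in for a GCN layer are all hypothetical.

```python
from __future__ import annotations

from typing import Mapping, Optional


def inclusion_score(src: set[str], dst: set[str],
                    weight: Optional[Mapping[str, float]] = None) -> float:
    """Weighted share of src's substructures that dst also contains (hypothetical).

    Used as the weight of the message src -> dst: the more substructures src
    has outside the intersection, the lower the score, so information that
    dst lacks is damped on its way from src to dst.
    """
    w = weight or {}
    total = sum(w.get(f, 1.0) for f in src)
    if total == 0:
        return 0.0
    shared = sum(w.get(f, 1.0) for f in src & dst)
    return shared / total


def directed_adjacency(sets: list[set[str]],
                       weight: Optional[Mapping[str, float]] = None) -> list[list[float]]:
    """adj[i][j] is the weight of the edge i -> j (message from drug i to drug j)."""
    n = len(sets)
    return [[0.0 if i == j else inclusion_score(sets[i], sets[j], weight)
             for j in range(n)] for i in range(n)]


def aggregate(adj: list[list[float]], feats: list[list[float]]) -> list[list[float]]:
    """One directed message-passing step: node j sums adj[i][j] * feats[i] over senders i."""
    n, d = len(feats), len(feats[0])
    return [[sum(adj[i][j] * feats[i][k] for i in range(n)) for k in range(d)]
            for j in range(n)]


# Hypothetical example: drug A contains every substructure of drug B.
drug_a = {"benzene", "hydroxyl", "carboxyl", "amine"}
drug_b = {"benzene", "hydroxyl"}
vocab = sorted(drug_a | drug_b)  # ['amine', 'benzene', 'carboxyl', 'hydroxyl']
feats = [[1.0 if f in s else 0.0 for f in vocab] for s in (drug_a, drug_b)]

adj = directed_adjacency([drug_a, drug_b])
print(adj)                    # [[0.0, 0.5], [1.0, 0.0]]
print(aggregate(adj, feats))  # [[0.0, 1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
```

In this toy case the B→A edge has weight 1.0, so A receives all of B's properties. The A→B edge has weight 0.5, so the substructures only A has reach B damped. Adding more A-only substructures would lower the A→B weight further while leaving B→A at 1.0.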