MSDAFL: molecular substructure-based dual attention feature learning framework for predicting drug–drug interactions
Bioinformatics, 2024, 40(10), btae596
https://doi.org/10.1093/bioinformatics/btae596
Advance Access Publication Date: 9 October 2024
Original Paper
Systems biology
Chao Hou1, Guihua Duan2, Cheng Yan1,*
1School of Informatics, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China
2School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
*Corresponding author. School of Informatics, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China. E-mail:
Associate Editor: Jianlin Cheng
Abstract
Motivation: Drug–drug interactions (DDIs) can cause unexpected adverse drug reactions, affecting treatment efficacy and patient safety. The
need for computational methods to predict DDIs has been growing due to the necessity of identifying potential risks associated with drug combinations
in advance. Although several deep learning methods have been recently proposed to predict DDIs, many overlook feature learning
based on interactions between the substructures of drug pairs.
Results: In this work, we introduce a molecular Substructure-based Dual Attention Feature Learning framework (MSDAFL), designed to fully
utilize the information between substructures of drug pairs to enhance the performance of DDI prediction. We employ a self-attention module
to obtain a set number of self-attention vectors, which are associated with various substructural patterns of the drug molecule itself, while also
extracting interaction vectors representing inter-substructure interactions between drugs through an interactive attention module.
Subsequently, an interaction module based on cosine similarity is used to further capture the interactive characteristics between the self-attention vectors of drug pairs. We also perform normalization after the interaction feature extraction to mitigate overfitting. After applying
three-fold cross-validation, the MSDAFL model achieved average precision scores of 0.9707, 0.9991, and 0.9987, and area under the receiver
operating characteristic curve scores of 0.9874, 0.9934, and 0.9974 on three datasets, respectively. In addition, the experimental results of five-fold cross-validation and the cross-datum study also indicate that MSDAFL performs well in predicting DDIs.
Availability and implementation: Data and source codes are available at https://github.com/27167199/MSDAFL.
1 Introduction
Drug–drug interactions (DDIs) can cause unexpected adverse
drug reactions, affecting treatment efficacy and patient safety
(Vilar et al. 2014). DDIs refer to interactions that occur between
two or more drug administration processes, including changes
in drug properties and the occurrence of toxic side effects (Sun
et al. 2016). Therefore, research on DDI prediction is of great
practical importance. However, traditional biological or
pharmacological methods are costly, time-consuming, and
labor-intensive (Shao and Zhang 2013).
Machine learning offers a fresh avenue for accurately predicting
DDIs (Mei and Zhang 2021). Methods based on feature
similarity posit that drugs sharing similar attributes often
exhibit comparable reaction patterns, relying largely on drug
properties such as fingerprinting (Vilar et al. 2013), chemical
structures (Takeda et al. 2017), pharmacological phenotypes (Li
et al. 2015), and RNA profiles (Li et al. 2022). Enhancements
in model efficacy are achieved by integrating various features.
For instance, the DDI-IS-SL model forecasts DDIs through a
blend of integrated similarity measures and semi-supervised
learning techniques (Yan et al. 2020). Despite their advancements,
these feature similarity-based methods often overlook
the structural details of drugs, and their feature selection heavily
depends on specialized knowledge and experience.
Graph neural networks (GNNs) have been widely used
to analyze the chemical structures of drugs and forecast
DDIs. Contemporary GNN methodologies are divided into two
main types. The first type focuses on embedding features directly
from the molecular graphs of drugs, effectively utilizing a
straightforward method to encapsulate graph-based data
(Gilmer et al. 2017). In this method, atoms within the molecular
graph are treated as nodes, with chemical bonds serving as the
connecting edges. This setup allows for the embedding of the
molecular graph by learning features of individual atoms and
the interactions conveyed through the chemical bonds. For instance,
SSI-DDI deconstructs the DDI prediction task between
two drugs to pinpoint pairwise interactions among their respective
substructures (Nyamabo et al. 2021). DSN-DDI is a
dual-view drug representation learning network specifically
engineered to concurrently learn drug substructures from individual
drugs and drug pairs (Li et al. 2023). The second type
leverages existing drug interaction networks, where drugs are
nodes and their interactions are edges, treating the task of DDI
prediction as akin to link prediction within these networks.
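To make the first type concrete, the following minimal sketch (not taken from any of the cited methods) represents a molecule as a graph with atoms as nodes and chemical bonds as edges, then performs one round of neighbour aggregation. The ethanol example, the toy one-hot feature scheme, and the function names `build_adjacency` and `message_passing_step` are illustrative assumptions; real GNN models use learned weights and richer atom and bond features.

```python
# Illustrative sketch only: a molecular graph with atoms as nodes and bonds as edges.
# Heavy atoms of ethanol (C-C-O); hydrogens are omitted for brevity.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]  # undirected chemical bonds between atom indices

# One-hot atom features over a toy vocabulary (an assumption, not the paper's featurization).
vocab = ["C", "O"]
features = [[1.0 if a == v else 0.0 for v in vocab] for a in atoms]


def build_adjacency(num_atoms, bonds):
    """Return a neighbour list for each atom of an undirected molecular graph."""
    neighbours = [[] for _ in range(num_atoms)]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    return neighbours


def message_passing_step(features, neighbours):
    """Update each atom by adding the sum of its neighbours' features to its own."""
    updated = []
    for i, own in enumerate(features):
        agg = list(own)
        for j in neighbours[i]:
            agg = [a + b for a, b in zip(agg, features[j])]
        updated.append(agg)
    return updated


neighbours = build_adjacency(len(atoms), bonds)
hidden = message_passing_step(features, neighbours)
```

After one step, each atom's vector already reflects its bonded neighbours; stacking several such steps is what lets a GNN capture larger substructures.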
Received: 25 June 2024; Revised: 24 August 2024; Editorial Decision: 29 September 2024; Accepted: 7 October 2024
© The Author(s) 2024. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
• We designed a new Molecular Substructure-based Dual
  Attention Feature Learning framework for predicting DDIs
  (MSDAFL). This framework leverages both self-attention
  and interactive attention mechanisms to effectively extract
  and process interaction information between drug substructures,
  enhancing the accuracy of DDI predictions.
• To uncover the hidden features of interactions between
  drug substructures, we computed the cosine similarity
  matrix. This approach has shown that these similarity
  vectors significantly contribute to the accuracy of predicting
  DDIs.
• Additionally, to reduce overfitting during model training,
  we adopted a normalization strategy. This not only
  retains the essential interaction features but also improves
  the predictability and reliability of DDI outcomes.
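As a rough sketch of the cosine similarity and normalization ideas listed above, the code below computes a cosine similarity matrix between two small sets of vectors (standing in for the self-attention vectors of two drugs) and then standardizes the flattened result. The vector values, the function names, and the choice of zero-mean, unit-variance standardization are assumptions for illustration; this section does not specify the exact normalization MSDAFL uses.

```python
import math


def cosine_similarity_matrix(a_vecs, b_vecs, eps=1e-8):
    """Entry [i][j] is the cosine similarity between a_vecs[i] and b_vecs[j]."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    matrix = []
    for a in a_vecs:
        row = []
        for b in b_vecs:
            dot = sum(x * y for x, y in zip(a, b))
            # eps guards against division by zero for all-zero vectors.
            row.append(dot / max(norm(a) * norm(b), eps))
        matrix.append(row)
    return matrix


def standardize(values, eps=1e-8):
    """Shift to zero mean and scale to unit variance (an assumed normalization)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# Hypothetical stand-ins for the self-attention vectors of two drugs.
drug_a = [[1.0, 0.0], [0.0, 1.0]]
drug_b = [[1.0, 0.0], [1.0, 1.0]]

sim = cosine_similarity_matrix(drug_a, drug_b)
interaction_features = standardize([s for row in sim for s in row])
```

Each row of `sim` compares one vector of the first drug with every vector of the second, so the flattened matrix can serve as a fixed-length interaction feature for a downstream classifier.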
2 Materials and methods
2.1 Dataset
To evaluate the scalability and robustness of MSDAFL, we test
our model on three public datasets, which vary in scale and density
and have been widely used in previous studies. The scale of a dataset is
determined by the number of drugs it includes. Following previous
studies, we treat the observed DDIs as positive samples
and randomly sample non-existing DDIs to
generate the negative samples. We perform stratified splitting to
divide all the drug pairs into a training set, a validation set, and
a testing set in a ratio of 6:2:2 (three-fold cross-validation) and
8 (...truncated)