REDDA: Integrating multiple biological relations to heterogeneous graph neural network for drug-disease association prediction

https://doi.org/10.1016/j.compbiomed.2022.106127

Highlights

  • A large-scale benchmark for drug-disease association is proposed with 5 entities and 10 relations.

  • A heterogeneous graph neural network called REDDA is proposed for drug-disease association prediction.

  • REDDA uses 3 attention mechanisms and a topological subnet to learn comprehensive node representations.

  • REDDA outperforms 8 advanced baselines and gains impressive improvements on 2 datasets.

  • Attention visualization analysis and a case study demonstrate the effectiveness of the model architecture of REDDA.

Abstract

Computational drug repositioning is an effective way to find new indications for existing drugs and can thus accelerate drug development and reduce experimental costs. Recently, various deep learning-based repurposing methods have been established to identify potential drug-disease associations (DDA). However, effectively using the relations among biological entities to capture biological interactions and enhance drug-disease association prediction remains challenging. To address this problem, we propose a heterogeneous graph neural network called REDDA (Relations-Enhanced Drug-Disease Association prediction). Equipped with three attention mechanisms, REDDA sequentially learns drug/disease representations through a general heterogeneous graph convolutional network-based node embedding block, a topological subnet embedding block, a graph attention block, and a layer attention block. Performance comparisons on our proposed benchmark dataset show that REDDA outperforms 8 advanced drug-disease association prediction methods, achieving relative improvements of 0.76% on the area under the receiver operating characteristic curve (AUC) score and 13.92% on the area under the precision-recall curve (AUPR) score compared to the second-best method. On another benchmark dataset, REDDA also obtains relative improvements of 2.48% on the AUC score and 4.93% on the AUPR score. Case studies also indicate that REDDA can give valid predictions for the discovery of new indications for drugs and new therapies for diseases. Overall, the results show promising potential for REDDA in in silico drug development. The proposed benchmark dataset and source code are available at https://github.com/gu-yaowen/REDDA.

Introduction

Traditional wet-experiment-guided drug discovery is a time-consuming and high-risk process [1]. Recently, it has become increasingly difficult to identify potential therapeutic entities with novel chemical structures. The total cost of developing a new drug ranges from 3.2 to 27.0 billion dollars, the process takes 5.8–15.2 years, and the success rate is only 6.2% [[2], [3], [4]]. Thus, computational methods that offer cheaper and labor-saving solutions can accelerate drug discovery and have attracted increasing interest from both the pharmaceutical industry and academic research communities [5,6]. For instance, there have been successful applications in drug property prediction [[7], [8], [9]], drug-target interaction assessment [8,10,11], and drug sensitivity prediction [12,13]. Computational drug repositioning methods focus on determining new indications for drugs [14], thus reducing unnecessary costs and improving the success rate of drug development [15]. A variety of promising applications show that the role of computational drug repositioning in drug development cannot be ignored [[16], [17], [18], [19], [20]].

The existing computational drug repositioning methods can be approximately divided into 4 categories [21]: classical machine learning approaches, network propagation approaches, matrix factorization/completion approaches, and deep learning approaches.

Classical machine learning approaches take known drug-disease association pairs as positive labels to convert drug repositioning into a binary classification problem and then adopt drug and disease information as input features to train machine learning classifiers. For instance, Gao et al. proposed a Laplacian regularized least squares algorithm combined with a similarity kernel fusion method to predict drug-disease associations, called DDA-SKF [22]. Network propagation approaches construct drug-disease heterogeneous networks and use network-based algorithms (e.g., random walk) to predict drug-disease association probabilities [6]. For example, Luo et al. proposed a method called MBiRW, which used similarity measures to construct a drug-disease heterogeneous network and adopted the bi-random walk algorithm to predict potential drug-disease associations [23]. Matrix factorization/completion approaches model drug repositioning as a recommendation system, recommending new drugs/indications based on prior information such as known drug-disease associations. Zhang et al. proposed a similarity constrained matrix factorization method called SCMFDD for drug-disease association prediction, which maps drug and disease features to low-rank spaces by solving a constrained optimization problem [24]. Yang et al. treated drug-disease association prediction as a noisy matrix completion problem and developed a bounded nuclear norm regularization (BNNR) method for it [25]. Yang et al. also proposed a multi-similarities bilinear matrix factorization (MSBMF) method to extract effective drug and disease representations, which could be used to infer missing drug-disease associations [26]. With ongoing development in recent years, these types of computational drug repositioning methods have achieved competitive performance. However, some crucial shortcomings still limit their accuracy and their use in practical scenarios, such as the high dependence on the quality of input features in machine learning approaches, the representation bias toward high-degree nodes of the heterogeneous network in network propagation approaches, and the weak representation ability for drug-disease associations caused by linear multiplication in matrix factorization/completion approaches.
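To make the "linear multiplication" limitation concrete, the following is a minimal sketch of plain matrix factorization on a toy drug-disease association matrix. It is a generic illustration, not an implementation of SCMFDD, BNNR, or MSBMF; the toy matrix, the function names, the initialization, and the hyperparameters are all assumptions chosen for the example. The predicted score of a pair is only the dot product of two latent vectors, which is the linear form the text refers to.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def squared_error(assoc, drug_f, dis_f):
    """Sum of squared differences between the matrix and its low-rank reconstruction."""
    return sum(
        (assoc[i][j] - dot(drug_f[i], dis_f[j])) ** 2
        for i in range(len(assoc))
        for j in range(len(assoc[0]))
    )


def factorize(assoc, rank=2, lr=0.05, reg=0.01, epochs=200):
    """Fit drug and disease latent factors by stochastic gradient descent.

    Unknown pairs are treated as 0, a simplifying assumption of this sketch.
    """
    n_drugs, n_dis = len(assoc), len(assoc[0])
    # Small, deterministic, non-identical starting values (an arbitrary choice).
    drug_f = [[0.1 + 0.05 * ((i + f) % 3) for f in range(rank)] for i in range(n_drugs)]
    dis_f = [[0.1 + 0.05 * ((j + 2 * f) % 3) for f in range(rank)] for j in range(n_dis)]
    for _ in range(epochs):
        for i in range(n_drugs):
            for j in range(n_dis):
                err = assoc[i][j] - dot(drug_f[i], dis_f[j])
                for f in range(rank):
                    u, v = drug_f[i][f], dis_f[j][f]
                    # Gradient step on the squared error with L2 regularization.
                    drug_f[i][f] += lr * (err * v - reg * u)
                    dis_f[j][f] += lr * (err * u - reg * v)
    return drug_f, dis_f


# Toy association matrix (rows: drugs, columns: diseases); values are made up.
toy_assoc = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
drug_f, dis_f = factorize(toy_assoc)
known_score = dot(drug_f[0], dis_f[0])    # pair marked as associated
unknown_score = dot(drug_f[0], dis_f[2])  # pair marked as unknown
```

Because every score is a single inner product, interactions that are not well approximated by a low-rank linear model cannot be captured without changing the model class, which is the motivation the text gives for neural approaches.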

Deep learning approaches have been effectively applied in many biological domains, such as gene regulatory representation [27,28], single-cell omics analysis [[29], [30], [31], [32]], and drug efficacy prediction [33]. For drug repositioning, deep learning approaches use neural networks to model the interactions between drugs and diseases with high flexibility and scalability, and they have been widely used and proven highly competitive with the above three approaches [[34], [35], [36], [37], [38], [39], [40], [41]]. For instance, Zeng et al. integrated 10 drug-disease-related networks and trained a multimodal deep autoencoder on them to learn high-order representations for drug repositioning, called deepDR [37]; Yu et al. established a graph convolutional network called LAGCN on a heterogeneous drug-disease network [34]; Meng et al. proposed a neighborhood and neighborhood interaction-based neural collaborative filtering approach called DRWBNCF for drug repositioning [36]. Zhang et al. designed a multi-scale topology learning method that integrated multiple drug-disease heterogeneous networks and adopted random walks and an attention mechanism for representation learning [42]. Xuan et al. proposed a graph autoencoder architecture with scale-level attention and a convolutional neural network, called MGPred [43].

As advanced methods for modeling drug-disease associations, these approaches have provided a series of attractive methodologies for deep learning-based drug repositioning, such as the construction of heterogeneous networks, the use of layer attention mechanisms, and bilinear dot decoders. However, drug-disease associations should not be treated as an isolated biological system, as the above studies have done, while ignoring other extensive biological interactions such as drug-protein, protein-gene, gene-pathway, and pathway-disease. From our perspective, these external biological relations can be assembled into the drug-disease heterogeneous network and bring extra information for simulating the drug therapeutic process, thus enhancing the representation ability of the drug repositioning model. Nevertheless, these concerns have not been studied in depth.
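Two of the building blocks named above, layer attention and a bilinear decoder, can be sketched in a few lines. This is a generic illustration under stated assumptions, not the formulation used by REDDA or any cited method: the function names are hypothetical, the attention scores are passed in rather than learned, and the decoder weight matrix is a fixed toy value.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def layer_attention(layer_embeddings, layer_scores):
    """Combine one node's per-layer embeddings with softmax-normalized weights.

    In a trained model the scores would be learnable; here they are inputs.
    """
    weights = softmax(layer_scores)
    dim = len(layer_embeddings[0])
    return [
        sum(w * emb[i] for w, emb in zip(weights, layer_embeddings))
        for i in range(dim)
    ]


def bilinear_score(drug_vec, weight, disease_vec):
    """Association probability as sigmoid(drug^T W disease)."""
    projected = [
        sum(weight[i][j] * disease_vec[j] for j in range(len(disease_vec)))
        for i in range(len(drug_vec))
    ]
    logit = sum(d * p for d, p in zip(drug_vec, projected))
    return 1.0 / (1.0 + math.exp(-logit))


# Toy usage: two layers of 2-dimensional embeddings with equal attention scores.
fused = layer_attention([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
identity = [[1.0, 0.0], [0.0, 1.0]]
score = bilinear_score([1.0, 0.0], identity, [1.0, 0.0])
```

With equal scores the fused embedding is the plain average of the layers; unequal scores let the model emphasize shallow or deep layers per task.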

To resolve these problems, we propose a benchmark dataset that can be assembled into a heterogeneous network with 5 entities (drug, protein, gene, pathway, and disease) and 10 relations (drug-drug, drug-protein, protein-protein, protein-gene, gene-gene, gene-pathway, pathway-pathway, pathway-disease, disease-disease, and drug-disease) for drug repositioning. Furthermore, we develop a drug repositioning method on the heterogeneous network, which we call Relations-Enhanced Drug-Disease Association prediction (REDDA). The main contributions of this work are summarized as follows:

  • We propose a large-scale benchmark dataset for drug repositioning. The benchmark contains 41,100 nodes and 1,008,258 edges with 5 biological entities (drug, protein, gene, pathway, and disease).

  • We propose a deep learning-based method for drug repositioning, namely REDDA. It takes the heterogeneous graph neural network as the backbone and integrates 3 attention mechanisms to learn the node representations of the heterogeneous network and topological subnets.

  • Comprehensive experiments demonstrate that REDDA outperforms several state-of-the-art algorithms. Ablation experiments indicate that the fusion of extra biological relations is beneficial for REDDA to predict drug-disease associations. Attention visualization analysis shows the importance of topological decomposition, graph-level aggregation, and layer-level aggregation in REDDA.
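The schema of the benchmark (5 entity types and 10 relation types) can be written down as a small typed edge store. The sketch below is only an illustration of that schema; the container layout, the function name `build_hetero_graph`, and the toy identifiers are assumptions and do not reflect the file format of the released dataset.

```python
# Node types of the benchmark, as listed in the text.
ENTITIES = ("drug", "protein", "gene", "pathway", "disease")

# Relation types as (source type, target type) pairs, as listed in the text.
RELATIONS = (
    ("drug", "drug"),
    ("drug", "protein"),
    ("protein", "protein"),
    ("protein", "gene"),
    ("gene", "gene"),
    ("gene", "pathway"),
    ("pathway", "pathway"),
    ("pathway", "disease"),
    ("disease", "disease"),
    ("drug", "disease"),
)


def build_hetero_graph(edges):
    """Group (src_type, src_id, dst_type, dst_id) tuples by relation type.

    Raises ValueError if an edge uses a relation outside the schema.
    """
    graph = {rel: [] for rel in RELATIONS}
    for src_type, src_id, dst_type, dst_id in edges:
        rel = (src_type, dst_type)
        if rel not in graph:
            raise ValueError(f"unknown relation: {rel}")
        graph[rel].append((src_id, dst_id))
    return graph


# Toy edges with made-up identifiers, tracing one drug-to-disease path.
toy_edges = [
    ("drug", "D1", "protein", "P1"),
    ("protein", "P1", "gene", "G1"),
    ("gene", "G1", "pathway", "W1"),
    ("pathway", "W1", "disease", "S1"),
    ("drug", "D1", "disease", "S1"),
]
graph = build_hetero_graph(toy_edges)
```

Keeping one edge list per relation type mirrors how heterogeneous graph libraries usually store typed edges, so a structure like this could be handed to such a library with little conversion.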

Dataset

As existing benchmarks lack these biological entities and their relations, we construct a drug-disease association benchmark that includes 5 entities (drug, protein, gene, pathway, and disease) and 10 relations (drug-drug, drug-protein, protein-protein, protein-gene, gene-gene, gene-pathway, pathway-pathway, pathway-disease, disease-disease, and drug-disease), as these biological entities and relations have been shown to contribute to drug repositioning [[44], [45], [46]]. We assemble the

Comparison with state-of-the-art approaches

We compared REDDA with 8 drug-disease association prediction methods to demonstrate the effectiveness of our model, including SCMFDD [24], MBiRW [23], NIMGCN [57], HINGRL-Node2Vec-RF, HINGRL-DeepWalk-RF [44], LAGCN [34], and DRWBNCF [36]. The details for the construction of these baseline methods can be found in supplementary materials.

The performance results of 10-fold cross-validations on our proposed benchmark dataset are shown in Table 3 and Fig. 2, while the statistical results of the
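For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute AUC and an AUPR estimate from labels and predicted scores, together with the relative improvement used when comparing methods. It is a plain-Python illustration under assumptions: the paper's exact evaluation code is not shown in this excerpt, average precision is only one of several estimators of the area under the precision-recall curve, and the toy labels and scores are made up.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, a step-wise estimate of the area under the PR curve.

    Assumes no tied scores; ties would need an explicit tie-breaking rule.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    if hits == 0:
        raise ValueError("need at least one positive sample")
    return total / hits


def relative_improvement(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Toy example: 1 = known association, 0 = unknown pair.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)             # 3 of 4 positive-negative pairs ranked correctly
aupr = average_precision(labels, scores)  # (1/1 + 2/3) / 2
```

In a k-fold setting these functions would be applied to the held-out pairs of each fold and the per-fold values averaged.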

Conclusion

In this study, we propose a benchmark dataset for drug-disease association prediction. Much larger than a single drug-disease heterogeneous network, the constructed comprehensive heterogeneous network contains 5 biological entities and 10 relations. Moreover, to enhance the effectiveness of these extra biological relations on improving the performance of computational drug repositioning, a graph learning method (REDDA) on a heterogeneous network is proposed for predicting drug-disease

Funding

This work was supported by the Chinese Academy of Medical Sciences (Grant No. 2021-I2M-1-056), the Fundamental Research Funds for the Central Universities (Grant No. 3332022144), the National Key R&D Program of China (Grant No. 2016YFC0901901 and Grant No. 2017YFC0907503), and the National Natural Science Foundation of China (Grant No. 81601573).

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to thank all anonymous reviewers for their constructive advice.

References (68)

  • Y. Gu et al.

    CurrMG: a curriculum learning approach for graph based molecular property prediction

  • Q. Ye et al.

    A unified drug-target interaction prediction framework based on knowledge graph and recommendation system

    Nat. Commun.

    (2021)
  • Y. Gu et al.

    An efficient curriculum learning-based strategy for molecular graph learning

    Briefings Bioinf.

    (2022)
  • W. Kong et al.

    Prediction and optimization of NaV1.7 sodium channel inhibitors based on machine learning and simulated annealing

    J. Chem. Inf. Model.

    (2020)
  • Q. Liu et al.

    DeepCDR: a hybrid graph convolutional network for predicting cancer drug response

    Bioinformatics

    (2020)
  • F. Zhang et al.

    A novel heterogeneous network-based method for drug response prediction in cancer cell lines

    Sci. Rep.

    (2018)
  • J. Li et al.

    A survey of current trends in computational drug repositioning

    Briefings Bioinf.

    (2016)
  • H. Xue et al.

    Review of drug repositioning approaches and resources

    Int. J. Biol. Sci.

    (2018)
  • G. Fahimian et al.

    RepCOOL: computational drug repositioning via integrating heterogeneous biological networks

    J. Transl. Med.

    (2020)
  • J.I. Traylor et al.

    Computational drug repositioning identifies potentially active therapies for chordoma

    Neurosurgery

    (2021)
  • L. Bai et al.

    Computational drug repositioning of atorvastatin for ulcerative colitis

    J. Am. Med. Inf. Assoc.

    (2021)
  • C. Budak et al.

    Determining similarities of COVID-19 - lung cancer drugs and affinity binding mode analysis by graph neural network-based GEFA method

    J. Biomol. Struct. Dyn.

    (2021)
  • H. Luo et al.

    Biomedical data and computational models for drug repositioning: a comprehensive review

    Briefings Bioinf.

    (2021)
  • C.-Q. Gao et al.

    Predicting drug-disease associations using similarity kernel fusion

    Front. Pharmacol.

    (2022)
  • H. Luo et al.

    Drug repositioning based on comprehensive similarity measures and Bi-Random walk algorithm

    Bioinformatics

    (2016)
  • M. Yang et al.

    Drug repositioning based on bounded nuclear norm regularization

    Bioinformatics

    (2019)
  • M. Yang et al.

    Computational drug repositioning based on multi-similarities bilinear matrix factorization

    Briefings Bioinf.

    (2021)
  • Q. Cao et al.

    A unified framework for integrative study of heterogeneous gene regulatory mechanisms

    Nat. Mach. Intell.

    (2020)
  • W. Zeng et al.

    Reusability report: compressing regulatory networks to vectors for interpreting gene expression and genetic variants

    Nat. Mach. Intell.

    (2021)
  • Q. Liu et al.

    Simultaneous deep generative modeling and clustering of single cell genomic data

    Nat. Mach. Intell.

    (2021)
  • X. Chen et al.

    Cell type annotation of single-cell chromatin accessibility data via supervised Bayesian embedding

    Nat. Mach. Intell.

    (2022)
  • P. Zeng et al.

    Coupled co-clustering-based unsupervised transfer learning for the integrative analysis of single-cell genomic data

    Briefings Bioinf.

    (2021)
  • X. Huang et al.

    Cellsnp-lite: an efficient tool for genotyping single cells

    Bioinformatics

    (2021)
  • J. Zhu et al.

    Prediction of drug efficacy from transcriptional profiles with deep learning

    Nat. Biotechnol.

    (2021)

    1

    The first two authors contribute equally to the paper.
