Elsevier

Neurocomputing

Volume 206, 19 September 2016, Pages 50-57

Predicting drug–target interaction using positive-unlabeled learning

https://doi.org/10.1016/j.neucom.2016.03.080

Abstract

Identifying interactions between drug compounds and target proteins is an important step in drug discovery. Determining these interactions experimentally is time-consuming and expensive, and computational methods provide an effective strategy to address this issue. The main difficulties in drug–target interaction identification are the scarcity of known drug–target associations and the absence of experimentally verified negative samples. In this work, we present a method, called PUDT, to predict drug–target interactions. Instead of treating unknown interactions as negative samples, we treat them as unlabeled samples. We use three strategies (Random walk with restarts, KNN and heat kernel diffusion) to partition the unlabeled samples into two groups, reliable negative samples (RN) and likely negative samples (LN), based on target similarity information. A majority voting method then aggregates these strategies to decide the final label of each unlabeled sample. Finally, a weighted support vector machine is employed to build a classifier. Four datasets (enzyme, ion channel, GPCR and nuclear receptor) are used to evaluate the performance of our method. The results demonstrate that the performance of our method is comparable to or better than that of recent state-of-the-art approaches.

Introduction

The development of a new drug is a costly and time-consuming process. According to statistics from the US Food and Drug Administration (FDA), discovering a new molecular entity costs approximately $1.8 billion and takes about 13 years on average [1]. In addition, only about 20 new molecular entities are approved by the FDA each year. Reducing these expenses is therefore an important issue in drug discovery, and computational methods provide an effective strategy to address it [2].

With the development of high-throughput techniques, a great deal of drug–target interaction data has been generated [3], [4], [5]. Several databases have been established to store interaction information and provide retrieval services. For example, the DrugBank [6] database is a popular web resource containing information on drugs and drug targets; its present version contains 7740 drug entries. ChEMBL [7], maintained by the European Bioinformatics Institute (EBI), is a manually curated chemical database of bioactive molecules with drug-like properties. Version 19 contains 10,579 targets, 1,637,862 compound records and 2,843,338 bioactivity records. SuperTarget [8] is a freely accessible online database that contains over 6000 target proteins.

The availability of interaction data has boosted the development of computational methods for predicting drug–target interactions. Traditional computational methods for drug–target interaction identification can be classified into three categories: ligand-based methods [9], [10], docking-based methods [11], [12] and literature text mining methods [13]. These approaches have achieved great success in drug–target interaction prediction. However, they have some limitations: ligand-based methods depend on the number of known ligands, docking-based methods require protein structure information, and literature text mining methods are unable to find new, previously unknown interactions.

Recently, a growing number of statistical methods have been proposed to predict drug–target interactions by integrating biological knowledge such as drug chemical structures, target protein sequences, gene expression and known drug–target interactions [14], [15], [16], [17]. These approaches assume that similar drugs show similar patterns of interaction with targets in the drug–target interaction network [18], [19]. Chen et al. [15] presented a network-based random walk with restart method, called NRWRH, which predicts relationships between drugs and targets by integrating a drug–drug chemical structure similarity network, a protein–protein sequence similarity network and the known drug–target interaction network into a heterogeneous network. Cheng et al. [14] proposed three inference methods, drug-based similarity inference (DBSI), target-based similarity inference (TBSI) and network-based inference (NBI), to predict drug–target interactions. In similar work, Alaimo et al. [20] presented the DT-hybrid approach, which extends network-based inference with domain-based knowledge to detect drug–target interactions. Emig et al. [21] integrated different network-based methods to predict drug targets for a specific disease. These methods are easy to implement, but they cannot be applied to drugs that have no known targets. In addition, Bleakley and Yamanishi [17] employed bipartite local models to predict relationships between drugs and targets. Mei et al. [22] extended this work by integrating neighbour information into bipartite local models for drug–target interaction identification. The Gaussian interaction profile kernel and weighted nearest neighbour were integrated for drug–target interaction prediction [23]. Bayesian matrix factorization with binary classification [24] and probabilistic matrix factorization [25] were also proposed to detect drug–target interactions.

The common limitation of these supervised learning approaches is that they treat unknown drug–target interactions as negative samples, which may reduce predictive accuracy. Xia et al. [16] developed a semi-supervised method (NetLapRLS) for drug–target interaction identification that uses positive and unlabeled samples. Chen and Zhang [26] presented the NetCBP method, which identifies associations between drugs and targets by maximizing the rank coherence with respect to known knowledge. These semi-supervised methods can make use of unlabeled information, but they need to combine two different classifiers in the final step.
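
The random walk with restart idea mentioned above can be illustrated on a single small graph. The following is only a minimal sketch, not the NRWRH heterogeneous-network formulation: the function name, the restart probability of 0.3 and the toy four-node path graph are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker that
    restarts at the seed nodes with probability `restart` at every step."""
    n = len(adj)
    # Column-normalise the adjacency matrix to get transition probabilities.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    # The restart vector spreads probability uniformly over the seed nodes.
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy graph (hypothetical): a path 0 - 1 - 2 - 3, with node 0 as the only seed.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adjacency, seeds={0})
```

In this toy example the score decreases with distance from the seed, which is the property that makes such scores usable as a proximity measure between a candidate node and known interaction partners.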

Although these approaches have achieved good performance, drug–target interaction prediction still faces several limitations and difficulties. Firstly, most methods use sequence information to measure the similarity of two proteins. However, studies demonstrate that structure is more conserved than sequence, so the structure information of target proteins may be better suited to drug–target interaction identification. Secondly, there are no experimentally verified negative samples. Traditional methods treat non-interaction data as negative samples, which is unreasonable because those data may contain undetected drug–target interactions. Thirdly, some methods are unable to make predictions for new drugs without any known targets, which limits their application in practice.
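
The second difficulty can be made concrete with a small sketch. Assuming a binary drug-by-target interaction matrix in which 1 marks a known interaction and 0 marks an untested or unknown pair (the matrix and the function name below are hypothetical), the zero entries are collected as unlabeled pairs rather than as negatives:

```python
def split_positive_unlabeled(interactions):
    """Split (drug, target) index pairs into known positives and unlabeled pairs.

    A 0 entry means "no interaction recorded", not "verified non-interaction",
    so those pairs are returned as unlabeled rather than negative.
    """
    positives, unlabeled = [], []
    for drug, row in enumerate(interactions):
        for target, value in enumerate(row):
            if value == 1:
                positives.append((drug, target))
            else:
                unlabeled.append((drug, target))
    return positives, unlabeled


# Toy interaction matrix (hypothetical): 2 drugs x 3 targets.
Y = [
    [1, 0, 0],
    [0, 1, 0],
]
P, U = split_positive_unlabeled(Y)
```

Here `P` holds the two known interactions and `U` holds the remaining four pairs, any of which might still be an undetected interaction.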

In this paper, we propose a framework to predict drug–target interactions based on positive-unlabeled learning. Compared with existing approaches, we integrate multiple target resources, including target structure information, target function category information and target function annotation information. In addition, we treat unknown drug–target interactions as an unlabeled set U instead of a negative set N. Three strategies (Random walk with restarts, KNN and heat kernel diffusion) are used to classify unlabeled samples into two groups, reliable negative samples (RN) and likely negative samples (LN), based on target similarity information, and a majority voting method aggregates these strategies to decide the final label of each unlabeled sample. Weighted support vector machines are then employed to build a multi-level classifier that predicts drug–target interactions from the positive set, the reliable negative set and the likely negative set. The experiments are conducted on four datasets (Enzyme, Ion Channel, GPCR and Nuclear Receptor). The experimental results demonstrate that our method outperforms state-of-the-art approaches.
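
The majority voting step described above can be sketched as follows. This is an illustration under stated assumptions, not the PUDT implementation: it assumes each strategy yields a score per unlabeled sample, that a low score (low similarity to known positives) leads to an RN vote and any other score to an LN vote, and that a single threshold of 0.2 is shared by all strategies. The scores, the threshold and the function names are all hypothetical.

```python
from collections import Counter


def scores_to_votes(scores, threshold):
    """Turn one strategy's scores into RN/LN votes.
    Assumption: a score below the threshold means 'reliable negative'."""
    return {sample: ("RN" if score < threshold else "LN")
            for sample, score in scores.items()}


def majority_vote(votes_per_strategy):
    """Assign each sample the label chosen by most strategies.
    With three strategies and two labels a tie cannot occur."""
    final = {}
    for sample in votes_per_strategy[0]:
        counts = Counter(votes[sample] for votes in votes_per_strategy)
        final[sample] = counts.most_common(1)[0][0]
    return final


# Hypothetical scores from the three strategies for unlabeled samples a, b, c.
rwr_scores = {"a": 0.05, "b": 0.40, "c": 0.10}
knn_scores = {"a": 0.10, "b": 0.60, "c": 0.55}
heat_scores = {"a": 0.02, "b": 0.35, "c": 0.08}

votes = [scores_to_votes(s, threshold=0.2)
         for s in (rwr_scores, knn_scores, heat_scores)]
final = majority_vote(votes)
```

Sample `c` shows why aggregation matters: one strategy votes LN while the other two vote RN, so the final label is RN. The resulting RN and LN groups could then be given different weights in a weighted classifier.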

Data preparation

In this paper, we use four human drug–target interaction networks, involving Enzyme, Ion Channel, GPCR and Nuclear Receptor targets, which were first analysed by Yamanishi et al. [27]. These datasets can be downloaded from http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/. Table 1 shows summary information for the four datasets. The drug–target interaction data are collected from KEGG BRITE [28], BRENDA [29], SuperTarget [8] and DrugBank [6].

Drug chemical structure information is retrieved from the DRUG

Experiments and results

In this section, we first analyse the degree distributions of drugs in the four drug–target interaction networks. Then, we compare our method with five state-of-the-art approaches (DBSI [14], NetLapRLS [16], KBMF2K [24], NetCBP [26], WNN-GIP [23]) for drug–target interaction prediction. Finally, we show the performance of our method in identifying potential drug–target interactions.

Conclusion and discussion

A systematic understanding of the associations between chemical compounds and target proteins is conducive to new drug design and discovery. Owing to the limitations of traditional experimental methods, it is common for biological scientists to predict drug–target interactions by computational methods. Many computational approaches have been developed for this purpose. However, these methods have some limitations: (1) some methods treat unlabeled

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China under Grant nos. 61232001, 61428209 and 61420106009; the Program for New Century Excellent Talents in University (NCET-12-0547).

Wei Lan received his B.Sc. and M.Sc. degrees from Henan Polytechnical University and Guangxi University, China, in 2009 and 2012, respectively. He is currently a Ph.D. candidate in Bioinformatics at Central South University. His current research interests include data mining, machine learning and bioinformatics, especially drug targets, disease genes and noncoding RNA.

References (46)

  • A. Gaulton et al.

    ChEMBL: a large-scale bioactivity database for drug discovery

    Nucl. Acids Res.

    (2012)
  • N. Hecker et al.

    SuperTarget goes quantitative: update on drug–target interactions

    Nucl. Acids Res.

    (2011)
  • M.J. Keiser et al.

    Relating protein pharmacology by ligand chemistry

    Nat. Biotechnol.

    (2007)
  • S. Pérot et al.

    Insights into an original pocket–ligand pair classification: a promising tool for ligand profile prediction

    PLoS One

    (2013)
  • A.C. Cheng et al.

    Structure-based maximal affinity model predicts small-molecule druggability

    Nat. Biotechnol.

    (2007)
  • S.A. Combs et al.

    Small-molecule ligand docking into comparative models with Rosetta

    Nat. Protoc.

    (2013)
  • S. Zhu et al.

    A probabilistic model for mining implicit ‘chemical compound–gene’ relations from literature

    Bioinformatics

    (2005)
  • F. Cheng et al.

    Prediction of drug–target interactions and drug repositioning via network-based inference

    PLoS Comput. Biol.

    (2012)
  • X. Chen et al.

    Drug–target interaction prediction by random walk on the heterogeneous network

    Mol. BioSyst.

    (2012)
  • Z. Xia et al.

    Semi-supervised drug–protein interaction prediction from heterogeneous biological spaces

    BMC Syst. Biol.

    (2010)
  • K. Bleakley et al.

    Supervised prediction of drug–target interactions using bipartite local models

    Bioinformatics

    (2009)
  • X. Chen et al.

    Drug–target interaction prediction: databases, web servers and computational models

    Brief. Bioinform.

    (2015)
  • S. Alaimo et al.

    Drug–target interaction prediction through domain-tuned network-based inference

    Bioinformatics

    (2013)
    Jianxin Wang received the B.Eng. and M.Eng. degrees in Computer Engineering from Central South University, China, in 1992 and 1996, respectively, and the Ph.D. degree in Computer Science from Central South University, China, in 2001. He is the Vice Dean and a Professor in the School of Information Science and Engineering, Central South University, Changsha, Hunan, PR China. His current research interests include algorithm analysis and optimization, parameterized algorithms, bioinformatics and computer networks. He has published more than 150 papers in various international journals and refereed conferences.

    Min Li received the B.S. degree in Communication Engineering from Central South University, China, in 2001, the M.S. degree in Traffic Information and Control Engineering from Central South University, China, in 2004, and the Ph.D. degree in Computer Science from Central South University, China, in 2008. She is a Professor in the School of Information Science and Engineering, Central South University, Changsha, Hunan, PR China. Her current research interests include protein–protein interaction networks, essential protein discovery, integrative analysis of molecular networks with other biological data and identification of dynamic network modules.

    Jin Liu received his B.S. degree in Automation from East China Institute of Technology in 2010 and his M.S. degree in Computer Technology from University of Chinese Academy of Sciences in 2013. He is currently a Ph.D. Candidate in School of Information Science and Engineering, Central South University, Changsha, Hunan, PR China. His current research interests include medical image analysis, machine learning and pattern recognition.

    Yaohang Li is an Associate Professor in the Department of Computer Science at Old Dominion University, Norfolk, VA, USA. His research interests are in Computational Biology and Scientific Computing. He received the M.S. and Ph.D. degrees in Computer Science from the Florida State University, Tallahassee, FL, USA, in 2000 and 2003, respectively. After graduation, he worked at Oak Ridge National Laboratory as a research associate for a short period of time. Before joining ODU, he was an Associate Professor in the Computer Science Department at North Carolina A&T State University, Greensboro, NC, USA.

    Fang-Xiang Wu received the B.Sc. and M.Sc. degrees in Applied Mathematics, both from Dalian University of Technology, China, in 1990 and 1993, respectively, his first Ph.D. degree in Control Theory and its Applications from Northwestern Polytechnical University in 1998, and his second Ph.D. degree in Biomedical Engineering from the University of Saskatchewan, Canada, in 2004. He is currently an Associate Professor of Bioengineering with the Department of Mechanical Engineering and graduate chair of the Division of Biomedical Engineering at the University of Saskatchewan, Canada. His current research interests include systems biology, genomic and proteomic data analysis, biological system identification and parameter estimation, and applications of control theory to biological systems.

    Yi Pan is a Regents' Professor of Computer Science and an Interim Associate Dean and Chair of Biology at Georgia State University, USA. Dr. Pan joined Georgia State University in 2000 and was promoted to full professor in 2004, named a Distinguished University Professor in 2013 and designated a Regents' Professor (the highest recognition given to a faculty member by the University System of Georgia) in 2015. He served as Chair of the Computer Science Department from 2005 to 2013. He is also a visiting Changjiang Chair Professor at Central South University, China. Dr. Pan received his B.Eng. and M.Eng. degrees in computer engineering from Tsinghua University, China, in 1982 and 1984, respectively, and his Ph.D. degree in computer science from the University of Pittsburgh, USA, in 1991. His profile has been featured as a distinguished alumnus in both the Tsinghua Alumni Newsletter and the University of Pittsburgh CS Alumni Newsletter. Dr. Pan's research interests include parallel and cloud computing, wireless networks, and bioinformatics. Dr. Pan has published more than 330 papers, including over 180 SCI journal papers and 60 IEEE/ACM Transactions papers. In addition, he has edited or authored 40 books. His work has been cited more than 6500 times. Dr. Pan has served as an editor-in-chief or editorial board member for 15 journals, including 7 IEEE Transactions. He is the recipient of many awards, including an IEEE Transactions Best Paper Award, 4 other international conference or journal Best Paper Awards, 4 IBM Faculty Awards, 2 JSPS Senior Invitation Fellowships, the IEEE BIBE Outstanding Achievement Award, an NSF Research Opportunity Award, and an AFOSR Summer Faculty Research Fellowship. He has organized many international conferences and delivered keynote speeches at over 50 international conferences around the world.
