DOI: 10.1145/3555776.3578606
research-article

Semi-Supervised Hybrid Predictive Bi-Clustering Trees for Drug-Target Interaction Prediction

Published: 07 June 2023

Abstract

Information about interactions between objects can be used to solve many important problems. One of them is drug-target interaction prediction, a task to which various machine learning methods can be applied. Among these methods, Predictive Bi-Clustering Trees (PBCTs) stand out as a global multi-label algorithm that can predict all interactions simultaneously. PBCTs induce a decision tree over the interaction matrix, and each leaf node corresponds to a partition of the initial matrix. To be used, PBCTs need an interaction matrix built from a true bipartite graph containing the interactions between the objects. However, they are at a significant disadvantage on imbalanced datasets or datasets with a high rate of unknown (unlabeled) data. In this work, we propose a semi-supervised approach to improve PBCTs, in which a semi-supervised impurity function replaces the impurity reduction function used in tree splits. We applied our approach to drug-target interaction prediction and obtained competitive results compared to the original state-of-the-art PBCT.
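
The abstract does not give the impurity function itself, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a weighted impurity in the style of semi-supervised predictive clustering trees: a weight w balances the variance of the interaction labels in a sub-matrix against the variance of the row objects' descriptive features. The names ss_impurity and row_split_gain, the weight w, and the toy data are all hypothetical, and only a row-wise split is shown (a column-wise split would be symmetric).

```python
def variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def target_impurity(Y, rows, cols):
    """Variance of the interaction labels in the sub-matrix rows x cols."""
    return variance([Y[i][j] for i in rows for j in cols])


def feature_impurity(X, rows):
    """Mean per-feature variance of the descriptive features of the row objects."""
    n_features = len(X[0])
    per_feature = [variance([X[i][f] for i in rows]) for f in range(n_features)]
    return sum(per_feature) / n_features


def ss_impurity(Y, X, rows, cols, w):
    """Assumed semi-supervised impurity: w weights labels, (1 - w) weights features."""
    return w * target_impurity(Y, rows, cols) + (1 - w) * feature_impurity(X, rows)


def row_split_gain(Y, X, rows, cols, feature, threshold, w=0.5):
    """Impurity reduction when rows are split on X[i][feature] <= threshold."""
    left = [i for i in rows if X[i][feature] <= threshold]
    right = [i for i in rows if X[i][feature] > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty separates nothing
    n = len(rows)
    parent = ss_impurity(Y, X, rows, cols, w)
    children = (len(left) / n) * ss_impurity(Y, X, left, cols, w) \
        + (len(right) / n) * ss_impurity(Y, X, right, cols, w)
    return parent - children


# Toy data: 4 drugs (rows) x 3 targets (columns), one drug feature.
Y = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
X = [[0.1], [0.2], [0.8], [0.9]]
rows, cols = [0, 1, 2, 3], [0, 1, 2]

supervised_gain = row_split_gain(Y, X, rows, cols, feature=0, threshold=0.5, w=1.0)
feature_only_gain = row_split_gain(Y, X, rows, cols, feature=0, threshold=0.5, w=0.0)
print(supervised_gain, feature_only_gain)
```

With w = 1.0 the sketch reduces to a purely supervised variance-reduction criterion over the interaction matrix, and with w = 0.0 it clusters on the descriptive features alone; intermediate values let unlabeled structure influence the split. Published semi-supervised tree methods typically also normalize each term, which this sketch omits for brevity.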

      Published In

      SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing
      March 2023
      1932 pages
      ISBN:9781450395175
      DOI:10.1145/3555776

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. machine learning
      2. interaction prediction
      3. biclustering
      4. hybrid semi-supervision
      5. imbalanced learning

      Qualifiers

      • Research-article

      Funding Sources

      • Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES)
      • Fundação de Amparo à Pesquisa do Estado de São Paulo - Brasil (FAPESP)

      Conference

      SAC '23

      Acceptance Rates

      Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
