research-article

Semi-Supervised Hybrid Predictive Bi-Clustering Trees for Drug-Target Interaction Prediction

Authors:

Ricardo Cerri

SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing

Pages 1163 - 1170

https://doi.org/10.1145/3555776.3578606

Published: 07 June 2023

Abstract

Information about interactions between objects can be used to solve many important problems. One such problem is drug-target interaction prediction, to which a variety of machine learning methods can be applied. Among them, Predictive Bi-Clustering Trees (PBCTs) stand out as a global multi-label algorithm able to predict all interactions simultaneously. A PBCT induces a decision tree over the interaction matrix, where each leaf node corresponds to a partition of the initial matrix. As input, it requires an interaction matrix built from a true bipartite graph of the interactions between the objects. However, PBCTs are at a significant disadvantage on imbalanced datasets or datasets with a high rate of unknown (unlabeled) data. In this work, we propose a semi-supervised approach to improve PBCTs, in which a semi-supervised impurity function replaces the impurity reduction function used to evaluate tree splits. We applied our approach to drug-target interaction prediction and obtained results competitive with the original state-of-the-art PBCT.
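The abstract does not state the impurity function itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a formulation common in semi-supervised tree learning: the impurity of a node is a weighted mix of the variance of the interaction values and the variance of the descriptive features, controlled by a weight `w` (with `w = 1` recovering the purely supervised criterion). The function names, the normalization by root-node variance, and the toy data are all hypothetical.

```python
import numpy as np

def mean_variance(block):
    """Mean per-column variance of a 2-D array (0.0 if it has no rows)."""
    block = np.asarray(block, dtype=float)
    if block.shape[0] == 0:
        return 0.0
    return float(block.var(axis=0).mean())

def ssl_impurity(Y, X, w, y_norm=1.0, x_norm=1.0):
    """Assumed semi-supervised impurity: w * target term + (1 - w) * feature term."""
    return w * mean_variance(Y) / y_norm + (1.0 - w) * mean_variance(X) / x_norm

def best_row_split(Y, X, w=0.5):
    """Find the feature/threshold on X that best splits the rows of Y.

    Returns (impurity reduction, feature index, threshold).
    """
    n = Y.shape[0]
    # Normalize both terms by their value at the node being split.
    y_norm = mean_variance(Y) or 1.0
    x_norm = mean_variance(X) or 1.0
    parent = ssl_impurity(Y, X, w, y_norm, x_norm)
    best = (0.0, None, None)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            n_left = int(left.sum())
            child = (
                n_left * ssl_impurity(Y[left], X[left], w, y_norm, x_norm)
                + (n - n_left) * ssl_impurity(Y[~left], X[~left], w, y_norm, x_norm)
            ) / n
            if parent - child > best[0]:
                best = (parent - child, j, float(t))
    return best

def best_bicluster_split(Y, X_rows, X_cols, w=0.5):
    """Bi-clustering step: try a row split and a column split, keep the better one."""
    row = best_row_split(Y, X_rows, w)
    col = best_row_split(Y.T, X_cols, w)  # column split = row split on the transpose
    return ("row",) + row if row[0] >= col[0] else ("col",) + col

# Toy interaction matrix (rows: drugs, columns: targets) with two clear blocks.
Y = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]], dtype=float)
X_rows = np.array([[0.1], [0.2], [0.8], [0.9]])  # hypothetical drug features
X_cols = np.array([[0.0], [0.1], [0.9], [1.0]])  # hypothetical target features
axis, reduction, feature, threshold = best_bicluster_split(Y, X_rows, X_cols, w=0.5)
```

Applying `best_bicluster_split` recursively to the resulting sub-matrices would grow a tree whose leaves are partitions of the interaction matrix. The feature term lets objects with few or no known interactions still influence where the split falls, which is the motivation for the semi-supervised criterion.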

Index Terms

Semi-Supervised Hybrid Predictive Bi-Clustering Trees for Drug-Target Interaction Prediction
1. Applied computing
  1. Life and medical sciences
    1. Computational biology
2. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms

Published In

SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing

March 2023

1932 pages

ISBN:9781450395175

DOI:10.1145/3555776

Conference Chairs: Jiman Hong (Soongsil University, South Korea), Maart Lanperne (Tallinn University, Estonia)
Program Chairs: Juw Won Park (University of Louisville, USA), Tomas Cerny (Baylor University, USA)
Publication Chair: Hossain Shahriar (Kennesaw State University, USA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES)
Fundação de Amparo à Pesquisa do Estado de São Paulo - Brasil (FAPESP)

Conference

SAC '23

Sponsor:

SIGAPP

SAC '23: 38th ACM/SIGAPP Symposium on Applied Computing

March 27 - 31, 2023

Tallinn, Estonia

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Bibliometrics

Total Citations: 0
Total Downloads: 44
Downloads (Last 12 months): 17
Downloads (Last 6 weeks): 3

Reflects downloads up to 16 Feb 2025
