Abstract
Understanding the substrate specificity of human immunodeficiency virus 1 (HIV-1) protease plays a significant role in the design of effective HIV-1 protease inhibitors. During the past two decades, a variety of machine learning models have been developed to predict HIV-1 protease cleavage sites. However, acquiring cleavable octapeptides requires expensive and time-consuming experiments, whereas uncleavable octapeptides are usually generated by artificial strategies. As a result, existing datasets contain far fewer cleavable octapeptides than uncleavable ones, and this class imbalance can make the predictions of classification models inaccurate. In this work, we combine asymmetric bagging with the support vector machine (SVM) classifier to propose an ensemble learning algorithm, AB-HIV, that effectively addresses the dataset imbalance problem in predicting HIV-1 protease cleavage sites. To make full use of the information in the substrate sequence, AB-HIV constructs its feature vectors from three different coding schemes: amino acid identities, chemical properties and variable-length coevolutionary patterns. Asymmetric bagging resamples a set of balanced training subsets from the training set, an SVM classifier is built on each subset, and the resulting classifiers are integrated to complete the prediction task. Experiments on three independent benchmark datasets indicate that the proposed ensemble learning method outperforms existing prediction methods in terms of AUC, PR AUC and F-measure. Therefore, AB-HIV can be regarded as an effective method for dealing with dataset imbalance when predicting HIV-1 protease cleavage sites.
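The resampling-and-integration scheme described above can be illustrated with a short sketch. It is not the AB-HIV implementation and rests on several assumptions: it uses only the amino acid identity (one-hot) encoding, it stands in for the paper's SVM with a minimal linear SVM trained by Pegasos stochastic subgradient descent on the hinge loss (so that nothing beyond the Python standard library is needed), it integrates the ensemble by averaging decision values, and it runs on synthetic octapeptides produced by a made-up rule. All function names (one_hot, train_linear_svm, balanced_subsets, fit_ensemble, ensemble_score, synthetic_peptides) and parameter values are hypothetical.

```python
import random

# Hypothetical sketch of asymmetric bagging + linear SVMs; not the AB-HIV code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(peptide):
    """Amino acid identity encoding: 20 binary features per position."""
    vec = [0.0] * (len(peptide) * len(AMINO_ACIDS))
    for pos, residue in enumerate(peptide):
        vec[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(residue)] = 1.0
    return vec


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos SGD on the L2-regularised hinge loss (labels +1/-1).
    A constant 1.0 feature is appended so the bias is learned as a weight."""
    rng = random.Random(seed)
    data = [(x + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1.0:  # hinge-loss subgradient step
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def balanced_subsets(pos, neg, n_subsets, rng):
    """Asymmetric bagging: keep every minority (cleavable) sample and
    bootstrap the same number of majority samples for each subset."""
    for _ in range(n_subsets):
        yield pos, [rng.choice(neg) for _ in range(len(pos))]


def fit_ensemble(pos, neg, n_subsets=5, seed=0):
    """Train one linear SVM per balanced subset."""
    rng = random.Random(seed)
    models = []
    for k, (p, n) in enumerate(balanced_subsets(pos, neg, n_subsets, rng)):
        X = [one_hot(s) for s in p + n]
        y = [1] * len(p) + [-1] * len(n)
        models.append(train_linear_svm(X, y, seed=seed + k))
    return models


def ensemble_score(models, peptide):
    """Average decision value over the SVMs; > 0 means 'cleavable'."""
    x = one_hot(peptide) + [1.0]
    return sum(sum(wj * xj for wj, xj in zip(w, x)) for w in models) / len(models)


def synthetic_peptides(n_pos, n_neg, seed):
    """Made-up toy data, NOT real HIV-1 substrates: positives carry F at
    index 3 and P at index 4; negatives have neither."""
    rng = random.Random(seed)

    def random_peptide():
        return "".join(rng.choice(AMINO_ACIDS) for _ in range(8))

    pos = [s[:3] + "FP" + s[5:] for s in (random_peptide() for _ in range(n_pos))]
    neg = []
    while len(neg) < n_neg:
        s = random_peptide()
        if s[3] != "F" and s[4] != "P":
            neg.append(s)
    return pos, neg


if __name__ == "__main__":
    train_pos, train_neg = synthetic_peptides(40, 400, seed=1)  # 1:10 imbalance
    test_pos, test_neg = synthetic_peptides(50, 50, seed=2)
    models = fit_ensemble(train_pos, train_neg)
    recall = sum(ensemble_score(models, s) > 0 for s in test_pos) / len(test_pos)
    specificity = sum(ensemble_score(models, s) <= 0 for s in test_neg) / len(test_neg)
    print(f"recall={recall:.2f} specificity={specificity:.2f}")
```

In this toy run, each SVM sees all 40 minority samples plus 40 majority samples drawn with replacement, so every base classifier trains on a balanced problem while the ensemble as a whole still draws on many different majority samples. Adding chemical-property or coevolutionary features would mainly mean extending the feature function; the bagging logic itself would not need to change.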
Acknowledgements
This work has been supported by the National Natural Science Foundation of China [grant number 61602352] and the Pioneer Hundred Talents Program of Chinese Academy of Sciences.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, Z., Hu, P., Hu, L. (2021). An Ensemble Learning Algorithm for Predicting HIV-1 Protease Cleavage Sites. In: Huang, DS., Jo, KH., Li, J., Gribova, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12838. Springer, Cham. https://doi.org/10.1007/978-3-030-84532-2_46
DOI: https://doi.org/10.1007/978-3-030-84532-2_46
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-84531-5
Online ISBN: 978-3-030-84532-2
eBook Packages: Computer Science, Computer Science (R0)