An Ensemble Learning Algorithm for Predicting HIV-1 Protease Cleavage Sites

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12838)

Abstract

Understanding the substrate specificity of human immunodeficiency virus 1 (HIV-1) protease plays a significant role in the design of effective HIV-1 protease inhibitors. During the past two decades, a variety of machine learning models have been developed to predict HIV-1 protease cleavage sites. However, because obtaining cleavable octapeptides requires expensive and time-consuming experiments, while uncleavable octapeptides are usually generated by artificial strategies, the cleavable octapeptides in existing datasets are far outnumbered by the uncleavable ones. Such imbalanced datasets may impair the prediction performance of a classification model. In this work, we combine asymmetric bagging with the support vector machine (SVM) classifier to propose an ensemble learning algorithm, namely AB-HIV, that effectively handles the dataset imbalance problem in predicting HIV-1 protease cleavage sites. To make full use of the information in the substrate sequence, AB-HIV uses three different coding schemes (amino acid identities, chemical properties and variable-length coevolutionary patterns) to construct the feature vector. Asymmetric bagging is used to resample a set of balanced training subsets from the training set, and an SVM classifier is trained on each subset; the resulting classifiers are then integrated to complete the prediction task. Experiments on three independent benchmark datasets indicate that the proposed ensemble learning method outperforms existing prediction methods in terms of AUC, PR AUC and F-measure. Therefore, AB-HIV can be regarded as an effective method for dealing with the dataset imbalance problem in predicting HIV-1 protease cleavage sites.
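
The resampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' AB-HIV implementation. It assumes scikit-learn's `SVC` with a linear kernel, covers only the amino acid identity (one-hot) coding scheme, and combines the classifiers by averaging their decision values, which is one possible integration rule among several. The function names `encode_octapeptide`, `fit_asymmetric_bagging` and `predict_scores`, the parameter `n_estimators`, and the toy peptides are all hypothetical and chosen for illustration only.

```python
import numpy as np
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_octapeptide(peptide):
    """Amino acid identity coding: 8 positions x 20 residues = 160 binary features."""
    if len(peptide) != 8:
        raise ValueError("expected an octapeptide (8 residues)")
    vec = np.zeros(8 * len(AMINO_ACIDS))
    for pos, aa in enumerate(peptide):
        vec[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return vec


def fit_asymmetric_bagging(X, y, n_estimators=10, seed=0):
    """Train one SVM per balanced subset (assumes label 1 is the minority class)."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)  # cleavable octapeptides (minority)
    neg_idx = np.flatnonzero(y == 0)  # uncleavable octapeptides (majority)
    models = []
    for _ in range(n_estimators):
        # Keep every positive; bootstrap an equally sized sample of negatives.
        sampled_neg = rng.choice(neg_idx, size=len(pos_idx), replace=True)
        idx = np.concatenate([pos_idx, sampled_neg])
        clf = SVC(kernel="linear")  # kernel choice is an assumption
        clf.fit(X[idx], y[idx])
        models.append(clf)
    return models


def predict_scores(models, X):
    """Average the SVM decision values; a score above 0 suggests 'cleavable'."""
    return np.mean([m.decision_function(X) for m in models], axis=0)


# Toy, made-up peptides (not real substrates): 3 positives, 12 negatives.
peptides = ["AAAFPAAA"] * 3 + ["AAAGGAAA"] * 12
labels = np.array([1] * 3 + [0] * 12)
X = np.stack([encode_octapeptide(p) for p in peptides])
models = fit_asymmetric_bagging(X, labels, n_estimators=5)
scores = predict_scores(models, X)
```

Because every positive sample is reused in each subset while the negatives are resampled, each SVM sees a balanced training set, and the ensemble as a whole still draws on many of the negative examples.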

Acknowledgements

This work has been supported by the National Natural Science Foundation of China [grant number 61602352] and the Pioneer Hundred Talents Program of Chinese Academy of Sciences.

Author information

Corresponding author

Correspondence to Lun Hu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, Z., Hu, P., Hu, L. (2021). An Ensemble Learning Algorithm for Predicting HIV-1 Protease Cleavage Sites. In: Huang, DS., Jo, KH., Li, J., Gribova, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12838. Springer, Cham. https://doi.org/10.1007/978-3-030-84532-2_46

  • DOI: https://doi.org/10.1007/978-3-030-84532-2_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-84531-5

  • Online ISBN: 978-3-030-84532-2

  • eBook Packages: Computer Science, Computer Science (R0)
