
Identification of Active and Binding Sites with Multi-dimensional Feature Vectors and K-Nearest Neighbor Classification Algorithm

  • Conference paper
  • First Online:
Advanced Intelligent Computing Technology and Applications (ICIC 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14088)



Abstract

Helicobacter pylori is a pathogenic and carcinogenic bacterium that lives mainly in the stomach and duodenum, and it has been classified as a Group 1 carcinogen by the World Health Organization. The control of gastric cancer has attracted increasing attention. Studying how substrates bind at different protein sites helps clarify the relationship between protein structure and function, and paves the way for future research on the pathogenesis of Helicobacter pylori and the development of protein-targeted drugs. This paper proposes a new method for predicting protein sites: it classifies the active sites and binding sites of proteins with a K-nearest neighbor (KNN) classifier that learns from multi-dimensional features of the sites. First, the protein records of Helicobacter pylori are retrieved, and the annotated Active_site and Binding_site positions are obtained from an existing database. Next, the protein fragment sequences adjacent to each site are extracted and encoded by a custom correlation function into feature vectors of equal length. These n-dimensional vectors are then used for supervised learning: a KNN classifier accelerated with a kd-tree performs the classification, and the Neighborhood Components Analysis (NCA) algorithm is introduced to learn the distance metric automatically and reduce dimensionality. The accuracy on the test set reaches 84.2%, 6.5 points higher than that of the traditional gradient boosting decision tree (GBDT) algorithm. These results indicate that the proposed method outperforms the GBDT baseline and can identify protein binding sites more effectively.
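
The abstract does not specify the fragment length or the "custom correlation function", so the following is only a hedged sketch of the preprocessing idea. It extracts a fixed-size window around a 1-based site position (padding with a placeholder `X` past either terminus) and encodes it as amino acid composition plus PseAAC-style sequence-order correlation factors computed from the standardised Kyte-Doolittle hydropathy scale. This encoding is a swapped-in stand-in, not the authors' function; `extract_window`, `correlation_features`, `half_width`, `max_lag` and the example sequence are hypothetical. What it does illustrate is the property the method relies on: every window, whatever its contents, maps to a vector of the same length (20 + `max_lag`).

```python
import statistics
from typing import List

# Hypothetical sketch: a PseAAC-style stand-in for the paper's unspecified
# correlation function, NOT the authors' actual encoding.

# Kyte-Doolittle hydropathy scale (a standard published property scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KYTE_DOOLITTLE)  # fixed order for the composition part
PAD = "X"  # assumed placeholder for positions beyond either terminus

# Standardise the scale to zero mean and unit variance over the 20 residues.
_VALUES = list(KYTE_DOOLITTLE.values())
_MEAN = statistics.mean(_VALUES)
_SD = statistics.pstdev(_VALUES)
HYDRO = {aa: (v - _MEAN) / _SD for aa, v in KYTE_DOOLITTLE.items()}


def extract_window(sequence: str, site: int, half_width: int = 7) -> str:
    """Return the 2 * half_width + 1 residues centred on a 1-based site,
    padding with PAD where the window runs past either terminus."""
    if not 1 <= site <= len(sequence):
        raise ValueError("site position out of range")
    centre = site - 1  # convert the 1-based position to a 0-based index
    return "".join(
        sequence[i] if 0 <= i < len(sequence) else PAD
        for i in range(centre - half_width, centre + half_width + 1)
    )


def correlation_features(window: str, max_lag: int = 3) -> List[float]:
    """Encode a window as 20 composition values plus max_lag hydropathy
    correlation factors, so every window maps to 20 + max_lag numbers."""
    residues = [aa for aa in window if aa in HYDRO]
    n = len(residues)
    composition = [residues.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]
    correlations = []
    for lag in range(1, max_lag + 1):
        # Mean squared hydropathy difference between residues `lag` apart,
        # skipping any pair that involves padding.
        diffs = [
            (HYDRO[window[i]] - HYDRO[window[i + lag]]) ** 2
            for i in range(len(window) - lag)
            if window[i] in HYDRO and window[i + lag] in HYDRO
        ]
        correlations.append(sum(diffs) / len(diffs) if diffs else 0.0)
    return composition + correlations


# Usage with a made-up sequence and a site near the N-terminus.
window = extract_window("MKVLAAGIVALLLAAGCSS", site=3, half_width=4)
print(window)                             # XXMKVLAAG
print(len(correlation_features(window)))  # 23
```

Because padding is excluded from both the composition and the correlation terms, sites close to a terminus still yield well-defined vectors of the same length.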

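The classification stage can be sketched with scikit-learn, assuming a Python implementation (the abstract does not name one). `NeighborhoodComponentsAnalysis` learns a linear transformation that acts both as the learned distance metric and as the dimensionality reduction, and `KNeighborsClassifier(algorithm="kd_tree")` performs the neighbour search in the projected space; a `GradientBoostingClassifier` serves as the GBDT baseline. The data are synthetic stand-ins from `make_classification` (23 features, matching the hypothetical encoding width of 20 + 3), and `n_components=5`, `n_neighbors=5` and the `StandardScaler` step are illustrative assumptions rather than the paper's settings, so the printed accuracies say nothing about the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the fixed-length site feature vectors
# (assumed labels: 0 = Active_site, 1 = Binding_site); NOT the paper's data.
X, y = make_classification(
    n_samples=400, n_features=23, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale -> NCA (learned metric + reduction to 5 dims) -> kd-tree KNN.
knn_nca = Pipeline([
    ("scale", StandardScaler()),
    ("nca", NeighborhoodComponentsAnalysis(n_components=5, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree")),
])
knn_nca.fit(X_train, y_train)
knn_acc = knn_nca.score(X_test, y_test)

# GBDT baseline trained on the same split for comparison.
gbdt = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
gbdt_acc = gbdt.score(X_test, y_test)

print(f"NCA + kd-tree KNN accuracy: {knn_acc:.3f}")
print(f"GBDT baseline accuracy:     {gbdt_acc:.3f}")
```

Placing NCA before the kd-tree is a natural pairing: kd-trees lose their speed advantage as dimensionality grows, so searching a low-dimensional NCA projection keeps neighbour queries efficient.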


Acknowledgement

This work was supported by the National Natural Science Foundation of China (Grant No. 61902337), Xuzhou Science and Technology Plan Project (KC21047), Jiangsu Provincial Natural Science Foundation (No. SBK2019040953), Natural Science Fund for Colleges and Universities in Jiangsu Province (No. 19KJB520016) and Young Talents of Science and Technology in Jiangsu and ghfund202302026465.

Author information


Corresponding authors

Correspondence to Zhuo Wang or Wenzheng Bao.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Zhang, B., Wang, Z., Bao, W., Cheng, H. (2023). Identification of Active and Binding Sites with Multi-dimensional Feature Vectors and K-Nearest Neighbor Classification Algorithm. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science, vol 14088. Springer, Singapore. https://doi.org/10.1007/978-981-99-4749-2_51


  • DOI: https://doi.org/10.1007/978-981-99-4749-2_51

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-4748-5

  • Online ISBN: 978-981-99-4749-2

  • eBook Packages: Computer Science, Computer Science (R0)
