DOI: 10.1145/3674658.3674675
research-article

Predlectins-MLP: an improved predictor of cancer-lectins using mixed features

Published: 18 November 2024

Abstract

Cancer-lectins play a key role in the differentiation and metastasis of cancer cells. The correct identification of cancer-lectins provides important information for the application of these proteins in cancer treatment. In this study, we propose an improved predictor of cancer-lectins based on mixed features. First, protein sequences were encoded with mixed features of PseAAC, CKSAAP and CTriad: PseAAC was used to characterize the physicochemical properties and relevance of amino acids, CKSAAP to indicate the order and position of amino acids, and CTriad to represent the physicochemical properties of three adjacent amino acids. Second, the PCA algorithm was used to reduce the dimensionality of the extracted features. Finally, a deep learning model was trained on the training data to establish a prediction model. In 5-fold cross-validation, our model (called Predlectins-MLP) achieved an accuracy of 96.04% and performed better than existing methods.
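As an illustration of one of the encodings named in the abstract, the following is a minimal sketch of CKSAAP (composition of k-spaced amino acid pairs) as it is commonly defined: for each gap k, count the ordered residue pairs separated by k positions and normalize the counts into frequencies. The function name `cksaap`, the default `k_max=2`, and the choice to skip pairs containing non-standard residues are assumptions made for this example; the abstract does not state the settings used in Predlectins-MLP.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order shared by every gap block.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence, k_max=2):
    """Return CKSAAP frequencies for gaps k = 0..k_max (400 values per gap).

    k_max is a hypothetical default, not a value taken from the paper.
    """
    seq = sequence.upper()
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        # Pair residue i with the residue k positions further along (i + k + 1).
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # assumption: ignore pairs with non-standard residues
                counts[pair] += 1
                total += 1
        # Normalize by the number of counted pairs; all zeros if the sequence is too short.
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


vec = cksaap("ACDAC", k_max=1)  # 2 gap values x 400 pairs = 800 features
```

For a sequence made only of standard residues, the normalizer equals L - k - 1 for a sequence of length L, which matches the usual CKSAAP definition. In a pipeline like the one described above, a vector of this kind would be concatenated with the PseAAC and CTriad features before dimensionality reduction.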


Published In

ICBBT '24: Proceedings of the 2024 16th International Conference on Bioinformatics and Biomedical Technology
May 2024
279 pages
ISBN: 9798400717666
DOI: 10.1145/3674658
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. CKSAAP
  2. Cancer-lectins
  3. Machine learning
  4. Protein classification

Qualifiers

  • Research-article

Conference

ICBBT 2024
