
Feature Selection Based on Ranking Metagenomic Relative Abundance for Inflammatory Bowel Disease Prediction

  • Conference paper
  • First Online:
Complex, Intelligent and Software Intensive Systems (CISIS 2024)

Abstract

Inflammatory bowel diseases (IBD) can be severe, but metagenomic data make it possible to diagnose them and take the necessary steps to prevent further complications. The key to identifying the microbial composition in the human body that causes the disease is careful selection of features from the metagenomic data. Our research shows that using a Random Forest to rank relative-abundance features is a reliable approach for disease prediction tasks. We also found that selecting the top-ranked features, with the number of selected features ranging from 1 to 50, improves diagnostic accuracy. In addition, we intersected the Top 10, 20, 30, 40, and 50 features to determine which of them appear in all datasets. Our experiments on six IBD-related datasets yielded better results than previous studies.
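
The ranking, top-k selection, and intersection steps summarized above can be sketched with scikit-learn's `RandomForestClassifier`, the implementation cited in the note. This is a minimal sketch under stated assumptions, not the authors' code: it assumes impurity-based `feature_importances_` as the ranking score, 5-fold cross-validated accuracy as the evaluation metric, and that all datasets share taxon names so their Top-k lists can be intersected. The function names `rank_features`, `sweep_top_k`, and `common_top_features`, the hyperparameters, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def rank_features(X, y, feature_names, n_estimators=500, random_state=0):
    """Return feature names sorted by Random Forest importance, highest first.

    Assumption: impurity-based feature_importances_ is used as the ranking score.
    """
    rf = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
    rf.fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]
    return [feature_names[i] for i in order]


def sweep_top_k(X, y, feature_names, ranking, k_max=50, cv=5,
                n_estimators=100, random_state=0):
    """Mean cross-validated accuracy of a Random Forest trained on the top-k features.

    Evaluates k = 1..k_max (capped at the number of ranked features).
    Assumption: accuracy with `cv`-fold cross-validation is the evaluation metric.
    """
    col = {name: i for i, name in enumerate(feature_names)}
    scores = {}
    for k in range(1, min(k_max, len(ranking)) + 1):
        idx = [col[name] for name in ranking[:k]]  # columns of the k best-ranked taxa
        clf = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
        scores[k] = cross_val_score(clf, X[:, idx], y, cv=cv, scoring="accuracy").mean()
    return scores


def common_top_features(rankings, ks=(10, 20, 30, 40, 50)):
    """For each k, the set of features present in the top-k of every dataset's ranking.

    Assumption: all rankings use the same feature (taxon) names.
    """
    return {k: set.intersection(*(set(r[:k]) for r in rankings)) for k in ks}


if __name__ == "__main__":
    # Hypothetical synthetic data: 120 samples x 60 taxa of relative abundance.
    rng = np.random.default_rng(0)
    names = [f"taxon_{i}" for i in range(60)]
    X = rng.dirichlet(np.ones(60), size=120)      # each row sums to 1
    y = (X[:, 0] + X[:, 1] > 2 / 60).astype(int)  # labels driven by two taxa
    ranking = rank_features(X, y, names)
    scores = sweep_top_k(X, y, names, ranking, k_max=10)  # small k_max to keep the demo fast
    best_k = max(scores, key=scores.get)
    print("top-5 taxa:", ranking[:5])
    print(f"best k = {best_k}, CV accuracy = {scores[best_k]:.3f}")
```

For simplicity the ranking is fitted on all samples before cross-validation, which can make the top-k accuracies optimistic; repeating `rank_features` inside each training fold avoids that selection bias. With one ranking per dataset, `common_top_features(rankings)` returns the taxa shared by the Top 10, 20, 30, 40, and 50 lists.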

Notes

  1. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

Author information

Correspondence to Hai Thanh Nguyen.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Nguyen, H.T.T., Le, H.N., Nguyen, H.T. (2024). Feature Selection Based on Ranking Metagenomic Relative Abundance for Inflammatory Bowel Disease Prediction. In: Barolli, L. (eds) Complex, Intelligent and Software Intensive Systems. CISIS 2024. Lecture Notes on Data Engineering and Communications Technologies, vol 87. Springer, Cham. https://doi.org/10.1007/978-3-031-70011-8_9