An Ensemble-Based Phenotype Classifier to Diagnose Crohn’s Disease from 16s rRNA Gene Sequences

  • Conference paper
Pattern Recognition and Image Analysis (IbPRIA 2023)

Abstract

In the past few years, the classification of diseases using machine learning has attracted special interest in bioinformatics. The task is especially challenging for dysbiosis-based diseases, i.e., diseases caused by an imbalance in the composition of the microbial community. In this work, a curated pipeline is followed for classifying phenotypes from 16S rRNA gene amplicons, focusing on Crohn’s disease. The pipeline aims to reduce the dimensionality of the data through a feature selection step, decreasing the computational cost while maintaining an acceptably high F1-score. Based on this study, an ensemble model is proposed that combines the best-performing techniques among several representative machine learning algorithms. The ensemble, which joins a multilayer perceptron, extreme gradient boosting, and support vector machines, reached F1-scores of up to 0.81 with a target number of features as low as 300. These results are similar to or better than those of other works studying the same data, which supports the validity of our method.
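
As a rough illustration of how an ensemble of three classifiers can be combined and scored, the following minimal sketch applies hard majority voting to per-model predictions and computes the F1-score of the positive class. The voting rule, the function names `majority_vote` and `f1_score`, the binary form of the F1-score, and the toy labels are all assumptions made for this example; the abstract does not specify how the three models are combined or which F1 variant is reported.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by hard voting.

    `predictions` is a list of equal-length label lists, one per model.
    With three voters and binary labels a tie cannot occur.
    """
    # zip(*predictions) groups the labels that every model gave to one sample
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def f1_score(y_true, y_pred, positive=1):
    """F1-score of the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (1 = Crohn's disease, 0 = control), invented for illustration only
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
mlp_pred = [1, 1, 0, 0, 1, 1, 0, 1]  # hypothetical multilayer perceptron output
xgb_pred = [1, 0, 1, 0, 0, 1, 1, 1]  # hypothetical gradient boosting output
svm_pred = [1, 1, 1, 1, 0, 0, 0, 1]  # hypothetical support vector machine output

ensemble_pred = majority_vote([mlp_pred, xgb_pred, svm_pred])

for name, pred in [("MLP", mlp_pred), ("XGB", xgb_pred),
                   ("SVM", svm_pred), ("ensemble", ensemble_pred)]:
    print(f"{name}: F1 = {f1_score(y_true, pred):.2f}")
```

In this toy case each model errs on different samples, so the vote corrects the individual mistakes; on real data the gain depends on how diverse the base classifiers are.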

This work has received financial support from Instituto de Salud Carlos III (Spain) (PI21/00588), the Xunta de Galicia - Consellería de Cultura, Educación e Universidade (Centro de investigación de Galicia accreditation 2019–2022 ED431G-2019/04, Reference Competitive Group accreditation 2021–2024, GRC2021/48, Group with Growth Potential accreditation 2020–2022 GPC2020/27 and L Vázquez-González support ED481A-2021) and the European Union (European Regional Development Fund-ERDF).

Notes

  1. https://www.ncbi.nlm.nih.gov/bioproject/PRJEB13679.

Author information

Corresponding author

Correspondence to Lara Vázquez-González.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Vázquez-González, L., Peña-Reyes, C., Balsa-Castro, C., Tomás, I., Carreira, M.J. (2023). An Ensemble-Based Phenotype Classifier to Diagnose Crohn’s Disease from 16s rRNA Gene Sequences. In: Pertusa, A., Gallego, A.J., Sánchez, J.A., Domingues, I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2023. Lecture Notes in Computer Science, vol 14062. Springer, Cham. https://doi.org/10.1007/978-3-031-36616-1_44

  • DOI: https://doi.org/10.1007/978-3-031-36616-1_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36615-4

  • Online ISBN: 978-3-031-36616-1

  • eBook Packages: Computer Science, Computer Science (R0)
