Feature Selection Using Local Interpretable Model-Agnostic Explanations on Metagenomic Data

  • Conference paper
Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications (FDSE 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1306)

Abstract

Metagenomics is an emerging component of personalized medicine approaches to protecting and improving human health. Numerous studies have shown that metagenomic data are associated with a wide range of human diseases. Recent advances in machine learning techniques and computational resources make it possible to speed up data processing and improve diagnostic accuracy. However, metagenomic data remain difficult to process because of their complexity and high dimensionality. This work proposes an approach that uses an explanation model to perform feature selection. The proposed approach uses Local Interpretable Model-agnostic Explanations (LIME) to select a small subset of the original features, which can achieve better performance than feature selection based on importance scores generated by a robust machine learning algorithm such as Random Forests. We expect that the approach can be an efficient feature selection method compared with classic feature selection techniques.
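The idea of selecting features from local explanations can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use the `lime` package: it is a simplified LIME-style surrogate written from scratch with NumPy. Everything named here is an assumption made for illustration: the stand-in classifier `black_box` (which by construction depends only on features 0 and 3), the Gaussian perturbation scheme, the kernel width heuristic, and the rule of ranking features by their mean absolute surrogate coefficient across samples.

```python
import numpy as np

def black_box(Z):
    """Hypothetical stand-in classifier: P(disease) depends only on features 0 and 3."""
    return 1.0 / (1.0 + np.exp(-(3.0 * Z[:, 0] - 2.0 * Z[:, 3])))

def local_surrogate_coefs(predict_fn, x, scale, rng, n_samples=500, ridge=1e-3):
    """Fit a proximity-weighted linear surrogate around x and return its slopes."""
    d = x.shape[0]
    noise = rng.normal(size=(n_samples, d))   # perturbations in standardized units
    Z = x + noise * scale                     # perturbed samples around x
    y = predict_fn(Z)                         # black-box outputs to be explained
    kernel_width = 0.75 * np.sqrt(d)          # assumed heuristic width
    w = np.exp(-np.sum(noise ** 2, axis=1) / kernel_width ** 2)  # closer = heavier
    A = np.hstack([np.ones((n_samples, 1)), noise])              # intercept + features
    AtW = A.T * w                             # weights applied to each sample
    # Weighted ridge regression in closed form (small penalty for stability).
    beta = np.linalg.solve(AtW @ A + ridge * np.eye(d + 1), AtW @ y)
    return beta[1:]                           # drop the intercept

def select_features(predict_fn, X, k, seed=0):
    """Rank features by mean |local coefficient| over all samples; keep the top k."""
    rng = np.random.default_rng(seed)
    scale = X.std(axis=0)
    coefs = np.array([local_surrogate_coefs(predict_fn, x, scale, rng) for x in X])
    scores = np.abs(coefs).mean(axis=0)
    return np.argsort(scores)[::-1][:k], scores

# Synthetic data standing in for an abundance table: 20 samples, 10 features.
X = np.random.default_rng(42).normal(size=(20, 10))
selected, scores = select_features(black_box, X, k=2)
print("selected feature indices:", sorted(selected.tolist()))
```

In a comparison of the kind the abstract describes, the selected indices would be used to retrain a classifier and the result set against a subset chosen by Random Forest importance scores; that comparison is not reproduced in this sketch.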

Author information

Corresponding author

Correspondence to Nguyen Thanh-Hai.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Thanh-Hai, N., Tran, T.B., Tran, A.C., Thai-Nghe, N. (2020). Feature Selection Using Local Interpretable Model-Agnostic Explanations on Metagenomic Data. In: Dang, T.K., Küng, J., Takizawa, M., Chung, T.M. (eds) Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. FDSE 2020. Communications in Computer and Information Science, vol 1306. Springer, Singapore. https://doi.org/10.1007/978-981-33-4370-2_24

  • DOI: https://doi.org/10.1007/978-981-33-4370-2_24

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-33-4369-6

  • Online ISBN: 978-981-33-4370-2

  • eBook Packages: Computer Science, Computer Science (R0)
