Abstract
Metagenomics is an emerging component of personalized medicine approaches that aim to maintain and improve human health. Numerous studies have shown that metagenomic data are associated with a wide range of human diseases. Recent advances in machine learning techniques and computational resources make it possible to speed up data processing and to improve diagnostic accuracy. However, metagenomic data remain difficult to process because of their complexity and high dimensionality. This work proposes an approach that uses an explanation model to perform feature selection. The approach selects a small subset of the original features using Local Interpretable Model-agnostic Explanations (LIME), and this subset can achieve better performance than feature selection based on importance scores generated by a robust machine learning algorithm such as Random Forests. We expect the approach to be an efficient feature selection method compared with classic feature selection techniques.
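The idea of selecting features from local explanations can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the `lime` package. It assumes a simplified LIME-style procedure: perturb each sample with Gaussian noise, query a black-box model, fit a distance-weighted linear surrogate, and rank features by their mean absolute surrogate coefficient. The names `local_explanation`, `select_features`, and `toy_model`, the noise scale, and the kernel width are all hypothetical choices made for illustration. The toy black box stands in for a trained classifier.

```python
import numpy as np


def local_explanation(predict_fn, x, rng, n_samples=500, scale=0.5, kernel_width=1.0):
    """Fit a weighted linear surrogate around x and return its feature coefficients."""
    d = x.shape[0]
    # Perturb the instance with Gaussian noise (an assumption of this sketch).
    Z = x + rng.normal(0.0, scale, size=(n_samples, d))
    y = predict_fn(Z)
    # Exponential kernel: perturbations closer to x receive larger weights.
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / (kernel_width ** 2 * d))
    # Weighted least squares: scale rows by sqrt(weight); last column is the intercept.
    A = np.hstack([Z - x, np.ones((n_samples, 1))])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[:d]


def select_features(predict_fn, X, k, seed=0):
    """Rank features by mean absolute local coefficient and keep the top k."""
    rng = np.random.default_rng(seed)
    scores = np.zeros(X.shape[1])
    for x in X:
        scores += np.abs(local_explanation(predict_fn, x, rng))
    scores /= len(X)
    top = np.argsort(scores)[::-1][:k]
    return top, scores


def toy_model(Z):
    """Hypothetical black box: a probability that depends only on features 2 and 7."""
    return 1.0 / (1.0 + np.exp(-(3.0 * Z[:, 2] - 2.0 * Z[:, 7])))


# Synthetic stand-in for an abundance table: 30 samples, 20 features.
X = np.random.default_rng(42).normal(size=(30, 20))
selected, scores = select_features(toy_model, X, k=2)
print(sorted(selected.tolist()))
```

In a realistic setting `predict_fn` would be the probability output of a classifier trained on the metagenomic features, and the selected subset would be compared against one ranked by Random Forest importance scores.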
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Thanh-Hai, N., Tran, T.B., Tran, A.C., Thai-Nghe, N. (2020). Feature Selection Using Local Interpretable Model-Agnostic Explanations on Metagenomic Data. In: Dang, T.K., Küng, J., Takizawa, M., Chung, T.M. (eds) Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. FDSE 2020. Communications in Computer and Information Science, vol 1306. Springer, Singapore. https://doi.org/10.1007/978-981-33-4370-2_24
DOI: https://doi.org/10.1007/978-981-33-4370-2_24
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4369-6
Online ISBN: 978-981-33-4370-2
eBook Packages: Computer Science (R0)