Abstract
The gastrointestinal tract holds the greatest concentration of bacteria in the human body; its microbial population is diverse and complex and has been implicated in many different diseases. Metagenomics has led to many advances in the study of evolution and biodiversity. Applying machine learning algorithms to metagenomic problems has helped researchers make new advances in personalized medicine, especially in diagnosing disease and improving human health. In this study, we propose an unsupervised binning approach based on the Mean-shift algorithm to improve predictive performance. We evaluated the approach on an Inflammatory Bowel Disease (IBD) dataset with 6 subclasses. This clustering method improved results when deep learning techniques were applied, and it shows the promise of data preprocessing methods for other datasets.
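The abstract does not spell out how the Mean-shift binning is carried out, so the following is only a minimal sketch of the general idea, under stated assumptions: abundance values are treated as one-dimensional points, a flat kernel is used, and each value is then binned to its nearest cluster center. The function names `mean_shift_1d` and `assign_bins`, the `bandwidth` value, and the toy abundances are all hypothetical and are not taken from the paper or from the IBD dataset.

```python
from typing import List, Sequence


def mean_shift_1d(values: Sequence[float], bandwidth: float,
                  max_iter: int = 200, tol: float = 1e-9) -> List[float]:
    """Return sorted cluster centers found by flat-kernel mean shift in 1-D."""
    modes = []
    for start in values:
        point = float(start)
        for _ in range(max_iter):
            # Flat kernel: average every value within `bandwidth` of the point.
            window = [v for v in values if abs(v - point) <= bandwidth]
            if not window:  # guard against floating-point edge cases
                break
            shifted = sum(window) / len(window)
            if abs(shifted - point) < tol:
                break
            point = shifted
        modes.append(point)
    # Merge converged points that lie within one bandwidth of each other.
    centers: List[float] = []
    for mode in sorted(modes):
        if not centers or mode - centers[-1] > bandwidth:
            centers.append(mode)
    return centers


def assign_bins(values: Sequence[float], centers: Sequence[float]) -> List[int]:
    """Map each value to the index of its nearest cluster center."""
    return [min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            for v in values]


# Illustrative toy abundances (made up, not from the IBD dataset).
abundances = [0.01, 0.012, 0.011, 0.2, 0.21, 0.19, 0.8, 0.82]
centers = mean_shift_1d(abundances, bandwidth=0.05)  # bandwidth is an assumption
bins = assign_bins(abundances, centers)
print(centers, bins)
```

Unlike equal-width binning, the number of bins here is not fixed in advance; it follows from the density of the data and the chosen bandwidth, which is the property that makes mean shift attractive as an unsupervised binning step.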
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Phan, N.Y.K., Nguyen, H.T. (2020). Inflammatory Bowel Disease Classification Improvement with Metagenomic Data Binning Using Mean-Shift Clustering. In: Dang, T.K., Küng, J., Takizawa, M., Chung, T.M. (eds) Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. FDSE 2020. Communications in Computer and Information Science, vol 1306. Springer, Singapore. https://doi.org/10.1007/978-981-33-4370-2_21
DOI: https://doi.org/10.1007/978-981-33-4370-2_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4369-6
Online ISBN: 978-981-33-4370-2
eBook Packages: Computer Science, Computer Science (R0)