Abstract
Unsupervised feature selection has recently received much attention in the scientific community because of its wide application to practical problems in which unlabeled data arise. One such problem is profiling the structure of bacterial communities in the oceans, which requires identifying and selecting relevant features from unlabeled marine sediment samples. This paper introduces a methodology for identifying and selecting a set of relevant features in this field. To select the subset, we rely on a synergy between ranking-based unsupervised feature selection methods, an internal validation index introduced in this paper, and a clustering algorithm. According to the results of our analyses, the proposed methodology can select the features that best reveal cluster structures in this kind of data.
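The interplay between a feature ranking, a clustering algorithm, and an internal validation index can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the variance-based ranking, the small k-means routine, the silhouette coefficient (standing in for the index introduced in the paper, which is not reproduced here), the function names, and the toy data. Nested subsets of top-ranked features are clustered, and the subset with the best index value is kept.

```python
import math

def column_variance(X, j):
    vals = [row[j] for row in X]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def rank_features(X):
    # Stand-in ranking (assumption): highest variance first.
    return sorted(range(len(X[0])), key=lambda j: -column_variance(X, j))

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(X, k, iters=20):
    # Deterministic farthest-point initialisation, then Lloyd iterations.
    centers = [X[0]]
    while len(centers) < k:
        centers.append(max(X, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in X]
        for c in range(k):
            members = [p for p, l in zip(X, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(X, labels):
    # Mean silhouette coefficient; a stand-in for the paper's own index.
    if len(set(labels)) < 2:
        return -1.0
    n, total = len(X), 0.0
    for i in range(n):
        same = [dist(X[i], X[j]) for j in range(n)
                if j != i and labels[j] == labels[i]]
        if not same:  # singleton cluster contributes 0 by convention
            continue
        a = sum(same) / len(same)
        b = min(
            sum(dist(X[i], X[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

def select_features(X, n_clusters=2):
    # Evaluate nested subsets of the ranking and keep the best-scoring one.
    ranking = rank_features(X)
    best_score, best_subset = -math.inf, []
    for m in range(1, len(ranking) + 1):
        subset = ranking[:m]
        Z = [[row[j] for j in subset] for row in X]
        score = silhouette(Z, kmeans(Z, n_clusters))
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score

# Hypothetical toy data: feature 0 separates two groups, features 1 and 2 are noise.
X = [
    [0.0, 1.0, 0.3], [0.1, 0.0, 0.5], [0.2, 1.0, 0.4],
    [5.0, 0.0, 0.5], [5.1, 1.0, 0.3], [5.2, 0.0, 0.4],
]
subset, score = select_features(X)
print(subset, round(score, 3))
```

The silhouette coefficient is used here only because it is bounded; any ranking method or internal index could be substituted for the corresponding function without changing the structure of the loop.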
Notes
1. A feature is consistent if it takes similar values for objects that are close to each other and dissimilar values for objects that are far apart.
2. Maximum or minimum, depending on the internal evaluation index used.
3. The value computed by these indices increases or decreases monotonically with the number of features.
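The notion of consistency in the first note can be made concrete with a small sketch in the spirit of the Laplacian Score of He, Cai and Niyogi. This is a simplified illustration under stated assumptions, not the procedure used in the paper: the similarity graph is fully connected with a heat kernel of width `t` (the original method uses a nearest-neighbour graph), and the function name and toy data are hypothetical. Lower scores indicate features whose values vary little between nearby objects.

```python
import math

def laplacian_scores(X, t=1.0):
    n, d = len(X), len(X[0])
    # Heat-kernel similarity on a fully connected graph without self-loops
    # (a simplification of the nearest-neighbour graph).
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
                S[i][j] = math.exp(-sq / t)
    D = [sum(row) for row in S]  # node degrees
    scores = []
    for r in range(d):
        f = [row[r] for row in X]
        # Remove the degree-weighted mean of the feature.
        mean = sum(fi * di for fi, di in zip(f, D)) / sum(D)
        g = [fi - mean for fi in f]
        # g^T L g written as half the weighted sum of squared differences.
        num = 0.5 * sum(S[i][j] * (g[i] - g[j]) ** 2
                        for i in range(n) for j in range(n))
        den = sum(di * gi ** 2 for di, gi in zip(D, g))
        scores.append(num / den if den > 0 else math.inf)
    return scores

# Hypothetical toy data: feature 0 follows the group structure, the others do not.
X = [
    [0.0, 1.0, 0.3], [0.1, 0.0, 0.5], [0.2, 1.0, 0.4],
    [5.0, 0.0, 0.5], [5.1, 1.0, 0.3], [5.2, 0.0, 0.4],
]
scores = laplacian_scores(X)
ranking = sorted(range(len(scores)), key=lambda r: scores[r])
print(ranking, [round(s, 4) for s in scores])
```

On this toy data the feature that follows the two groups receives the lowest score, which matches the intuition behind the definition of a consistent feature.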
Acknowledgements
The first author gratefully acknowledges the Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE) for the collaboration grant awarded for developing this research. We also thank E. Ernestina Godoy-Lozano and collaborators for providing the data for the analysis presented in this paper.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Solorio-Fernández, S., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F. (2021). Unsupervised Feature Selection Methodology for Analysis of Bacterial Taxonomy Profiles. In: Roman-Rangel, E., Kuri-Morales, Á.F., Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A., Olvera-López, J.A. (eds) Pattern Recognition. MCPR 2021. Lecture Notes in Computer Science, vol 12725. Springer, Cham. https://doi.org/10.1007/978-3-030-77004-4_5
DOI: https://doi.org/10.1007/978-3-030-77004-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77003-7
Online ISBN: 978-3-030-77004-4
eBook Packages: Computer Science, Computer Science (R0)