Abstract
The exponential growth in our ability to generate, capture, and store high-dimensional data has driven sophisticated machine learning applications. However, high dimensionality often makes it difficult for analysts to identify and extract the relevant features of a dataset. Although many feature selection methods perform well in supervised learning, unsupervised feature selection remains a major challenge. In data visualization, for example, high-dimensional data is hard to visualize and interpret because of limited screen space, which leads to visual clutter; visualizations become more interpretable when rendered in a low-dimensional feature space. To mitigate these challenges, we present an approach that performs unsupervised feature clustering and selection using our novel graph clustering algorithm based on Clique-Cover Theory. We implemented our approach in an interactive data exploration tool which facilitates the exploration of relationships between features and generates interpretable visualizations.
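To make the idea concrete, the following is a rough illustrative sketch of clique-cover-based unsupervised feature selection — not the authors' implementation. It assumes a simple pipeline: build a feature-similarity graph (edges between strongly correlated, i.e. redundant, features), greedily cover the graph with cliques, and keep one representative feature per clique. The correlation threshold and the representative-choice heuristic are assumptions for illustration.

```python
# Illustrative sketch: clique-cover-based unsupervised feature selection.
# Strongly correlated features are treated as redundant; each clique in the
# similarity graph is replaced by a single representative feature.

from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def clique_cover_select(features, threshold=0.8):
    """features: dict mapping feature name -> list of values.
    Returns one representative feature name per clique of the cover."""
    names = list(features)
    # Edge between two features if they are strongly correlated (redundant).
    adj = {u: set() for u in names}
    for u, v in combinations(names, 2):
        if abs(pearson(features[u], features[v])) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    uncovered, selected = set(names), []
    while uncovered:
        # Greedily grow a maximal clique among still-uncovered features,
        # seeded at the vertex with the most uncovered neighbors.
        seed = max(uncovered, key=lambda u: len(adj[u] & uncovered))
        clique = [seed]
        for w in sorted(uncovered - {seed}):
            if all(w in adj[c] for c in clique):
                clique.append(w)
        # Keep the clique member with the highest overall degree.
        selected.append(max(clique, key=lambda u: len(adj[u])))
        uncovered -= set(clique)
    return selected
```

For instance, two perfectly correlated features fall into one clique and only one of them is selected, while an uncorrelated feature forms its own clique and is always kept.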
Notes
1. Data Repository: https://figshare.com/s/1807247ef2165735465c.
2. Colon Tumor Data: http://csse.szu.edu.cn/staff/zhuzx/Datasets.html.
Acknowledgment
This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC-2023 Internet of Production – 390621612.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Chakrabarti, A., Das, A., Cochez, M., Quix, C. (2021). Unsupervised Feature Selection for Efficient Exploration of High Dimensional Data. In: Bellatreche, L., Dumas, M., Karras, P., Matulevičius, R. (eds) Advances in Databases and Information Systems. ADBIS 2021. Lecture Notes in Computer Science(), vol 12843. Springer, Cham. https://doi.org/10.1007/978-3-030-82472-3_14
Print ISBN: 978-3-030-82471-6
Online ISBN: 978-3-030-82472-3