
Unsupervised Feature Selection for Efficient Exploration of High Dimensional Data

  • Conference paper
  • First Online:

Advances in Databases and Information Systems (ADBIS 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12843)


Abstract

The exponential growth in the ability to generate, capture, and store high-dimensional data has driven sophisticated machine learning applications. However, high dimensionality often makes it difficult for analysts to identify and extract the relevant features of a dataset. While many feature selection methods achieve good results in supervised learning, the major challenge lies in unsupervised feature selection. In data visualization, for example, high-dimensional data is difficult to visualize and interpret because limited screen space leads to visual clutter; data becomes far more interpretable when shown in a low-dimensional feature space. To mitigate these challenges, we present an approach that performs unsupervised feature clustering and selection using our novel graph clustering algorithm based on Clique-Cover Theory. We implemented our approach in an interactive data exploration tool which facilitates the exploration of relationships between features and generates interpretable visualizations.
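The paper's algorithm is not reproduced on this page, so the following Python sketch is only one plausible reading of clique-cover-based feature clustering, not the authors' method. It clusters correlated features by greedily covering a feature-similarity graph with maximal cliques and keeps one representative feature per clique. The absolute-Pearson similarity, the 0.7 edge threshold, the greedy largest-clique cover, the representative rule, and the helper name select_features are all illustrative assumptions.

    # Illustrative sketch only -- NOT the algorithm from the paper.
    # Assumptions: |Pearson correlation| as feature similarity, a 0.7
    # edge threshold, a greedy largest-clique cover, and "most correlated
    # with its clique" as the representative rule.
    import numpy as np
    import networkx as nx

    def select_features(X, threshold=0.7):
        """Greedy clique-cover feature selection (hypothetical helper).

        X: (n_samples, n_features) data matrix.
        Returns indices of one representative feature per clique.
        """
        # Feature-similarity graph: nodes are features; edges join pairs
        # whose absolute correlation reaches the threshold. nan_to_num
        # guards against constant (zero-variance) columns.
        corr = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))
        n = corr.shape[0]
        G = nx.Graph()
        G.add_nodes_from(range(n))
        for i in range(n):
            for j in range(i + 1, n):
                if corr[i, j] >= threshold:
                    G.add_edge(i, j)

        # Greedy cover: repeatedly peel off the largest maximal clique
        # until every feature is covered (minimum clique cover is
        # NP-hard, so a heuristic keeps the sketch simple).
        selected = []
        while G.number_of_nodes() > 0:
            clique = max(nx.find_cliques(G), key=len)
            # Representative: the clique member most strongly
            # correlated with the rest of its clique.
            rep = max(clique,
                      key=lambda v: sum(corr[v, u] for u in clique if u != v))
            selected.append(rep)
            G.remove_nodes_from(clique)
        return sorted(selected)

    # Toy usage: near-duplicate columns collapse onto their originals.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X = np.hstack([X, X[:, :2] + 0.01 * rng.normal(size=(200, 2))])
    print(select_features(X))  # 5 representatives survive from 7 features

The greedy step is where a real implementation would differ most: minimum clique cover is NP-hard, and on large feature graphs a dedicated clique solver such as the PMC library linked in note 3 below would be the natural replacement for nx.find_cliques.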


Notes

  1. Data Repository: https://figshare.com/s/1807247ef2165735465c.

  2. Colon Tumor Data: http://csse.szu.edu.cn/staff/zhuzx/Datasets.html.

  3. https://github.com/ryanrossi/pmc.

  4. VizExploreTool: http://dbis.rwth-aachen.de/cms/staff/chakrabarti/unsupervised-feature-selection/eval/view.


Acknowledgment

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC-2023 Internet of Production – 390621612.

Author information


Correspondence to Arnab Chakrabarti.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Chakrabarti, A., Das, A., Cochez, M., Quix, C. (2021). Unsupervised Feature Selection for Efficient Exploration of High Dimensional Data. In: Bellatreche, L., Dumas, M., Karras, P., Matulevičius, R. (eds) Advances in Databases and Information Systems. ADBIS 2021. Lecture Notes in Computer Science, vol 12843. Springer, Cham. https://doi.org/10.1007/978-3-030-82472-3_14


  • DOI: https://doi.org/10.1007/978-3-030-82472-3_14

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-82471-6

  • Online ISBN: 978-3-030-82472-3

  • eBook Packages: Computer Science, Computer Science (R0)
