AutoCluster: Meta-learning Based Ensemble Method for Automated Unsupervised Clustering

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12714)

Abstract

Automated clustering aims to build an appropriate clustering model for a given dataset without manual intervention. Existing automated clustering methods are largely based on meta-learning. However, they still face two specific challenges: the lack of comprehensive meta-features for meta-learning, and the lack of a general clustering validation index (CVI) to serve as the objective function. We therefore propose a novel automated clustering method named AutoCluster to address these problems. It is mainly composed of Clustering-oriented Meta-feature Extraction (CME) and Multi-CVIs Clustering Ensemble Construction (MC\(^2\)EC). CME captures meta-features from spatial randomness and from the different learning properties of clustering algorithms to enhance meta-learning. MC\(^2\)EC develops a collaborative mechanism based on clustering ensemble to balance the measuring criteria of different CVIs and to construct a more appropriate clustering model for a given dataset. Extensive experiments are conducted on 150 datasets from OpenML to create the meta-data and on 33 test datasets from three clustering benchmarks to validate AutoCluster. The results show that AutoCluster builds more appropriate clustering models than classical clustering algorithms and a CASH method.
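To make the idea of balancing several CVIs through a clustering ensemble more concrete, the following is a minimal sketch and not the authors' MC\(^2\)EC mechanism. It assumes scikit-learn and SciPy, uses three internal CVIs that scikit-learn provides (silhouette, Calinski-Harabasz, Davies-Bouldin), and swaps in a generic co-association consensus as the ensemble step. The function names `candidate_partitions`, `best_per_cvi`, and `consensus_labels`, the candidate pool, and the choice of the final cluster count are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Each CVI maps to (scoring function, sign); sign is -1 where lower is better.
CVIS = {
    "silhouette": (silhouette_score, 1.0),
    "calinski_harabasz": (calinski_harabasz_score, 1.0),
    "davies_bouldin": (davies_bouldin_score, -1.0),
}


def candidate_partitions(X, k_values=(2, 3, 4, 5)):
    """Hypothetical candidate pool: two algorithms over a few cluster counts."""
    partitions = {}
    for k in k_values:
        kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
        partitions[f"kmeans_k{k}"] = kmeans.fit_predict(X)
        partitions[f"agglo_k{k}"] = AgglomerativeClustering(n_clusters=k).fit_predict(X)
    return partitions


def best_per_cvi(X, partitions):
    """Return, for each CVI, the name of the candidate partition it ranks first."""
    winners = {}
    for cvi_name, (score_fn, sign) in CVIS.items():
        winners[cvi_name] = max(
            partitions, key=lambda name: sign * score_fn(X, partitions[name])
        )
    return winners


def consensus_labels(label_sets, n_clusters):
    """Co-association consensus over several partitions of the same samples."""
    label_sets = np.asarray(label_sets)  # shape: (n_partitions, n_samples)
    # Fraction of partitions in which each pair of samples shares a cluster.
    coassoc = np.mean(label_sets[:, :, None] == label_sets[:, None, :], axis=0)
    distance = 1.0 - coassoc
    np.fill_diagonal(distance, 0.0)
    tree = linkage(squareform(distance, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic data stands in for a real dataset.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)
partitions = candidate_partitions(X)
winners = best_per_cvi(X, partitions)
selected = [partitions[name] for name in winners.values()]
# Assumption: use the median cluster count among the per-CVI winners.
k = int(np.median([len(np.unique(labels)) for labels in selected]))
final_labels = consensus_labels(selected, n_clusters=k)
print(winners, k, len(np.unique(final_labels)))
```

When the CVIs disagree about the best candidate, the consensus step merges their preferred partitions instead of trusting a single index; when they agree, the consensus simply reproduces that partition.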

Notes

  1. The supplementary material of this paper is available at https://github.com/wj-tian/AutoCluster.

  2. http://cs.uef.fi/sipu/datasets/.

  3. https://www.uni-marburg.de/fb12/arbeitsgruppen/datenbionik/data.

  4. https://github.com/deric/clustering-benchmark.

Acknowledgment

This work is supported by the National Natural Science Foundation of China (No. 52073169) and the State Key Program of National Natural Science Foundation of China (Grant No. 61936001).

Author information

Corresponding author

Correspondence to Yue Liu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, Y., Li, S., Tian, W. (2021). AutoCluster: Meta-learning Based Ensemble Method for Automated Unsupervised Clustering. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12714. Springer, Cham. https://doi.org/10.1007/978-3-030-75768-7_20

  • DOI: https://doi.org/10.1007/978-3-030-75768-7_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-75767-0

  • Online ISBN: 978-3-030-75768-7

  • eBook Packages: Computer Science, Computer Science (R0)
