Ensemble Clustering for Boundary Detection in High-Dimensional Data

  • Conference paper
Machine Learning, Optimization, and Data Science (LOD 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14506)

Abstract

The emergence of novel data collection methods has led to the accumulation of vast amounts of unlabelled data. Discovering well-separated groups of data samples through clustering is a critical but challenging task. In recent years, various techniques for detecting isolated and boundary points have been developed. In this work, we propose a clustering methodology that discovers boundary data effectively and discriminates them from outliers. The proposed methodology uses a well-established density-based clustering method designed for high-dimensional data to develop a new ensemble scheme. The experimental results demonstrate very good performance, indicating that the approach has the potential to be used in diverse domains.
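The abstract does not name the base clustering method or describe the ensemble scheme, so the following is only a minimal sketch of the general idea of separating boundary points from outliers with a density-based criterion. It assumes DBSCAN's standard core/border/noise definitions as a stand-in for the paper's method, and it uses a simple vote over several neighbourhood radii as a stand-in for the ensemble. The function names `label_points` and `border_votes`, the parameters `eps` and `min_pts`, and the toy data are all hypothetical and do not come from the paper.

```python
from math import dist


def label_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' (DBSCAN definitions)."""
    n = len(points)
    # Each neighbourhood includes the point itself, as in the usual convention.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbours]
    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbours[i]):
            # Not dense itself, but within eps of a dense point: a boundary point.
            labels.append("border")
        else:
            # Neither dense nor near a dense point: an outlier.
            labels.append("noise")
    return labels


def border_votes(points, eps_values, min_pts):
    """Fraction of runs (one per eps value) that label each point 'border'."""
    runs = [label_points(points, eps, min_pts) for eps in eps_values]
    return [
        sum(run[i] == "border" for run in runs) / len(runs)
        for i in range(len(points))
    ]


# Toy data (assumed): a 3x3 grid, one point just off the grid, one far away.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
points = grid + [(3.0, 1.0), (10.0, 10.0)]

labels = label_points(points, eps=1.1, min_pts=4)
votes = border_votes(points, eps_values=[1.1, 1.5], min_pts=4)
```

In this toy example the point next to the grid is labelled a border point for the smaller radius, while the distant point is labelled noise in every run. The vote over radii only illustrates how several runs could be combined; it should not be read as the scheme proposed in the paper.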

Acknowledgment

We acknowledge support of this work by the project “Par-ICT CENG: Enhancing ICT research infrastructure in Central Greece to enable processing of Big data from sensor stream, multimedia content, and complex mathematical modeling and simulations” (MIS 5047244), which is implemented under the Action “Reinforcement of the Research and Innovation Infrastructure”, funded by the Operational Programme “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014–2020) and co-financed by Greece and the European Union (European Regional Development Fund).

Author information

Corresponding author

Correspondence to Panagiotis Anagnostou.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Anagnostou, P., Pavlidis, N.G., Tasoulis, S. (2024). Ensemble Clustering for Boundary Detection in High-Dimensional Data. In: Nicosia, G., Ojha, V., La Malfa, E., La Malfa, G., Pardalos, P.M., Umeton, R. (eds) Machine Learning, Optimization, and Data Science. LOD 2023. Lecture Notes in Computer Science, vol 14506. Springer, Cham. https://doi.org/10.1007/978-3-031-53966-4_24

  • DOI: https://doi.org/10.1007/978-3-031-53966-4_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-53965-7

  • Online ISBN: 978-3-031-53966-4

  • eBook Packages: Computer Science, Computer Science (R0)
