High-dimensional cluster boundary detection using directed Markov tree

  • Theoretical advances
Pattern Analysis and Applications

Abstract

The hypersurface of an inscribed geometry determines the distribution of an embedded cluster, whose boundary points approximately fit this surface. To detect these points, the implicit features of a local space are captured to distinguish whether a data point is an inner or an outer feature. However, this approximation of the boundary is coarse-grained and may be ineffective in a high-dimensional space because of the unbalanced feature distribution. In this paper, we introduce a directed Markov tree for high-dimensional cluster boundary detection. The key idea is to project each one-dimensional subspace of a local high-dimensional feature space into a layer of a directed Markov tree, which covers absorptive and reflective walls. We then derive a fine-grained detection coefficient based on the Markov process of a knight’s tour over each layer of the tree. In this fine-grained view, a local feature space centered on a cluster boundary point has a lower estimated tour cost than one centered on the internal data of the cluster. Based on this observation, we propose a knight algorithm to detect the boundary points of a high-dimensional feature space. Experiments on gene expression and video retrieval datasets demonstrate that the proposed algorithm achieves a higher F-measure score than the other boundary detection baselines.
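
The abstract does not define the tree layers or the detection coefficient, so the following is only a minimal sketch of the terminology, not the paper's knight algorithm. It assumes a textbook symmetric random walk on the states 0..n, with a reflective wall at state 0 and an absorptive wall at state n, and computes the expected number of steps until absorption from each state. The function name `expected_absorption_steps` and the choice of walk are illustrative assumptions.

```python
from fractions import Fraction


def expected_absorption_steps(n):
    """Expected steps to absorption for a symmetric walk on states 0..n.

    Assumed toy model (not taken from the paper):
      * state 0 is a reflective wall: the walk always bounces to state 1;
      * state n is an absorptive wall: the walk stops there;
      * every other state moves left or right with probability 1/2.
    Returns a list t where t[i] is the expected step count from state i.
    """
    size = n  # transient states are 0..n-1
    # Augmented system (I - Q) t = 1, where Q holds the transition
    # probabilities between transient states.
    a = [[Fraction(0)] * (size + 1) for _ in range(size)]
    for i in range(size):
        a[i][i] = Fraction(1)
        a[i][size] = Fraction(1)
        if i == 0:
            if size > 1:
                a[0][1] -= 1  # reflection: 0 -> 1 with probability 1
        else:
            a[i][i - 1] -= Fraction(1, 2)
            if i + 1 < size:
                a[i][i + 1] -= Fraction(1, 2)

    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(size):
        pivot = next(r for r in range(col, size) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        inv = a[col][col]
        a[col] = [v / inv for v in a[col]]
        for r in range(size):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]

    # The absorptive wall itself needs zero further steps.
    return [a[i][size] for i in range(size)] + [Fraction(0)]
```

For this toy walk the result matches the closed form n² − i², so a start next to the absorptive wall has a much lower expected cost than a start near the reflective wall. How such a cost maps onto boundary versus internal points is specified in the full paper, not here.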

Notes

  1. http://lib.stat.cmu.edu/datasets/biomed.data.html.

  2. http://archive.ics.uci.edu/ml/datasets.html.

  3. http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi.

  4. http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi.

  5. http://research.microsoft.com/en-us/um/people/jckrumm/wallflower/testimages.html.

  6. http://research.microsoft.com/en-us/um/people/jckrumm/wallflower/testimages.html.

Author information

Corresponding author

Correspondence to Xiaofeng Cao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

As this article does not involve human participants, informed consent is not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Cao, X. High-dimensional cluster boundary detection using directed Markov tree. Pattern Anal Applic 24, 35–47 (2021). https://doi.org/10.1007/s10044-020-00897-2

  • DOI: https://doi.org/10.1007/s10044-020-00897-2
