Abstract
The hypersurface of an inscribed geometry determines the distribution of an embedded cluster, whose boundary points approximately fit this surface. To detect these points, the implicit features of a local space are captured to distinguish whether a data point is an inner or an outer feature. However, this approximation of the boundary is coarse-grained and may be ineffective in a high-dimensional space because of unbalanced feature distribution. In this paper, we introduce a directed Markov tree for high-dimensional cluster boundary detection. The key idea is to project each one-dimensional subspace of a local high-dimensional feature space onto a layer of a directed Markov tree that covers absorptive and reflective walls. We then derive a fine-grained detection coefficient based on the Markov process of a knight's tour over each layer of the tree. In this fine-grained view, a local feature space centered on a cluster boundary point has a lower estimated tour cost than one centered on an internal point of the cluster. Based on this observation, we propose a knight algorithm to detect the boundary points of a high-dimensional feature space. Experiments on gene expression and video retrieval datasets demonstrate that the proposed algorithm achieves a higher F-measure score than the other boundary detection baselines.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
As this article does not involve human participants, informed consent was not required.
About this article
Cite this article
Cao, X. High-dimensional cluster boundary detection using directed Markov tree. Pattern Anal Applic 24, 35–47 (2021). https://doi.org/10.1007/s10044-020-00897-2
DOI: https://doi.org/10.1007/s10044-020-00897-2