An adaptive kernelized rank-order distance for clustering non-spherical data with high noise

Huang, Tianyi; Wang, Shiping; Zhu, William

doi:10.1007/s13042-020-01068-9

Original Article
Published: 17 February 2020

Volume 11, pages 1735–1747 (2020)

International Journal of Machine Learning and Cybernetics

Tianyi Huang¹,
Shiping Wang² &
William Zhu¹

Abstract

Clustering is a fundamental research topic in unsupervised learning, and the similarity measure is a key factor in clustering. However, it remains challenging for existing similarity measures to cluster non-spherical data with high noise levels. Rank-order distance was proposed to capture the structures of non-spherical data by sharing the neighboring information of the samples, but it does not tolerate high noise well. To address this issue, we propose KROD, a new similarity measure that combines rank-order distance with a Gaussian kernel. By reducing the noise in the neighboring information of samples, KROD improves the noise tolerance of rank-order distance, so that the structures of non-spherical data with high noise levels can be captured well. KROD then strengthens these captured structures with the Gaussian kernel so that samples in the same cluster are closer to each other and can be clustered correctly more easily. Experiments show that KROD can effectively improve existing methods for discovering non-spherical clusters with high noise levels. The source code can be downloaded from https://github.com/grcai.
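
To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch. The function `rank_order_distance` follows the rank-order distance as defined by Zhu, Wen and Sun (2011). The function `kernelized_rod` is only an illustrative assumption about how a Gaussian kernel could be combined with it (dividing by the kernel value, with a hypothetical bandwidth `sigma`); the abstract does not state the KROD formula, and the noise-reduction step is not reproduced here. All function and parameter names are hypothetical.

```python
import numpy as np


def rank_matrix(X):
    """Return pairwise distances, neighbor orderings, and rank positions.

    order[a, i] is the i-th nearest neighbor of sample a (a itself at i = 0,
    assuming no duplicate points); ranks[a, b] is the position of b in a's list.
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    order = np.argsort(dist, axis=1, kind="stable")
    n = len(X)
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(n)[None, :]
    return dist, order, ranks


def rank_order_distance(order, ranks, a, b):
    """Symmetric, normalized rank-order distance between samples a != b."""

    def asym(p, q):
        # Take p's neighbor list up to and including q, then sum the
        # positions those neighbors occupy in q's own neighbor list.
        top = order[p, : ranks[p, q] + 1]
        return ranks[q, top].sum()

    return (asym(a, b) + asym(b, a)) / min(ranks[a, b], ranks[b, a])


def kernelized_rod(X, sigma=1.0):
    """ASSUMPTION, not the paper's definition: rank-order distance divided
    by a Gaussian kernel of the Euclidean distance, so that pairs that are
    far apart in the original space are pushed further apart."""
    dist, order, ranks = rank_matrix(X)
    n = len(X)
    out = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            rod = rank_order_distance(order, ranks, a, b)
            kernel = np.exp(-dist[a, b] ** 2 / (2.0 * sigma ** 2))
            out[a, b] = out[b, a] = rod / kernel
    return out


if __name__ == "__main__":
    # Two well-separated pairs of points on a line.
    X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.5, 0.0]])
    print(kernelized_rod(X, sigma=1.0))
```

The resulting matrix can be passed to any clustering method that accepts precomputed dissimilarities. For very large distances relative to `sigma` the kernel underflows to zero and the ratio becomes infinite, so the bandwidth should be chosen on the scale of the data.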



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61772120.

Author information

Authors and Affiliations

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
Tianyi Huang & William Zhu
College of Mathematics and Computer Sciences, Fuzhou University, Fuzhou, China
Shiping Wang

Corresponding author

Correspondence to William Zhu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Huang, T., Wang, S. & Zhu, W. An adaptive kernelized rank-order distance for clustering non-spherical data with high noise. Int. J. Mach. Learn. & Cyber. 11, 1735–1747 (2020). https://doi.org/10.1007/s13042-020-01068-9

Received: 28 May 2019
Accepted: 16 January 2020
Published: 17 February 2020
Issue Date: August 2020
DOI: https://doi.org/10.1007/s13042-020-01068-9
