Robust semi-supervised clustering via data transductive warping

Applied Intelligence

Abstract

In practical applications, we often face semi-supervised data that consist of a small amount of class-label or constraint information and many unlabeled instances. For semi-supervised clustering, exploiting even this small portion of prior label information can significantly improve the discriminability of the learned representations. Spectral clustering can handle data distributions of arbitrary shape and converges to a globally optimal solution, but it is sensitive to noisy data. Real-world data, however, inevitably contain noise, which significantly degrades clustering performance. Motivated by this, we propose a novel Robust Semi-supervised Spectral Clustering method (RSSC) for clustering noisy semi-supervised datasets. Specifically, through data transductive warping, we map the entire semi-supervised dataset into a new data space in which labeled data lie close to the axes of the canonical coordinate system and unlabeled data with similar characteristics lie close to those labeled data. Noisy data, which receive no guidance from the labels, lie close to the origin and form a noise cluster. As a result, samples in the same cluster are close together, and different clusters are well separated. Extensive experiments on sixteen real-world datasets demonstrate that RSSC outperforms other state-of-the-art clustering methods in both performance and robustness.
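The geometry described in the abstract (labeled samples on the canonical axes, similar unlabeled samples near them, unsupported noise near the origin) can be illustrated with a small sketch. This is not RSSC and does not reproduce the authors' objective: it swaps in harmonic-function label propagation with an extra absorbing node fixed at the origin. The Gaussian affinity, the parameters `sigma`, `gamma` and `noise_tol`, and the function names `transductive_warp` and `assign` are hypothetical choices made only for illustration.

```python
import numpy as np


def gaussian_affinity(X, sigma):
    """Dense Gaussian (RBF) affinity matrix with a zero diagonal."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def transductive_warp(X, labels, n_classes, sigma=1.0, gamma=0.05):
    """Hypothetical illustration (not RSSC): embed every sample in R^n_classes.

    labels[i] = c >= 0 pins sample i to the canonical basis vector e_c;
    labels[i] = -1 marks it unlabeled. Every unlabeled sample also gets an
    edge of weight gamma to a virtual node fixed at the origin.
    """
    labels = np.asarray(labels)
    W = gaussian_affinity(X, sigma)
    lab = np.flatnonzero(labels >= 0)
    unl = np.flatnonzero(labels < 0)

    F = np.zeros((X.shape[0], n_classes))
    F[lab, labels[lab]] = 1.0  # labeled samples sit exactly on the axes

    # Harmonic condition for each unlabeled row i:
    #   (d_i + gamma) f_i = sum_j W_ij f_j   (the origin contributes 0)
    W_uu = W[np.ix_(unl, unl)]
    W_ul = W[np.ix_(unl, lab)]
    A = np.diag(W[unl].sum(axis=1) + gamma) - W_uu  # strictly diagonally dominant
    F[unl] = np.linalg.solve(A, W_ul @ F[lab])
    return F


def assign(F, noise_tol=0.5):
    """Nearest axis gives the cluster; rows with a small norm form noise cluster -1."""
    clusters = np.argmax(F, axis=1)
    clusters[np.linalg.norm(F, axis=1) < noise_tol] = -1
    return clusters


# Usage: two tight blobs with one labeled sample each, plus one isolated point.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0.0, 0.0], 0.25, size=(20, 2)),  # class-0 blob
    rng.normal([5.0, 0.0], 0.25, size=(20, 2)),  # class-1 blob
    [[2.5, 10.0]],                               # far from both blobs
])
labels = np.full(len(X), -1)
labels[0], labels[20] = 0, 1
F = transductive_warp(X, labels, n_classes=2)
pred = assign(F)
print(pred)
```

Each unlabeled embedding is a weighted average of its neighbours' embeddings and the origin, so a sample with negligible affinity to the rest of the data is dominated by the origin term and lands near zero; that is why the embedding norm can serve as a simple noise test in this sketch. In the usage above, the two blobs should map to clusters 0 and 1 and the isolated point to the noise cluster -1.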






Acknowledgments

This work is supported in part by the National Natural Science Foundation of China under Grants 61906056 and 61876001.

Author information

Corresponding author

Correspondence to Peng Zhou.



About this article


Cite this article

Zhou, P., Wang, N., Zhao, S. et al. Robust semi-supervised clustering via data transductive warping. Appl Intell 53, 1254–1270 (2023). https://doi.org/10.1007/s10489-022-03493-5



  • DOI: https://doi.org/10.1007/s10489-022-03493-5

Keywords