Conscience online learning: an efficient approach for robust kernel-based clustering

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Kernel-based clustering is one of the most popular methods for partitioning nonlinearly separable datasets. However, exhaustive search for the global optimum is NP-hard. Iterative procedures such as k-means can be used to seek one of the local minima. Unfortunately, k-means is easily trapped in degenerate local minima when the cluster prototypes are ill-initialized. In this paper, we restate the optimization problem of kernel-based clustering in an online learning framework, into which a conscience mechanism is easily integrated to tackle the ill-initialization problem and to achieve a faster convergence rate. We thus propose a novel approach termed conscience online learning (COLL). For each randomly drawn data point, our method selects the winning prototype using the conscience mechanism, which biases the ill-initialized prototypes so as to avoid degenerate local minima, and then efficiently updates the winner by the online learning rule. As a result, it can reach a smaller distortion error more efficiently than k-means with the same initialization. The rationale of the proposed COLL method is analyzed experimentally. We then apply the COLL method to digit clustering and video clustering. The experimental results demonstrate a significant improvement over existing kernel-based clustering methods.
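
The loop described in the abstract (draw a point at random, pick a winner under a conscience mechanism, update the winner online) can be illustrated with a minimal Python sketch. It is not the authors' COLL implementation: the abstract does not give the exact rules, so the sketch assumes a frequency-weighted distance as the conscience term, a decaying learning rate, and prototypes initialized at randomly chosen data points. The names `rbf_kernel_matrix` and `conscience_kernel_clustering` and the parameters `gamma`, `eta0` and `n_epochs` are hypothetical. Prototypes live in the kernel feature space, so only their inner products with the mapped points are stored.

```python
import math
import random


def rbf_kernel_matrix(points, gamma=1.0):
    """Gram matrix with K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            K[i][j] = K[j][i] = math.exp(-gamma * d2)
    return K


def conscience_kernel_clustering(K, n_clusters, n_epochs=20, eta0=0.5, seed=0):
    """Online kernel clustering with a frequency-based conscience (assumed form)."""
    rng = random.Random(seed)
    n = len(K)
    # Assumption: each prototype mu_k starts at a randomly chosen mapped point.
    init = rng.sample(range(n), n_clusters)
    # cross[k][i] = <mu_k, phi(x_i)> and norm[k] = <mu_k, mu_k> describe mu_k
    # implicitly, so the feature map phi is never computed explicitly.
    cross = [[K[i0][i] for i in range(n)] for i0 in init]
    norm = [K[i0][i0] for i0 in init]
    wins = [1] * n_clusters  # winning counts used by the conscience term

    def dist2(k, i):
        # Squared feature-space distance ||phi(x_i) - mu_k||^2 via the kernel trick.
        return K[i][i] - 2.0 * cross[k][i] + norm[k]

    for epoch in range(n_epochs):
        eta = eta0 / (1 + epoch)  # assumption: simple decaying learning rate
        order = list(range(n))
        rng.shuffle(order)  # visit the data points in random order
        for i in order:
            total = sum(wins)
            # Conscience (assumed): scale each distance by the prototype's
            # winning frequency, so frequent winners are handicapped.
            k = min(range(n_clusters),
                    key=lambda c: (wins[c] / total) * dist2(c, i))
            wins[k] += 1
            # Online update mu_k <- (1 - eta) * mu_k + eta * phi(x_i),
            # applied to the cached inner products (norm first, then cross).
            norm[k] = ((1 - eta) ** 2 * norm[k]
                       + 2 * eta * (1 - eta) * cross[k][i]
                       + eta ** 2 * K[i][i])
            for j in range(n):
                cross[k][j] = (1 - eta) * cross[k][j] + eta * K[i][j]

    # Final assignment uses the plain distance, without the conscience term.
    return [min(range(n_clusters), key=lambda c: dist2(c, i)) for i in range(n)]
```

A call such as `conscience_kernel_clustering(rbf_kernel_matrix(points, gamma=1.0), n_clusters=2)` returns one cluster label per input point. Weighting the distance by the winning frequency is only one common way to realize a conscience; the paper's actual selection and update rules may differ.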

Author information

Corresponding author

Correspondence to Jian-Huang Lai.

About this article

Cite this article

Wang, CD., Lai, JH. & Zhu, JY. Conscience online learning: an efficient approach for robust kernel-based clustering. Knowl Inf Syst 31, 79–104 (2012). https://doi.org/10.1007/s10115-011-0416-2

  • DOI: https://doi.org/10.1007/s10115-011-0416-2

Keywords