A novel pruning approach for robust data clustering

  • Original Article
  • Neural Computing and Applications

Abstract

In this paper, we address the sensitivity of traditional clustering algorithms to noisy data points (noise and outliers). We propose a novel pruning method, grounded in information theory, that phases out noisy points for robust data clustering. The approach identifies and prunes noisy points by maximizing mutual information against the input data distribution, so that the resulting clusters are least affected by noise and outliers. The degree of robustness is controlled through a separate parameter that trades off the rejection of noisy points against the optimality of the clustered data. The pruning approach is general and can improve the robustness of many existing traditional clustering methods. In particular, we apply it to fuzzy c-means clustering and two of its extensions, fuzzy c-spherical shells clustering and kernel-based fuzzy c-means clustering, obtaining three clustering algorithms that are robust versions of the existing ones. Experimental results support the effectiveness of the proposed pruning approach.

Notes

  1. In this paper, "traditional clustering algorithms" refers specifically to the basic K-means and fuzzy c-means algorithms, not to their improved versions such as the enhanced LBG algorithm (less sensitive to initialization) [18] or possibilistic c-means (less sensitive to noisy data points) [12].

  2. Most researchers assume that they can provide their clustering algorithms with a suitable initialization. Others use multiple (random) initializations that guarantee (with a given probability) that at least one initialization is good.

  3. The average mutual information can be written in either of the two following forms [2]: \(I=\sum_{j=1}^l\sum_{k=1}^c {u_j u_{k|j}\log\frac{{u_{j|k}}}{{u_j}}}\) or \(I=\sum_{j=1}^l\sum_{k=1}^c {u_j u_{k|j}\log\frac{{u_{k|j}}}{{u_k}}}.\)

  4. Note that the prior distribution \(u_j\) is equal to \(\frac{1}{l}\) in the clustering procedure of traditional clustering algorithms, whereas the proposed pruning method uses it to phase out the noisy points, as discussed later in this section.

  5. We ran the four clustering algorithms on X12 with the same initial cluster centers ([−3.34, 1.67], [1.67, 0.00]) as in [17]. The result of FCM is nearly identical to that in [17]; however, the results of PCM and PFCM are slightly worse than those in [17]. For a fair comparison, the numerical results of PCM and PFCM in Table 1 are taken directly from [17], while the numerical results of FCM and RFCM were generated by our program.
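
The two forms of the average mutual information in Note 3 agree because \(u_j u_{k|j} = u_k u_{j|k}\), so \(\frac{u_{j|k}}{u_j} = \frac{u_{k|j}}{u_k}\). The following is a minimal Python sketch that checks this numerically; it is not the authors' implementation. The function name `mutual_information_forms`, the membership values, and the convention \(0 \log 0 = 0\) are assumptions made for illustration, and the uniform prior \(u_j = \frac{1}{l}\) follows Note 4.

```python
import math


def mutual_information_forms(u_j, u_k_given_j):
    """Compute the average mutual information with both forms from Note 3.

    u_j         : list of l prior probabilities, one per data point
    u_k_given_j : l x c nested list; row j holds u_{k|j} and sums to 1
    Returns (form_1, form_2), which should agree up to rounding.
    """
    l = len(u_j)
    c = len(u_k_given_j[0])
    # Marginal cluster probabilities: u_k = sum_j u_j * u_{k|j}
    u_k = [sum(u_j[j] * u_k_given_j[j][k] for j in range(l)) for k in range(c)]

    form_1 = 0.0
    form_2 = 0.0
    for j in range(l):
        for k in range(c):
            joint = u_j[j] * u_k_given_j[j][k]
            if joint == 0.0:
                continue  # assumed convention: 0 * log(0) contributes nothing
            u_j_given_k = joint / u_k[k]  # Bayes' rule
            form_1 += joint * math.log(u_j_given_k / u_j[j])
            form_2 += joint * math.log(u_k_given_j[j][k] / u_k[k])
    return form_1, form_2


# Hypothetical memberships for l = 4 points and c = 2 clusters.
memberships = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
prior = [1.0 / len(memberships)] * len(memberships)  # u_j = 1/l, as in Note 4
form_1, form_2 = mutual_information_forms(prior, memberships)
print(form_1, form_2)
```

The natural logarithm is used here, so the values are in nats; a different base only rescales both forms by the same constant.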

References

  1. Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York

  2. Blahut RE (1972) Computation of channel capacity and rate-distortion functions. IEEE Trans Inform Theory 18(4):460–473

  3. Blahut RE (1987) Principles and practice of information theory. Addison-Wesley, Massachusetts

  4. Dave RN (1990) Fuzzy shell-clustering and applications to circle detection in digital images. Int J Gen Syst 16(4):343–355

  5. Dave RN, Bhaswan K (1992) Adaptive fuzzy c-shells clustering and detection of ellipse. IEEE Trans Neural Netw 3(5):643–662

  6. Dave RN, Krishnapuram R (1997) Robust clustering methods: a unified view. IEEE Trans Fuzzy Syst 5(2):270–293

  7. Gath I, Geva AB (1989) Unsupervised optimal fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 11(7):773–781

  8. Girolami M (2002) Mercer kernel-based clustering in feature space. IEEE Trans Neural Netw 13(3):780–784

  9. Guillén A, Pomares H, Rojas I, González J, Herrera LJ, Rojas F, Valenzuela O (2007) Studying possibility in a clustering algorithm for RBFNN design for function approximation. Neural Comput Appl 17(1):75–89

  10. Gustafson DE, Kessel WC (1979) Fuzzy clustering with a fuzzy covariance matrix. In: Proceedings of the IEEE conference decision control. San Diego, CA, pp 761–766

  11. Krishnapuram R, Nasraoui O, Frigui H (1992) The fuzzy c-spherical shells algorithm: a new approach. IEEE Trans Neural Netw 3(5):663–671

  12. Krishnapuram R, Keller JM (1993) A possibilistic approach to clustering. IEEE Trans Fuzzy Syst 1(2):98–110

  13. Krishnapuram R, Keller JM (1996) The possibilistic c-means algorithm: insights and recommendations. IEEE Trans Fuzzy Syst 4(3):385–393

  14. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1. University of California Press, pp 281–297

  15. Man Y, Gath I (1994) Detection and separation of ring-shaped clusters using fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 16(8):855–861

  16. Müller KR, Mika S, Rätsch G, Tsuda K, Schölkopf B (2001) An introduction to kernel based learning algorithms. IEEE Trans Neural Netw 12(2):181–201

  17. Pal NR, Pal K, Keller JM, Bezdek JC (2005) A possibilistic fuzzy c-means clustering algorithm. IEEE Trans Fuzzy Syst 13(4):517–530

  18. Patané G, Russo M (2001) The enhanced LBG algorithm. Neural Netw 14(9):1219–1237

  19. Rose K (1998) Deterministic annealing for clustering, compression, classification, regression, and related optimization problems. Proc IEEE 86(11):2210–2239

  20. Schölkopf B, Smola AJ, Müller KR (1996) Nonlinear component analysis as a kernel eigenvalue problem. Technical report, Max Planck Institute for Biological Cybernetics, Tübingen, Germany

  21. Selim SZ, Ismail MA (1984) K-means-type algorithms: a generalized convergence theorem and characterization of local optimality. IEEE Trans Pattern Anal Mach Intell 6(1):81–86

  22. Song Q (2005) A robust information clustering algorithm. Neural Comput 17(12):2672–2698

  23. Still S, Bialek W (2004) How many clusters? An information-theoretic perspective. Neural Comput 16:2483–2506

  24. Yang XL, Song Q, Zhang WB (2006) A kernel-based deterministic annealing algorithm for data clustering. IEE Vis Image Signal Process 153(5):557–568

  25. Yang XL, Song Q, Wang Y (2007) A weighted support vector machine for data classification. Int J Pattern Recogn Artif Intell 21(5):961–976

  26. Zhang JS, Leung YW (2004) Improved possibilistic c-means clustering algorithms. IEEE Trans Fuzzy Syst 12(2):209–217

Acknowledgments

The authors sincerely thank the anonymous reviewers for their insightful comments and valuable suggestions on an earlier version of this paper.

Author information

Corresponding author

Correspondence to Xu-Lei Yang.

About this article

Cite this article

Yang, XL., Song, Q., Wu, YL. et al. A novel pruning approach for robust data clustering. Neural Comput & Applic 18, 759–768 (2009). https://doi.org/10.1007/s00521-009-0281-z

  • DOI: https://doi.org/10.1007/s00521-009-0281-z
