Incremental Kernel Fuzzy c-Means

Havens, Timothy C.; Bezdek, James C.; Palaniswami, Marimuthu

doi:10.1007/978-3-642-27534-0_1


Part of the book series: Studies in Computational Intelligence (SCI, volume 399)

Included in the following conference series:

International Joint Conference on Computational Intelligence


Abstract

The size of everyday data sets is outpacing the capability of computational hardware to analyze them. Social networking and mobile computing alone are producing data sets that grow by terabytes every day. Because these data often cannot be loaded into a computer's working memory, most literal algorithms (algorithms that require access to the full data set) cannot be used. Clustering is one of the pattern recognition and data mining methods used to analyze databases; thus, clustering algorithms that can be used on large data sets are important and useful. We focus on a specific type of clustering: kernelized fuzzy c-means (KFCM). The literal KFCM algorithm has a memory requirement of O(n²), where n is the number of objects in the data set. Thus, even data sets with nearly 1,000,000 objects require terabytes of working memory, which is infeasible for most computers. One way to attack this problem is with incremental algorithms, which sequentially process chunks or samples of the data and combine the results from each chunk. Here we propose three new incremental KFCM algorithms: rseKFCM, spKFCM, and oKFCM. We assess the performance of these algorithms by, first, comparing their clustering results to those of the literal KFCM and, second, showing that these algorithms can produce reasonable partitions of large data sets. In summary, rseKFCM is the most efficient of the three, exhibiting significant speedup at low sampling rates. The oKFCM algorithm seems to produce the most accurate approximation of KFCM, but at the cost of low efficiency. Our recommendation is to use rseKFCM at the highest sample rate allowable for your computational and problem needs.
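The memory argument and the sample-then-extend idea in the abstract can be illustrated with a small pure-Python sketch. A dense n × n kernel matrix of 8-byte floats needs 8n² bytes, so n = 1,000,000 gives 8 × 10¹² bytes (about 8 TB), consistent with the terabyte figure in the abstract. The `rse_kfcm` function is a hypothetical illustration that assumes "rse" means random sample and extend (the abstract does not spell this out): it runs literal KFCM on a random sample only, then computes memberships for every object from kernel values against that sample. The Gaussian kernel, fuzzifier m = 2, random initialization, fixed iteration count, and all function names are assumptions for illustration, not the authors' implementation.

```python
import math
import random

# Hypothetical sketch: names, kernel, and defaults are illustrative assumptions.


def kernel_matrix_bytes(n, bytes_per_entry=8):
    """Memory for a dense n x n kernel matrix of 64-bit floats (the O(n^2) cost)."""
    return n * n * bytes_per_entry


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; the choice of kernel is an assumption."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def prototype_weights(U, m):
    """W[i][j] = u_ij^m / sum_l u_il^m, so prototype v_i = sum_j W[i][j] * phi(x_j)."""
    W = []
    for i in range(len(U[0])):
        col = [row[i] ** m for row in U]
        total = sum(col)
        W.append([v / total for v in col])
    return W


def prototype_quad(W, K):
    """||v_i||^2 in feature space: sum_j sum_l W[i][j] * W[i][l] * K[j][l]."""
    n = len(K)
    return [sum(Wi[j] * Wi[l] * K[j][l] for j in range(n) for l in range(n)) for Wi in W]


def feature_distances(W, quad, k_cross, k_self):
    """Squared distances ||phi(x) - v_i||^2 computed from kernel values only."""
    return [max(k_self - 2.0 * sum(w * k for w, k in zip(Wi, k_cross)) + q, 0.0)
            for Wi, q in zip(W, quad)]


def fcm_memberships(d, m, eps=1e-12):
    """Standard FCM membership update from squared distances d."""
    hits = [i for i, di in enumerate(d) if di <= eps]
    if hits:  # object coincides with a prototype: split membership among those
        return [1.0 / len(hits) if i in hits else 0.0 for i in range(len(d))]
    p = 1.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in d) for di in d]


def kfcm(S, c, m=2.0, sigma=1.0, iters=100, seed=0):
    """Literal KFCM on S; stores the full len(S) x len(S) kernel matrix."""
    rng = random.Random(seed)
    K = [[gaussian_kernel(a, b, sigma) for b in S] for a in S]
    U = []
    for _ in S:  # random fuzzy partition: each object's memberships sum to 1
        row = [rng.random() for _ in range(c)]
        U.append([u / sum(row) for u in row])
    for _ in range(iters):
        W = prototype_weights(U, m)
        quad = prototype_quad(W, K)
        U = [fcm_memberships(feature_distances(W, quad, K[k], K[k][k]), m)
             for k in range(len(S))]
    W = prototype_weights(U, m)  # prototypes consistent with the final memberships
    return U, W, prototype_quad(W, K)


def rse_kfcm(X, c, sample_size, m=2.0, sigma=1.0, seed=0):
    """Cluster a random sample with KFCM, then extend memberships to all of X."""
    rng = random.Random(seed)
    S = rng.sample(X, sample_size)
    _, W, quad = kfcm(S, c, m, sigma, seed=seed)
    U = []
    for x in X:  # only sample_size kernel values are needed per object
        k_cross = [gaussian_kernel(x, s, sigma) for s in S]
        d = feature_distances(W, quad, k_cross, gaussian_kernel(x, x, sigma))
        U.append(fcm_memberships(d, m))
    return U
```

Under these assumptions the stored kernel matrix grows with the square of `sample_size` rather than of n, which is why the sampling rate trades memory and speed against how closely the result approximates literal KFCM.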



Author information

Authors and Affiliations

Michigan State University, East Lansing, MI, 48824, U.S.A.
Timothy C. Havens
University of Melbourne, Parkville, Victoria, 3010, Australia
James C. Bezdek & Marimuthu Palaniswami


Corresponding author

Correspondence to Timothy C. Havens.

Editor information

Editors and Affiliations

Images, Signals and Intelligence, University Paris-Est Creteil (UPEC), LISSI EA 3956, Paris, 77127, France
Kurosh Madani
Departamento de Engenharia Informatica, University of Coimbra, Polo II - Pinhal de Marrocos, Coimbra, 3030, Portugal
António Dourado Correia
Systems and Robotics Institute, Evolutionary Systems and Biomedical, Instituto Superior Tecnico IST, Av. Rovisco Pais, Lisboa, 1049-001, Portugal
Agostinho Rosa
INSTICC, Polytechnic Institute of Setúbal, Setubal, 2910-595, Portugal
Joaquim Filipe


About this paper

Cite this paper

Havens, T.C., Bezdek, J.C., Palaniswami, M. (2012). Incremental Kernel Fuzzy c-Means. In: Madani, K., Dourado Correia, A., Rosa, A., Filipe, J. (eds) Computational Intelligence. IJCCI 2010. Studies in Computational Intelligence, vol 399. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27534-0_1


DOI: https://doi.org/10.1007/978-3-642-27534-0_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27533-3
Online ISBN: 978-3-642-27534-0
eBook Packages: Engineering, Engineering (R0)
