A study of large-scale data clustering based on fuzzy clustering

Li, Yangyang; Yang, Guoli; He, Haiyang; Jiao, Licheng; Shang, Ronghua

doi:10.1007/s00500-015-1698-1

Methodologies and Application
Published: 12 May 2015

Volume 20, pages 3231–3242, (2016)
Soft Computing

Abstract

Large-scale data are any data that cannot be loaded into the main memory of an ordinary computer. This is not an objective definition, but it makes clear what large-scale data means in practice. We first review existing algorithms for clustering large-scale data, including several data stream clustering algorithms based on fuzzy c-means (FCM). We then propose a new structure for clustering large-scale data, together with two new data stream clustering algorithms built on that structure, which are presented in Sects. 3 and 4. In our method, the objects in the dataset are loaded one by one and a membership threshold is set. If the membership of an object in a cluster exceeds the threshold, the object is assigned to that cluster and the location of the nearest cluster center is updated; otherwise, the object is placed in a temporary matrix, which we call the pool. When the pool is full, we cluster the data in the pool and update the locations of the cluster centers. Both algorithms are based on this data stream structure and differ in how the objects in the data are weighted. We test our algorithms on a handwritten digit image dataset and several large-scale UCI datasets and compare them with existing algorithms. The experiments show that our algorithms are more suitable for clustering large-scale datasets.
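
The threshold-and-pool structure described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the abstract does not specify the update rules or the two weighting schemes, so the running-mean center update, the per-center mass, the weighted FCM step applied to a full pool, and all names and default values (stream_cluster, threshold, pool_size, pool_iters) are assumptions chosen only to make the control flow concrete. Only the membership formula is the standard FCM one.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Standard FCM memberships: rows are objects, columns are clusters."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def stream_cluster(stream, init_centers, threshold=0.8, pool_size=50,
                   m=2.0, pool_iters=10):
    """One pass over `stream`; returns the final cluster centers."""
    centers = np.array(init_centers, dtype=float)
    mass = np.ones(len(centers))  # assumed: accumulated weight of each center
    pool = []

    def absorb(points):
        # Assumed pool step: a few FCM iterations in which each current
        # center acts as one extra point weighted by its accumulated mass.
        nonlocal centers, mass
        X = np.asarray(points, dtype=float)
        c = centers.copy()
        for _ in range(pool_iters):
            w = fcm_memberships(X, c, m) ** m          # shape (n, k)
            c = (mass[:, None] * centers + w.T @ X) / (mass + w.sum(axis=0))[:, None]
        centers = c
        mass = mass + w.sum(axis=0)

    for x in stream:                                   # objects arrive one by one
        x = np.asarray(x, dtype=float)
        u = fcm_memberships(x[None, :], centers, m)[0]
        j = int(np.argmax(u))                          # nearest center
        if u[j] > threshold:
            # Confident assignment: move the nearest center (running mean).
            mass[j] += 1.0
            centers[j] += (x - centers[j]) / mass[j]
        else:
            # Ambiguous object: defer it to the pool.
            pool.append(x)
            if len(pool) >= pool_size:
                absorb(pool)
                pool = []
    if pool:                                           # flush leftovers (assumed)
        absorb(pool)
    return centers
```

Calling stream_cluster(data, init_centers) with any iterable of equal-length numeric vectors returns the array of final centers. A higher threshold sends more objects through the pool step, while a lower one relies more on the immediate single-object update.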

Acknowledgments

This work was supported by the Program for New Century Excellent Talents in University (No. NCET-12-0920), the Program for New Scientific and Technological Star of Shaanxi Province (No. 2014KJXX-45), the National Natural Science Foundation of China (Nos. 61272279, 61272282, 61371201, and 61203303), the Fundamental Research Funds for the Central Universities (Nos. K5051302049, K5051302023, K50511020011, K5051302002 and K5051302028), the Provincial Natural Science Foundation of Shaanxi of China (No. 2011JQ8020), the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project) (No. B07048) and the EU IRSES project (No. 247619).

Author information

Authors and Affiliations

Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education of China, International Research Center for Intelligent Perception and Computation, Xidian University, Xi’an, 710071, China
Yangyang Li, Guoli Yang, Haiyang He, Licheng Jiao & Ronghua Shang

Corresponding author

Correspondence to Yangyang Li.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Li, Y., Yang, G., He, H. et al. A study of large-scale data clustering based on fuzzy clustering. Soft Comput 20, 3231–3242 (2016). https://doi.org/10.1007/s00500-015-1698-1

Issue Date: August 2016