Abstract
Data are getting larger, and most of them are necessary for our businesses. Rapid explosion of data brings us a number of challenges relating to its complexity and how the most important knowledge can be captured in reasonable time. Fuzzy C-means (FCM)—one of the most efficient clustering algorithms which have been widely used in pattern recognition, data compression, image segmentation, computer vision and many other fields—also faces the problem of processing large datasets. In this paper, we propose some novel hybrid clustering algorithms based on incremental clustering and initial selection to tune up FCM for the Big Data problem. The first algorithm determines meshes of rectangle covering data points as the representatives, while the second one considers data points that have high influence to others as the representatives. The representatives are then clustered by FCM, and the new centers are selected as initial ones for clustering of the dataset. Theoretical analyses of the new algorithms including comparison of quality of solutions when clustering the representatives set versus the entire set are examined. The experimental results on both simulated and real datasets show that total computational time of the new methods including time of finding representatives and clustering is faster than those of other relevant algorithms. The validation on clustering quality is also examined. The findings of this paper have great impact and significance to researches in the fields of soft computing and Big Data processing. It is obvious that computing methodologies nowadays are facing with huge amount of diverse and complex data structures. Speed of processing is the main priority when considering effectiveness of a specific method. The findings demonstrated practical algorithms and investigated their characteristics that could be referenced by other researchers in similar applications. The usefulness and significance of this research are clearly demonstrated within the extent of real-life applications.




















Similar content being viewed by others
Explore related subjects
Discover the latest articles and news from researchers in related subjects, suggested using machine learning.References
Aaron, B., Tamir, D., Rishe, N., Kandel, A.: Dynamic incremental fuzzy C-means clustering. In: 6th International Conferences on Pervasive Patterns and Applications (PATTERNS 2014), pp. 28–37 (2014)
Anderson, D.T., Luke, R.H., Keller, J.M.: Speedup of fuzzy clustering through stream processing on graphics processing units. IEEE Trans. Fuzzy Syst. 16(4), 1101–1106 (2008)
Arora S., Chana, I.: A survey of clustering techniques for big data analysis. In: 2014 5th IEEE International Conference on the Next Generation Information Technology Summit (Confluence), pp. 59–65 (2014)
Bezdek, J.C., Ehrlich, R., Full, W.: FCM: the fuzzy c-means clustering algorithm. Comput. Geosci. 10(2), 191–203 (1984)
Borgelt, C., Kruse R.: Speeding up fuzzy clustering with neural network techniques. In: Proceeding of the 12th IEEE International Conference on Fuzzy Systems (FUZZ ‘03), St. Louis, Missouri, USA, Vol. 2, pp. 852–856 (2003)
Cheng, T.W., Goldgof, D.B., Hall, L.O.: Fast fuzzy clustering. Fuzzy Sets Syst. 93(1), 49–56 (1998)
Cuong, B.C., Son, L.H., Chau, H.T.M.: Some context fuzzy clustering methods for classification problems. In: Proceedings of the 2010 Symposium on Information and Communication Technology, Hanoi, Vietnam, pp. 34–40 (2010)
Davies, D.L., Bouldin, D.W.: A cluster separation measure. IEEE Trans. Patt Anal. Mach. Intell. 2, 224–227 (1979)
Dong, Y., Zhuang, Y.: Fuzzy Hierarchical clustering algorithm facing large databases. In: Proceeding of the 5th IEEE World Congress on Intelligent Control and Automation, Hangzhou, China, Vol. 5, pp. 4282–4286 (2004)
Fahad, A., Alshatri, N., Tari, Z., Alamri, A., Khalil, I., Zomaya, A.Y., Foufou, S., Bouras, A.: A survey of clustering algorithms for big data: taxonomy & empirical analysis. IEEE Trans. Emerg. Top. Comput. 2(3), 267–279 (2014)
Fan, J., Li, J.: A fixed suppressed rate selection method for suppressed fuzzy c-means clustering algorithm. Appl. Math. 5, 1275–1283 (2014)
Feng, X.B., Yao, F., Li, Z.G., Yang, X.J.: Improved fuzzy C-means based on the optimal number of clusters. Appl. Mech. Mater. 392, 803–807 (2013)
Gobi, A.F., Pedrycz, W.: The potential of fuzzy neural networks in the realization of approximation reasoning engines. Fuzzy Sets Syst. 157(22), 2954–2973 (2006)
Hall, L.O.: Exploring big data with scalable soft clustering. In: Synergies of Soft Computing and Statistics for Intelligent Data Analysis, pp. 11–15. Springer, Berlin (2013)
Hu, Y., Qu, F., Wen, C.: An unsupervised possibilistic c-means clustering algorithm with data reduction. In: 10th IEEE International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2013), pp. 29–33 (2013)
Hung, M. C., Yang, D.L. An efficient Fuzzy C-means clustering algorithm. In: Proceedings of the IEEE International Conference on Data Mining 2001 (ICDM 2001), San Jose, CA, USA, pp. 225–232 (2001)
Jain, A.K.: Data clustering: 50 years beyond K-means. Pattern Recogn. Lett. 31(8), 651–666 (2010)
Kothari, D., Narayanan, S.T., Devi, K.K.: Extended fuzzy C-means with random sampling techniques for clustering large data. Int. J. Innov. Res. Adv. Eng. 1(1), 1–4 (2014)
Levy, R.: Probabilistic models in the study of language, Ms. University of California, San Diego (2010)
Marsaglia, G.: Random variables and computers. In: Information Theory Statistical Decision Functions Random Process, pp. 499–510 (1962)
Ozturk, C., Hancer, E., Karaboga, D.: Improved clustering criterion for image clustering with artificial bee colony algorithm. Pattern Anal. Appl. 18(3), 587–599 (2015)
Parker, J.K., Hall, L.O.: Accelerating fuzzy-c means using an estimated subsample size. IEEE Trans. Fuzzy Syst. 22(5), 1229–1244 (2014)
Parvin, H., Minaei-Bidgoli, B.: A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm. Pattern Anal. Appl. 18(1), 87–112 (2015)
Qu, F., Hu, Y., Xue, Y., Yang, Y.: A modified possibilistic fuzzy c-means clustering algorithm. In: 2013 IEEE 9th International Conference on Natural Computation (ICNC 2013), pp. 858–862 (2013)
Rahimi S., Zargham M., Thakre A., Chhillar D.: A parallel Fuzzy C-Mean algorithm for image segmentation. In: Proceeding of the IEEE Annual Meeting of the Fuzzy Information Processing Society (NAFIPS ‘04), Vol. 1, pp. 234–237 (2004)
Ramathilagam, S., Devi, R., Kannan, S.R.: Extended fuzzy c-means: an analyzing data clustering problems. Cluster Comput. 16(3), 389–406 (2013)
Sarma, T.H., Viswanath, P., Reddy, B.E.: Speeding-up the kernel k-means clustering method: a prototype based hybrid approach. Pattern Recogn. Lett. 34(5), 564–573 (2013)
Shirkhorshidi, A.S., Aghabozorgi, S., Wah, T.Y., Herawan, T.: Big data clustering: a review. In: Computational Science and its Applications–ICCSA 2014 (pp. 707–720). Springer International Publishing (2014)
Son, L.H., Cuong, B.C., Lanzi, P.L., Thong, N.T.: A novel intuitionistic fuzzy clustering method for geo-demographic analysis. Expert Syst. Appl. 39(10), 9848–9859 (2012)
Son, L.H., Cuong, B.C., Long, H.V.: Spatial interaction—modification model and applications to geo-demographic analysis. Knowl. Based Syst. 49, 152–170 (2013)
Son, L.H., Lanzi, P.L., Cuong, B.C., Hung, H.A.: Data mining in GIS: A novel context-based fuzzy geographically weighted clustering algorithm. Int. J. Mach. Learn. Comput. 2(3), 235–238 (2012)
Son, L.H.: Enhancing clustering quality of geo-demographic analysis using context fuzzy clustering type-2 and particle swarm optimization. Appl. Soft Comput. 22, 566–584 (2014)
Son, L.H.: HU-FCF: a hybrid user-based fuzzy collaborative filtering method in recommender systems. Expert Syst. Appl. 41(15), 6861–6870 (2014)
Son, L.H.: Optimizing municipal solid waste collection using chaotic particle swarm optimization in GIS based environments: a case study at Danang City, Vietnam. Expert Syst. Appl. 41(18), 8062–8074 (2014)
Son, L.H.: DPFCM: A novel distributed picture fuzzy clustering method on picture fuzzy sets. Expert Syst. Appl. 42(1), 51–66 (2015)
Son, L.H.: Dealing with the new user cold-start problem in recommender systems: a comparative review. Inform. Syst. 58, 87–104 (2015)
Son, L.H.: HU-FCF++: a novel hybrid method for the new user cold-start problem in recommender systems. Eng. Appl. Artif. Intell. 41, 207–222 (2015)
Son, L.H., Linh, N.D., Long, H.V.: A lossless DEM compression for fast retrieval method using fuzzy clustering and MANFIS neural network. Eng. Appl. Artif. Intell. 29, 33–42 (2014)
Son, L.H., Thong, N.T.: Intuitionistic fuzzy recommender systems: an effective tool for medical diagnosis. Knowl.-Based Syst. 74, 133–150 (2015)
Szilágyi, L., Szilágyi, S.M.: Generalization rules for the suppressed fuzzy c-means clustering algorithm. Neurocomputing 139, 298–309 (2014)
Szilagyi, L., Denesi, G., Szilagyi, S.M.: Fast color reduction using approximative c-means clustering models. In: 2014 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 14’), pp. 194–201 (2014)
Taherdangkoo, M., Bagheri, M.H.: A powerful hybrid clustering method based on modified stem cells and Fuzzy C-means algorithms. Eng. Appl. Artif. Intell. 26(5), 1493–1502 (2013)
Thong, N.T., Son, L.H.: HIFCF: an effective hybrid model between picture fuzzy clustering and intuitionistic fuzzy recommender systems for medical diagnosis. Expert Syst. Appl. 42(7), 3682–3701 (2015)
Thong, P.H., Son, L.H.: A new approach to multi-variables fuzzy forecasting using picture fuzzy clustering and picture fuzzy rules interpolation method. In: Proceeding of 6th International Conference on Knowledge and Systems Engineering (KSE 2014), Hanoi, Vietnam, pp 679–690 (2014)
UCI Machine Learning Repository. (2015). Datasets, Available at: https://archive.ics.uci.edu/ml/datasets.html. Accessed: 11/03/2015
Wang, J., Chung, F.L., Wang, S., Deng, Z.: Double indices-induced FCM clustering and its integration with fuzzy subspace clustering. Pattern Anal. Appl. 17(3), 549–566 (2014)
Wang, Y., Chen, L., Mei, J.P.: Incremental fuzzy clustering with multiple medoids for large data. IEEE Trans. Fuzzy Syst. 22(6), 1557–1568 (2014)
Zang, X., Vista IV, F.P., Chong, K.T.: Fast global kernel fuzzy c-means clustering algorithm for consonant/vowel segmentation of speech signal. J Zhejiang Univ. Sci. C 15(7), 551–563 (2014)
Zhang, Q., Chen, Z.: A weighted kernel possibilistic c-means algorithm based on cloud computing for clustering big data. Int. J. Commun Syst 27(9), 1378–1391 (2014)
Zhang, Z., Havens, T.C.: Scalable approximation of kernel fuzzy c-means. In: 2013 IEEE International Conference on Big Data, pp. 161–168 (2013)
Zhao, Y., Wu, X., Kong, S.G., Zhang, L.: Joint segmentation and pairing of multispectral chromosome images. Pattern Anal. Appl. 16(4), 497–506 (2013)
Acknowledgments
The authors are greatly indebted to the editor-in-chief, Prof. Shun-Feng Su and anonymous reviewers for their comments and their valuable suggestions that improved the quality and clarity of paper. A great thank was dedicated to Msc. Nguyen Duc Thien for his discussion and supports in theoretical validation of this paper. We acknowledge the Center for High Performance Computing, VNU for running the codes in the IBM 1350 system.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Son, L.H., Tien, N.D. Tune Up Fuzzy C-Means for Big Data: Some Novel Hybrid Clustering Algorithms Based on Initial Selection and Incremental Clustering. Int. J. Fuzzy Syst. 19, 1585–1602 (2017). https://doi.org/10.1007/s40815-016-0260-3
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s40815-016-0260-3