Abstract
Over recent decades, the volume of data streaming from sources such as social networks and data centers has grown exponentially. As data grows larger, clustering becomes an increasingly challenging task. This paper proposes a new multiple fuzzy ensemble clustering approach for big data to address this challenge. The approach, called FCFDR, combines data reduction with the cluster forests (CF) strategy and comprises two tasks. The first task builds many clustering instances by hybridizing feature selection (FS) with instance selection (IS), which selects the most representative features from big data while simultaneously reducing the high-dimensional data. The selected features are then clustered using the fuzzy c-means objective function, yielding an initial fuzzy co-association matrix, which is then regularized. The second task aggregates the clustering instances into one final result vector using the normalized cut algorithm (Ncut). Big datasets from the UCI repository were used to validate the effectiveness of the approach. The results demonstrate the efficiency of the proposed approach in terms of clustering accuracy and quality.
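To make the two tasks concrete, the following is a minimal NumPy sketch of the general idea, not the authors' implementation. Several points are assumptions made only for illustration: uniform random sampling of features and instances stands in for the hybrid FS/IS step, the co-association matrix is taken as the average of U Uᵀ over runs, the regularization step is omitted because the abstract does not specify it, and the Ncut step is reduced to a two-way cut on the second eigenvector of the normalized Laplacian. All function names and parameters (`fuzzy_coassociation`, `feat_frac`, `inst_frac`, and so on) are hypothetical.

```python
import numpy as np

def memberships(X, centers, m=2.0):
    """Fuzzy c-means membership of every row of X to every center."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fuzzy_c_means(X, c, m=2.0, n_iter=100, rng=None):
    """Plain fuzzy c-means with random initial memberships; returns the centers."""
    rng = np.random.default_rng(rng)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        U = memberships(X, centers, m)
    return centers

def fuzzy_coassociation(X, c, n_runs=10, feat_frac=0.5, inst_frac=0.5, seed=0):
    """Task 1 (sketch): one clustering instance per random feature/instance subset.

    Assumption: random subsets replace the paper's FS/IS hybrid, and the
    co-association matrix is the mean of U @ U.T over all instances.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = np.zeros((n, n))
    for _ in range(n_runs):
        feats = rng.choice(p, size=max(1, int(feat_frac * p)), replace=False)
        rows = rng.choice(n, size=max(c, int(inst_frac * n)), replace=False)
        centers = fuzzy_c_means(X[np.ix_(rows, feats)], c, rng=rng)
        U = memberships(X[:, feats], centers)  # memberships for all points
        A += U @ U.T
    return A / n_runs

def ncut_bipartition(A):
    """Task 2 (sketch): two-way normalized cut via the normalized Laplacian."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    fiedler = d_inv_sqrt * vecs[:, 1]    # second-smallest eigenvector
    return (fiedler > 0).astype(int)

# Toy usage on two synthetic, well-separated groups (not UCI data).
demo_rng = np.random.default_rng(1)
X_demo = np.vstack([demo_rng.normal(0.0, 0.5, (30, 4)),
                    demo_rng.normal(6.0, 0.5, (30, 4))])
A_demo = fuzzy_coassociation(X_demo, c=2, n_runs=5, seed=3)
labels_demo = ncut_bipartition(A_demo)
print(labels_demo)
```

Because U Uᵀ does not depend on the order of the cluster columns, the instances can be averaged without aligning cluster labels across runs, which is what makes a co-association matrix convenient for ensemble aggregation.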
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lahmar, I., Zaier, A., Yahia, M., Boaullegue, R. (2022). A Multiple Fuzzy C-Means Ensemble Cluster Forest for Big Data. In: Abraham, A., et al. Hybrid Intelligent Systems. HIS 2021. Lecture Notes in Networks and Systems, vol 420. Springer, Cham. https://doi.org/10.1007/978-3-030-96305-7_41
DOI: https://doi.org/10.1007/978-3-030-96305-7_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96304-0
Online ISBN: 978-3-030-96305-7
eBook Packages: Intelligent Technologies and Robotics (R0)