A Multiple Fuzzy C-Means Ensemble Cluster Forest for Big Data

Lahmar, Ines; Zaier, Aida; Yahia, Mohamed; Boaullegue, Ridha

doi:10.1007/978-3-030-96305-7_41

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 420)

Included in the following conference series:

International Conference on Hybrid Intelligent Systems

Abstract

Over recent decades, the volume of data streaming from sources such as social networks and data centers has grown exponentially. As data grows larger, clustering becomes a challenging task. This paper proposes a new multiple fuzzy ensemble clustering approach for big data to address this challenge. The approach (FCFDR) is based on data reduction and the cluster forests (CF) strategy, and comprises two tasks. The first creates many clustering instances by hybridizing feature selection (FS) with instance selection (IS), which together select more representative features from big data while reducing its high dimensionality. The selected features are then classified using the objective function, yielding an initial fuzzy co-association matrix, which is then regularized. The second task aggregates the clustering instances into one final result vector using the normalized cut algorithm (Ncut). A big dataset from UCI was used to validate the effectiveness of the approach. The results show the efficiency of the proposed approach in terms of clustering accuracy and quality.
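
The two tasks described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' FCFDR implementation: random feature and instance subsets stand in for the FS/IS hybrid, an element-wise power stands in for the regularization step, and the Ncut step is restricted to a two-way split. All function names, parameters, and the synthetic data are hypothetical.

```python
import numpy as np

def memberships(X, centers, m=2.0):
    """Fuzzy c-means membership matrix (n, c) for fixed centers."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fuzzy_c_means(X, c, m=2.0, n_iter=100, rng=None):
    """Plain fuzzy c-means; returns the (c, p) cluster centers."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(n_iter):
        W = memberships(X, centers, m) ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return centers

def fuzzy_coassociation(X, c=2, n_instances=10, n_feat=3, frac=0.5, m=2.0, seed=0):
    """Task 1: average U @ U.T over many reduced clustering instances."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = np.zeros((n, n))
    for _ in range(n_instances):
        # Assumption: random subsets stand in for feature/instance selection.
        feats = rng.choice(p, size=n_feat, replace=False)
        rows = rng.choice(n, size=int(frac * n), replace=False)
        centers = fuzzy_c_means(X[np.ix_(rows, feats)], c, m, rng=rng)
        U = memberships(X[:, feats], centers, m)  # memberships for all points
        A += U @ U.T
    return A / n_instances

def ncut_bipartition(A):
    """Task 2: two-way normalized cut via the second eigenvector of the
    symmetric normalized Laplacian, split at zero."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = d_inv_sqrt * vecs[:, 1]
    return (fiedler > 0).astype(int)

# Synthetic data: two Gaussian blobs in six dimensions (illustrative only).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.6, (40, 6)), rng.normal(3.0, 0.6, (40, 6))])
truth = np.repeat([0, 1], 40)

A = fuzzy_coassociation(X)
A_reg = A ** 2  # Assumption: element-wise power as a simple regularizer.
labels = ncut_bipartition(A_reg)

# Cluster labels are arbitrary, so score agreement up to a label swap.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"agreement with ground truth: {agreement:.2f}")
```

Each entry of the averaged matrix lies in [0, 1] and measures how strongly two points share cluster memberships across the ensemble. A multi-way partition would need a k-way spectral step, for example k-means on several eigenvectors, in place of the sign split.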

Author information

Authors and Affiliations

MACS Laboratory, University of Gabes, Gabes, Tunisia
Ines Lahmar
Innov’Com Lab, University of Carthage Tunis, 1002, Tunis, Tunisia
Aida Zaier & Ridha Boaullegue
SYSCOM Laboratory ENIT, University Tunis El Manar, 1002, Tunis, Tunisia
Mohamed Yahia

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Ajith Abraham
Campus Centre de Créteil, Université Paris-Est Créteil, Créteil, France
Patrick Siarry
Department of Computer Science, Università degli Studi di Milano, Milan, Milano, Italy
Vincenzo Piuri
Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Niketa Gandhi
University of Bari, Bari, Italy
Gabriella Casalino
Division of Graduate Studies and Research, Tijuana Institute of Technology, Tijuana, Mexico
Oscar Castillo
Ontario Tech University, Oshawa, ON, Canada
Patrick Hung

About this paper

Cite this paper

Lahmar, I., Zaier, A., Yahia, M., Boaullegue, R. (2022). A Multiple Fuzzy C-Means Ensemble Cluster Forest for Big Data. In: Abraham, A., et al. Hybrid Intelligent Systems. HIS 2021. Lecture Notes in Networks and Systems, vol 420. Springer, Cham. https://doi.org/10.1007/978-3-030-96305-7_41

DOI: https://doi.org/10.1007/978-3-030-96305-7_41
Published: 04 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96304-0
Online ISBN: 978-3-030-96305-7
eBook Packages: Intelligent Technologies and Robotics (R0)
