Abstract
In recent years, big data has played a significant role in the development of data storage, which is in high demand. Big data comprises a large number of datasets, which makes it difficult to handle with traditional database management systems. It has become increasingly popular because advanced technologies allow it to manage diverse data sources and formats. However, some existing research works remain ineffective in dealing with today's issues. To overcome these shortcomings, this paper proposes a novel adaptive hybrid mutation black widow optimization (AHMBWO) based clustering approach for a distributed data management system in HDFS. The proposed approach comprises three phases: construction of resource description framework (RDF) graphs, AHMBWO-based clustering for the distributed data management system in HDFS, and placement and partitioning to handle and manage the distribution of data. In addition, seven test functions are employed to evaluate the performance of the proposed AHMBWO algorithm. The clustering results of the proposed AHMBWO are then compared with those of several other approaches, namely BWO, PSO, GA and BBO, to test the validity of the approaches on the respective datasets. The experimental analysis reveals that the proposed AHMBWO approach provides better performance with less execution time than all the other approaches.
Cite this article
Ravikumar, S., Kavitha, D. A New Adaptive Hybrid Mutation Black Widow Clustering Based Data Partitioning for Big Data Analysis. Wireless Pers Commun 120, 1313–1339 (2021). https://doi.org/10.1007/s11277-021-08516-x
DOI: https://doi.org/10.1007/s11277-021-08516-x