Abstract
Big data plays a vital role in information analysis and in the systematic extraction of knowledge from complex or very large datasets. The rapid growth of large-scale data is a major challenge in this domain, and big data must be classified in a way that addresses data imbalance. Huge volumes of data can be explored efficiently by converting them into valuable knowledge, and the data can be processed in a distributed environment using different application frameworks. In recent years, the Spark framework has gained significance in the big data domain because of its success with incremental and iterative approaches. When the data distribution is imbalanced, classifying large datasets is a challenging task for conventional methods, which can reach wrong decisions when generating the classification result. In this paper, an efficient Shuffled Student Psychology Optimization-based Deep Q network (SSPO-DQN) is proposed for big data classification with the Spark framework to overcome the issues faced by traditional methods. Here, master and slave sets perform distinct operations, such as data partitioning, feature fusion, and data augmentation, to accomplish the classification task with the proposed approach. The developed technique attained a maximum TPR of 0.960, an accuracy of 0.942, and a TNR of 0.929.
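The three figures reported above (TPR, TNR, accuracy) are conventionally derived from the counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `classification_rates` and the example counts are hypothetical and chosen only for illustration.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (TPR, TNR, accuracy) from binary confusion-matrix counts.

    tp/fn: positives classified correctly/incorrectly
    tn/fp: negatives classified correctly/incorrectly
    Assumes each class has at least one sample (non-zero denominators).
    """
    tpr = tp / (tp + fn)                        # true positive rate (sensitivity)
    tnr = tn / (tn + fp)                        # true negative rate (specificity)
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # fraction of all samples correct
    return tpr, tnr, accuracy


# Hypothetical counts, not taken from the paper.
tpr, tnr, acc = classification_rates(tp=90, fn=10, tn=80, fp=20)
print(f"TPR={tpr:.3f} TNR={tnr:.3f} accuracy={acc:.3f}")
# prints: TPR=0.900 TNR=0.800 accuracy=0.850
```

On imbalanced data, accuracy alone can look high even when the minority class is mostly misclassified, which is one reason TPR and TNR are usually reported alongside it.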
Data Availability Statement
The data underlying this article are available in the Adult dataset (https://archive.ics.uci.edu/ml/datasets/Adult) and the Credit Approval dataset (https://archive.ics.uci.edu/ml/datasets/Credit+Approval).
About this article
Cite this article
Kantapalli, B., Markapudi, B.R. SSPO-DQN spark: shuffled student psychology optimization based deep Q network with spark architecture for big data classification. Wireless Netw 29, 369–385 (2023). https://doi.org/10.1007/s11276-022-03103-9
DOI: https://doi.org/10.1007/s11276-022-03103-9