SSPO-DQN spark: shuffled student psychology optimization based deep Q network with spark architecture for big data classification

  • Original Paper
  • Wireless Networks

Abstract

Big data plays a vital role in information analysis and in the systematic extraction of knowledge from complex or very large datasets. The rapid growth of large-scale data creates a major challenge, and classification is required to handle such data, in particular to address data imbalance. Large volumes of data can be explored efficiently by converting them into valuable knowledge, and the data can be processed in a distributed environment using different application frameworks. In recent years, the Spark framework has gained significance in the big data domain because of its success with incremental and iterative approaches. Because the data distribution is imbalanced, classifying large datasets is a challenging task for conventional methods, which can make wrong decisions when generating the classification result. This paper proposes an efficient Shuffled Student Psychology Optimization-based Deep Q Network (SSPO-DQN) for big data classification with the Spark framework in order to overcome the issues faced by traditional methods. Master and slave sets perform distinct operations, such as data partitioning, feature fusion, and data augmentation, to accomplish the classification task. The developed technique attained a maximum true positive rate (TPR) of 0.960, accuracy of 0.942, and true negative rate (TNR) of 0.929.
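For readers unfamiliar with the reported metrics, the following minimal sketch shows how TPR, TNR, and accuracy are conventionally computed from binary confusion-matrix counts. The function name `classification_rates` and the counts in the usage line are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (TPR, TNR, accuracy) from binary confusion-matrix counts.

    tp/fn: positive samples classified correctly/incorrectly
    tn/fp: negative samples classified correctly/incorrectly
    """
    positives = tp + fn
    negatives = tn + fp
    total = positives + negatives
    # Guard against empty classes so the sketch never divides by zero.
    tpr = tp / positives if positives else 0.0   # true positive rate (sensitivity)
    tnr = tn / negatives if negatives else 0.0   # true negative rate (specificity)
    accuracy = (tp + tn) / total if total else 0.0
    return tpr, tnr, accuracy


# Hypothetical counts for illustration only (not results from the paper).
tpr, tnr, accuracy = classification_rates(tp=96, fn=4, tn=93, fp=7)
print(f"TPR={tpr:.3f} TNR={tnr:.3f} accuracy={accuracy:.3f}")
```

On imbalanced data, TPR and TNR are usually read together with accuracy, since a classifier that favors the majority class can score high accuracy while one of the two rates stays low.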


Data Availability Statement

The data underlying this article are available in the Adult dataset (https://archive.ics.uci.edu/ml/datasets/Adult) and the Credit Approval dataset (https://archive.ics.uci.edu/ml/datasets/Credit+Approval).

Author information

Corresponding author

Correspondence to Bhaskar Kantapalli.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kantapalli, B., Markapudi, B.R. SSPO-DQN spark: shuffled student psychology optimization based deep Q network with spark architecture for big data classification. Wireless Netw 29, 369–385 (2023). https://doi.org/10.1007/s11276-022-03103-9

  • DOI: https://doi.org/10.1007/s11276-022-03103-9
