
An efficient ACO-PSO-based framework for data classification and preprocessing in big data

  • Special Issue
Evolutionary Intelligence

Abstract

Big data approaches are prominent in the systematic extraction and analysis of huge or complex datasets, and they support data management better than traditional data-processing mechanisms. In this paper, an efficient framework based on ant colony optimization (ACO) and particle swarm optimization (PSO) is proposed for data classification and preprocessing in the big data environment. The framework shows that the content part can be combined and fetched for analysis by integrating the volume and velocity aspects. Weight marking is then performed using the volume and variety aspects of the data. Finally, ranking is performed using the velocity and variety aspects of big data. Data preprocessing is performed using weights assigned on the basis of size, content, and keywords. ACO and PSO are then applied under different computational settings, such as uniform distribution, random initialization, epochs, iterations, and time constraints, for both minimization and maximization. The weights are assigned automatically through an unbiased random mechanism, on a scale of 0–1, for all the separated data. The simple additive weighting (SAW) method is then applied for prioritization and ranking. The overall average classification accuracy is 98% for PSO-SAW and 95% for ACO-SAW, and PSO-SAW outperforms ACO-SAW in all cases.
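The weighting-and-ranking step can be illustrated with a minimal sketch of simple additive weighting (SAW). It is not the authors' implementation: it assumes each data item has already been scored on three numeric criteria named after the abstract (size, content, keywords), that weights are drawn uniformly from 0–1 and rescaled to sum to 1, and that size is a cost criterion (smaller is preferred). The records in `ITEMS` and the function names `random_weights` and `saw_rank` are hypothetical, and the max/min ratio normalization is the common textbook SAW variant, since the abstract does not specify one.

```python
import random
from typing import Dict, Iterable, List, Mapping, Optional, Sequence, Tuple

# Hypothetical pre-scored records; criterion names follow the abstract
# (size, content, keywords), but the values are illustrative only.
ITEMS: Dict[str, Dict[str, float]] = {
    "doc_a": {"size": 120.0, "content": 0.8, "keywords": 5.0},
    "doc_b": {"size": 300.0, "content": 0.6, "keywords": 9.0},
    "doc_c": {"size": 80.0, "content": 0.9, "keywords": 3.0},
}


def random_weights(criteria: Sequence[str], seed: Optional[int] = None) -> Dict[str, float]:
    """Draw one weight per criterion uniformly from [0, 1], then rescale so they sum to 1."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    rng = random.Random(seed)
    raw = [rng.uniform(0.0, 1.0) for _ in criteria]
    total = sum(raw)
    if total == 0.0:  # every draw was exactly 0: fall back to equal weights
        return {c: 1.0 / len(criteria) for c in criteria}
    return {c: r / total for c, r in zip(criteria, raw)}


def saw_rank(
    items: Mapping[str, Mapping[str, float]],
    weights: Mapping[str, float],
    cost_criteria: Iterable[str] = (),
) -> List[Tuple[str, float]]:
    """Rank items by SAW score, highest first.

    Benefit criterion: r = x / max(x). Cost criterion: r = min(x) / x.
    All criterion values must be strictly positive.
    """
    cost = set(cost_criteria)
    # Best value per criterion: max for benefit criteria, min for cost criteria.
    best: Dict[str, float] = {}
    for c in weights:
        column = [vals[c] for vals in items.values()]
        if any(x <= 0 for x in column):
            raise ValueError(f"criterion {c!r} needs strictly positive values")
        best[c] = min(column) if c in cost else max(column)

    scores: Dict[str, float] = {}
    for name, vals in items.items():
        # Weighted sum of normalized ratings (each rating lies in (0, 1]).
        scores[name] = sum(
            w * (best[c] / vals[c] if c in cost else vals[c] / best[c])
            for c, w in weights.items()
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    w = random_weights(["size", "content", "keywords"], seed=42)
    # Assumption: a smaller size is preferred, so "size" is a cost criterion.
    for name, score in saw_rank(ITEMS, w, cost_criteria={"size"}):
        print(f"{name}: {score:.3f}")
```

Normalizing each criterion against its best value puts all criteria on a common 0–1 scale before the weighted sum, which is what lets SAW combine attributes measured in different units. With fixed weights of 0.2 (size), 0.5 (content), and 0.3 (keywords), this sketch ranks `doc_c` first with a score of 0.8.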

Figures: Fig. 1–Fig. 14

Author information

Corresponding author

Correspondence to Ashutosh Kumar Dubey.

About this article

Cite this article

Dubey, A.K., Kumar, A. & Agrawal, R. An efficient ACO-PSO-based framework for data classification and preprocessing in big data. Evol. Intel. 14, 909–922 (2021). https://doi.org/10.1007/s12065-020-00477-7

  • DOI: https://doi.org/10.1007/s12065-020-00477-7
