WOA + BRNN: An imbalanced big data classification framework using Whale optimization and deep neural network

Hassib, Eslam. M.; El-Desouky, Ali. I.; Labib, Labib. M.; El-kenawy, El-Sayed M.

doi:10.1007/s00500-019-03901-y

Focus
Published: 11 March 2019

Volume 24, pages 5573–5592 (2020)
Soft Computing

Eslam. M. Hassib ORCID: orcid.org/0000-0002-2736-0773¹,
Ali. I. El-Desouky¹,
Labib. M. Labib¹ &
El-Sayed M. El-kenawy²

1573 Accesses
81 Citations

Abstract

Nowadays, big data plays a substantial part in information and knowledge analysis, manipulation, and forecasting. Analyzing and extracting knowledge from such big datasets is a very challenging task because of imbalanced class distributions, which can lead to biased classification results and wrong decisions. Standard classifiers are not capable of handling such datasets, so a new technique for dealing with them is required. This paper proposes a novel classification framework for big data that consists of three phases. The first phase is feature selection, which uses the Whale optimization algorithm (WOA) to find the best set of features. The second phase is preprocessing, which uses the SMOTE algorithm and the LSH-SMOTE algorithm to address the class imbalance problem. The third phase is the WOA + BRNN algorithm, which, for the first time, uses the Whale optimization algorithm to train a deep learning model, the bidirectional recurrent neural network (BRNN). The proposed WOA + BRNN algorithm was evaluated in terms of area under the curve (AUC) on nine highly imbalanced datasets, one of which is a big dataset, against four of the most commonly used machine learning algorithms (Naïve Bayes, AdaBoostM1, decision table, and random tree) and against GWO-MLP (a multilayer perceptron trained with the Grey Wolf Optimizer). It was then evaluated in terms of classification accuracy on four well-known datasets against GWO-MLP and against multilayer perceptrons trained with particle swarm optimization (PSO-MLP), a genetic algorithm (GA-MLP), ant colony optimization (ACO-MLP), an evolution strategy (ES-MLP), and population-based incremental learning (PBIL-MLP). Experimental results showed that WOA + BRNN achieved promising accuracy and high local optima avoidance, and outperformed both the four common machine learning algorithms and GWO-MLP in terms of AUC.
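
WOA drives two of the three phases described above (feature selection and BRNN training). The block below is a minimal sketch of the standard continuous WOA update rules as published by Mirjalili and Lewis (2016); it is not the authors' implementation. It shows the three moves: encircling the best whale when |A| < 1, exploring around a random whale when |A| ≥ 1, and the spiral "bubble-net" move chosen with probability 0.5. The function name `woa_minimize`, its defaults, and the greedy update of the best solution after every move are illustrative assumptions. In the training phase the fitness would presumably be a BRNN's training error with its weights flattened into the position vector, and feature selection would need a binary encoding; neither encoding is given in this abstract, so both are left out.

```python
import numpy as np

def woa_minimize(fitness, dim, n_whales=20, n_iter=200, lb=-10.0, ub=10.0, b=1.0, seed=0):
    """Hypothetical minimal WOA for continuous minimization.

    Returns (best_position, best_fitness)."""
    rng = np.random.default_rng(seed)
    whales = rng.uniform(lb, ub, size=(n_whales, dim))
    scores = np.array([fitness(w) for w in whales])
    best_idx = int(np.argmin(scores))
    best, best_score = whales[best_idx].copy(), scores[best_idx]

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a  # |A| >= 1 favours exploration
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                if abs(A) < 1.0:
                    # Encircling prey: contract towards the best whale so far
                    D = np.abs(C * best - whales[i])
                    whales[i] = best - A * D
                else:
                    # Exploration: move relative to a randomly chosen whale
                    rand_whale = whales[rng.integers(n_whales)]
                    D = np.abs(C * rand_whale - whales[i])
                    whales[i] = rand_whale - A * D
            else:
                # Bubble-net attack: logarithmic spiral around the best whale
                l = rng.uniform(-1.0, 1.0)
                D = np.abs(best - whales[i])
                whales[i] = D * np.exp(b * l) * np.cos(2.0 * np.pi * l) + best
            whales[i] = np.clip(whales[i], lb, ub)  # keep inside the search box
            score = fitness(whales[i])
            if score < best_score:  # greedy update of the best solution
                best, best_score = whales[i].copy(), score
    return best, best_score
```

For example, `woa_minimize(lambda x: float(np.sum(x ** 2)), dim=5)` drives the sphere function towards its minimum at the origin. Because WOA only needs fitness values and no gradients, the same loop could in principle score any network whose weights are packed into the position vector, which is the idea behind metaheuristic-trained models such as GWO-MLP and WOA + BRNN.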

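The preprocessing phase relies on SMOTE (Chawla et al.) to rebalance the classes. The sketch below assumes the textbook formulation: each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. It is not the authors' code. The function name `smote` and the choice of drawing base samples at random, rather than cycling through every minority sample a fixed number of times, are simplifying assumptions. The LSH-SMOTE variant mentioned in the abstract is not reproduced. Its name suggests locality-sensitive hashing in place of exact neighbour search, and that matters at big-data scale because the brute-force distance matrix used here grows quadratically with the number of minority samples.

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, seed=0):
    """Hypothetical minimal SMOTE: interpolate between minority-class points
    and their k nearest minority-class neighbours (Euclidean distance).
    Requires at least two minority samples."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Brute-force pairwise squared distances within the minority class
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]  # k nearest neighbour indices

    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for s in range(n_synthetic):
        i = rng.integers(n)                 # random minority sample
        j = neighbours[i, rng.integers(k)]  # one of its k nearest neighbours
        gap = rng.random()                  # uniform in [0, 1)
        synthetic[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return synthetic
```

Because every synthetic point is a convex combination of two real minority points, oversampling collinear points such as `[[0, 0], [1, 1], [2, 2], [3, 3]]` only produces new points on the same line y = x, inside the span of the originals.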
Acknowledgements

The authors would like to extend their sincere thanks and appreciation to the anonymous reviewers for their valuable comments and feedback, which were extremely helpful in improving the quality of the paper.

Author information

Authors and Affiliations

Computer Engineering and Systems Department, Faculty of Engineering, Mansoura University, Mansoura, Egypt
Eslam. M. Hassib, Ali. I. El-Desouky & Labib. M. Labib
Department of Computer and Systems Engineering, Delta Higher Institute for Engineering & Technology (DHIET), Mansoura, Egypt
El-Sayed M. El-kenawy

Corresponding author

Correspondence to Eslam. M. Hassib.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by B. B. Gupta.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hassib, E.M., El-Desouky, A.I., Labib, L.M. et al. WOA + BRNN: An imbalanced big data classification framework using Whale optimization and deep neural network. Soft Comput 24, 5573–5592 (2020). https://doi.org/10.1007/s00500-019-03901-y

Issue Date: April 2020
DOI: https://doi.org/10.1007/s00500-019-03901-y
