An Optimized Ensemble Support Vector Machine-Based Extreme Learning Model for Real-Time Big Data Analytics and Disaster Prediction

Cognitive Computation

Abstract

The capacity to interact with environments, understand them, and make timely judgments defines smartness, the foundation of smart cities and civilizations. This study is motivated by the need for real-time disaster-related applications, which increases the demand for novel techniques that scale to big data. The aim of this paper is to analyze input data drawn from social media, identify the crucial features, and accurately classify each record into its appropriate disaster class. The disaster dataset contains numerous features, which increases its dimensionality. Existing techniques consume considerable runtime memory on large training datasets and suffer from drawbacks such as oversampling, high computational cost and complexity, low speed, data imbalance, and concept drift. To overcome these drawbacks, this study presents a novel city councils evolution (CCE)-optimized ensemble support vector machine-based extreme learning machine (ESVM-ELM) model on Apache Spark for predicting disaster events in big data. The limitations of traditional serial processing are overcome with a parallelization technique that improves the speedup of the model and reduces the time taken for classification. The ESVM-ELM model performs well on imbalanced datasets and handles the concept drift problem efficiently. Optimizing the ESVM-ELM model with the CCE algorithm offers improved accuracy, a better convergence rate, and minimal computational complexity. The efficiency of the model is demonstrated by validation on the disaster tweets dataset and by comparison with four baseline approaches, namely, naïve Bayes, ELM, FCM, and Log-Based Abnormal Task Detection. Cross-validation is used to generate an ensemble of ELM classifiers for decision-making within the ESVM-ELM algorithm. Compared with these baseline models, the proposed model improves accuracy, precision, recall, and F-measure. The experimental results demonstrate that the ESVM-ELM model improves prediction accuracy, speedup, and scale-up for big data classification with reasonable processing time.
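To make the idea of a cross-validation-generated ensemble of ELM classifiers concrete, the following is a minimal single-machine sketch in Python with NumPy. It is not the authors' implementation: it omits the SVM component, the CCE optimizer, and Apache Spark, and it combines the ELM members by simple majority vote as a stand-in. The names `ELM`, `cv_ensemble`, `vote`, and `precision_recall_f1`, together with the hidden-layer size and the ridge term `reg`, are assumptions chosen for illustration.

```python
import numpy as np


class ELM:
    """Single-hidden-layer extreme learning machine for labels in {0, 1}."""

    def __init__(self, n_hidden=50, reg=1e-3, seed=0):
        self.n_hidden = n_hidden   # assumed hidden-layer size
        self.reg = reg             # assumed ridge regularisation strength
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer activations from the fixed random projection.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # Input weights and biases are drawn at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        t = 2.0 * y - 1.0  # map labels {0, 1} to targets {-1, +1}
        # Output weights come from ridge-regularised least squares.
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ t)
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta > 0).astype(int)


def cv_ensemble(X, y, k=5, seed=0):
    """Train one ELM per cross-validation split, each on the k-1 training folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    models = []
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        models.append(ELM(seed=seed + i).fit(X[train], y[train]))
    return models


def vote(models, X):
    """Majority vote over ensemble members (a tie counts as class 1)."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F-measure for the positive class (label 1)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Given a numeric feature matrix `X` and a 0/1 label vector `y` (for example, vectorized tweets), `vote(cv_ensemble(X, y), X_new)` returns the ensemble prediction, and `precision_recall_f1` scores it against held-out labels. Because each member is fitted independently, the loop in `cv_ensemble` is the natural place to parallelize, which is the kind of speedup the abstract attributes to running on Spark.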

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Code Availability

Not applicable.

Author information

Contributions

JJ, SD, and DNK agreed on the content of the study, collected all the data for analysis, agreed on the methodology, and completed the analysis following the agreed steps. The results and conclusions were discussed and written jointly. All authors read and approved the final manuscript.

Corresponding author

Correspondence to J. Jagadeesan.

Ethics declarations

Human and Animal Rights

This article does not contain any studies with human or animal subjects performed by any of the authors.

Informed Consent

Informed consent was obtained from all individual participants included in the study.

Consent to Participate

Not applicable.

Consent for Publication

Not applicable.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jagadeesan, J., D., S. & Kirupanithi, D.N. An Optimized Ensemble Support Vector Machine-Based Extreme Learning Model for Real-Time Big Data Analytics and Disaster Prediction. Cogn Comput 15, 2152–2174 (2023). https://doi.org/10.1007/s12559-023-10176-x

  • DOI: https://doi.org/10.1007/s12559-023-10176-x