Abstract
The capacity to interact with environments, understand them, and make timely judgments defines smartness, the foundation of smart cities and civilizations. This study is motivated by the need for real-time disaster-related applications, which increases the demand for novel techniques that scale to big data. The main aim of this paper is to analyze social media input data, identify the crucial features, and accurately classify each record into its appropriate disaster class. The disaster dataset contains numerous features, which increases its dimensionality. Existing techniques consume more runtime memory on large training datasets and suffer from drawbacks such as oversampling, computational cost, low speed, data imbalance, concept drift, and computational complexity. To overcome these drawbacks, this study presents a novel city councils evolution (CCE)-optimized ensemble support vector machine-based extreme learning machine (ESVM-ELM) model on Apache Spark for predicting disaster events in big data. The limitations of traditional serial processing are overcome by an appropriate parallelization technique, which improves the speedup of the model and reduces the time taken for classification. The ESVM-ELM model performs well with imbalanced datasets and handles the concept drift problem efficiently. Optimizing the ESVM-ELM model with the CCE algorithm offers improved accuracy, a better convergence rate, and minimal computational complexity. The efficiency of our model is demonstrated by validation on the disaster tweets dataset and by comparison with four baseline approaches, namely, naïve Bayes, ELM, FCM, and Log-Based Abnormal Task Detection. Cross-validation is used to generate the ensemble of ELM classifiers that the ESVM-ELM algorithm combines for decision-making.
The proposed model improves accuracy, precision, recall, and F-measure compared with the baseline models. The experimental results demonstrate the efficiency of the ESVM-ELM model in improving prediction accuracy, speedup, and scale-up for big data classification with reasonable processing time.
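As an illustration of two ideas named in the abstract, the following Python sketch builds an ensemble of ELM classifiers from cross-validation folds and computes precision, recall, and F-measure. It is not the ESVM-ELM model itself: it omits the SVM component, the CCE optimization, and the Apache Spark parallelization. Every name in it (ELM, fit_cv_ensemble, predict_majority, precision_recall_f1), the choice of 50 sigmoid hidden units, and the majority-vote combination rule are assumptions made for illustration only. Training each member with one fold held out is one plausible reading of how cross-validation can produce an ensemble.

```python
import numpy as np


class ELM:
    """Minimal extreme learning machine for binary labels in {0, 1} (illustrative)."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random projection followed by a sigmoid activation.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Input weights and biases are drawn once at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: least-squares fit via the Moore-Penrose pseudo-inverse,
        # with the labels mapped from {0, 1} to {-1, +1}.
        self.beta = np.linalg.pinv(H) @ (2.0 * y - 1.0)
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta > 0).astype(int)


def fit_cv_ensemble(X, y, k=5, n_hidden=50, seed=0):
    """Train k ELMs, each on the data with one of k folds held out (assumed scheme)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    models = []
    for i, held_out in enumerate(folds):
        train = np.setdiff1d(np.arange(len(X)), held_out)
        models.append(ELM(n_hidden, seed=seed + i).fit(X[train], y[train]))
    return models


def predict_majority(models, X):
    """Combine member predictions by majority vote (ties go to class 0)."""
    votes = np.stack([m.predict(X) for m in models])  # shape: (k, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F-measure for the positive class (label 1)."""
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

A typical call would be `models = fit_cv_ensemble(X, y)` followed by `precision_recall_f1(y_test, predict_majority(models, X_test))`, where X is a NumPy feature matrix and y a vector of 0/1 labels. An odd k avoids tied votes.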
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Code Availability
Not applicable.
Author information
Contributions
JJ, SD, and DNK agreed on the content of the study. JJ, SD, and DNK collected all the data for analysis. JJ, SD, and DNK agreed on the methodology. JJ, SD, and DNK completed the analysis based on agreed steps. Results and conclusions are discussed and written together. All authors read and approved the final manuscript.
Ethics declarations
Human and Animal Rights
This article does not contain any studies with human or animal subjects performed by any of the authors.
Informed Consent
Informed consent was obtained from all individual participants included in the study.
Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Conflict of Interest
The authors declare no competing interests.
About this article
Cite this article
Jagadeesan, J., D., S. & Kirupanithi, D.N. An Optimized Ensemble Support Vector Machine-Based Extreme Learning Model for Real-Time Big Data Analytics and Disaster Prediction. Cogn Comput 15, 2152–2174 (2023). https://doi.org/10.1007/s12559-023-10176-x
DOI: https://doi.org/10.1007/s12559-023-10176-x