Weighted Ensemble Learning for Accident Severity Classification Using Social Media Data

Raul, Sanjib Kumar; Rout, Rashmi Ranjan; Somayajulu, D. V. L. N.

doi:10.1007/s42979-024-02870-w

Original Research
Published: 09 May 2024

Volume 5, article number 528 (2024)
SN Computer Science

Sanjib Kumar Raul¹ (ORCID: orcid.org/0000-0001-6176-3552),
Rashmi Ranjan Rout¹ &
D. V. L. N. Somayajulu¹,²

Abstract

With the sharp rise in the use of social media platforms such as Twitter, the research community has shown growing interest in developing automation tools such as models that classify accident tweets by severity. These tools help extract severity information automatically from the content of accident tweets. Moreover, models that predict the severity of an accident are essential for improving the safety of the road traffic system. However, the main difficulty lies in collecting sufficient labeled data. We propose a model called weighted ensemble-based self-training with decision tree (WESTDT), a semi-supervised methodology with a dynamic data labeling strategy. The base classifier in this model is a homogeneous weighted ensemble of decision tree classifiers, which improves the prediction of pseudo labels. We also propose a novel performance measure, called risk factor, to estimate the amount of risk present in an application that uses the prediction model. The proposed model outperformed the state-of-the-art model, reliable semi-supervised ensemble learning (RESSEL), and the baseline models, decision tree (DT) and self-training with decision tree (STDT), on both the traditional and the proposed measures. The results indicate that the proposed framework outperforms all other models on all datasets, with margins of 5–18.3% in precision, 3.9–9.3% in recall, and 6.6% in accuracy. These findings can support the development of efficient and sustainable systems for traffic management and safety. They are also crucial for helping government authorities devise prompt, proactive strategies to prevent traffic accidents and enhance road safety.
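
The abstract describes WESTDT only at a high level: a self-training loop whose base classifier is a homogeneous, weighted ensemble of decision trees that assigns pseudo labels to unlabeled tweets. The following is a minimal sketch of that general idea under stated assumptions, not the authors' implementation. It assumes that (a) each tree is trained on a bootstrap sample and weighted by its accuracy on out-of-bag samples, (b) a pseudo label is accepted when the winning label's share of the weighted vote reaches a fixed threshold (a simplification standing in for the paper's dynamic labeling strategy, which the abstract does not specify), and (c) tweets are already encoded as numeric feature vectors. All function names (`gini`, `build_tree`, `fit_weighted_ensemble`, `predict_with_confidence`, `self_train`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, depth=0, max_depth=3):
    """Tiny CART-style tree: a dict for a split node, a class label for a leaf."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    best = None  # (weighted Gini impurity, feature index, threshold)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # candidate threshold: midpoint of adjacent values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # every feature is constant: no split possible
        return majority
    _, f, t = best
    go_left = [row[f] <= t for row in X]
    X_l = [r for r, g in zip(X, go_left) if g]
    y_l = [lab for lab, g in zip(y, go_left) if g]
    X_r = [r for r, g in zip(X, go_left) if not g]
    y_r = [lab for lab, g in zip(y, go_left) if not g]
    return {"f": f, "t": t,
            "l": build_tree(X_l, y_l, depth + 1, max_depth),
            "r": build_tree(X_r, y_r, depth + 1, max_depth)}


def predict_tree(node, x):
    """Follow split nodes until a leaf label is reached."""
    while isinstance(node, dict):
        node = node["l"] if x[node["f"]] <= node["t"] else node["r"]
    return node


def fit_weighted_ensemble(X, y, n_trees=5, seed=0):
    """Bagged trees, each weighted by out-of-bag accuracy (an assumed weighting)."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        tree = build_tree([X[i] for i in idx], [y[i] for i in idx])
        in_bag = set(idx)
        held_out = [i for i in range(n) if i not in in_bag] or list(range(n))
        acc = sum(predict_tree(tree, X[i]) == y[i] for i in held_out) / len(held_out)
        ensemble.append((tree, max(acc, 1e-3)))  # floor keeps every weight positive
    return ensemble


def predict_with_confidence(ensemble, x):
    """Weighted majority vote; confidence = winning label's share of total weight."""
    votes = Counter()
    for tree, weight in ensemble:
        votes[predict_tree(tree, x)] += weight
    label, score = votes.most_common(1)[0]
    return label, score / sum(votes.values())


def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_iter=10):
    """Add confidently pseudo-labeled samples, retrain, repeat (fixed threshold)."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for it in range(max_iter):
        ensemble = fit_weighted_ensemble(X_lab, y_lab, seed=it)
        keep, added = [], 0
        for x in pool:
            label, conf = predict_with_confidence(ensemble, x)
            if conf >= threshold:
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                keep.append(x)
        pool = keep
        if added == 0 or not pool:
            break
    return fit_weighted_ensemble(X_lab, y_lab, seed=max_iter), len(X_lab)


if __name__ == "__main__":
    # Hypothetical 2-D vectors standing in for tweet embeddings.
    rng = random.Random(42)
    def blob(c, k):
        return [[c + rng.gauss(0, 0.4), c + rng.gauss(0, 0.4)] for _ in range(k)]
    model, n_used = self_train(blob(0, 4) + blob(3, 4), ["minor"] * 4 + ["severe"] * 4,
                               blob(0, 15) + blob(3, 15))
    print(n_used, predict_with_confidence(model, [3.1, 2.9]))
```

Weighting each tree by held-out accuracy lets stronger trees dominate the vote, and the vote share doubles as a confidence score for deciding which pseudo labels to trust. Both choices are illustrative; the paper's own weighting and labeling rules may differ.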

Data Availability

On reasonable request from the corresponding author.

Code Availability

On reasonable request from the corresponding author.

Funding

Not applicable.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology, Warangal, Telangana, 506004, India
Sanjib Kumar Raul, Rashmi Ranjan Rout & D. V. L. N. Somayajulu
Department of Computer Science and Engineering, Indian Institute of Information Technology Design and Manufacturing (IIITDM), Kurnool, Andhra Pradesh, 518002, India
D. V. L. N. Somayajulu

Contributions

Sanjib Kumar Raul: Conception, Design, Proposed Model, Model Analysis, Drafting, Data Collection, Experimentation, Review, and Approval. Rashmi Ranjan Rout: Conception, Design, Model Analysis, Drafting, Review, and Approval. D. V. L. N. Somayajulu: Conception, Design, Model Analysis, Drafting, Review, and Approval.

Corresponding author

Correspondence to Sanjib Kumar Raul.

Ethics declarations

Conflict of interest

The authors declare that they have no known conflict of interest.

Ethical Approval

The submitted work is entirely the authors' own; it has not been published elsewhere and is not under consideration for publication elsewhere.

Informed Consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Dependable Cyber-Physical Systems and Cyber Security” guest edited by Deepak Puthal and Niranjan K. Ray.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Raul, S.K., Rout, R.R. & Somayajulu, D.V.L.N. Weighted Ensemble Learning for Accident Severity Classification Using Social Media Data. SN COMPUT. SCI. 5, 528 (2024). https://doi.org/10.1007/s42979-024-02870-w

Received: 01 August 2023
Accepted: 04 April 2024
Published: 09 May 2024
DOI: https://doi.org/10.1007/s42979-024-02870-w
