Abstract
Owing to the rapid growth of social media platforms such as Twitter, the research community has shown increasing interest in automated tools, such as models that classify accident-related tweets by severity. These tools automatically extract severity information from the content of accident tweets. Prediction models are also essential for estimating the severity of an accident and thereby improving the safety of the road traffic system. The main difficulty, however, lies in collecting sufficient labeled data. We propose weighted ensemble-based self-training with decision tree (WESTDT), a semi-supervised methodology with a dynamic data labeling strategy. Its base classifier is a homogeneous weighted ensemble of decision tree classifiers, which improves the prediction of pseudo-labels. We also propose a novel performance measure, the risk factor, to estimate the amount of risk incurred by an application that relies on the prediction model. WESTDT outperformed the state-of-the-art model, reliable semi-supervised ensemble learning (RESSEL), and the baseline models, decision tree (DT) and self-training with decision tree (STDT), on both the traditional measures and the proposed risk factor. Across all datasets, it outperformed every other model in precision, recall, and accuracy, by margins of 5–18.3%, 3.9–9.3%, and 6.6%, respectively. These findings can support the development of efficient and sustainable traffic management and safety systems, and they can help government authorities devise prompt, proactive strategies to prevent traffic accidents and enhance road safety.
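The abstract describes WESTDT only at a high level: self-training whose base learner is a homogeneous, weighted ensemble of decision trees that produces pseudo-labels for unlabeled data. The Python sketch below illustrates that general idea under stated assumptions and is not the authors' implementation. The bootstrap resampling, the use of out-of-bag accuracy as each tree's weight, the fixed confidence `threshold` (a stand-in for the paper's dynamic labeling strategy, which the abstract does not specify), and all function names are hypothetical choices made for illustration. It uses only the standard library, so the tiny CART-style tree is hand-rolled.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, depth=0, max_depth=3):
    """Tiny CART-style tree. Leaves are labels (must not be tuples); internal nodes are (feature, threshold, left, right)."""
    # Stop at a pure node or at the depth limit
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between distinct values
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # all feature vectors identical: no split is possible
        return majority(y)
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))

def predict_tree(node, x):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_weighted_ensemble(X, y, n_trees=15, max_depth=3, seed=0):
    """Homogeneous ensemble of bagged trees; each tree's vote weight is its out-of-bag accuracy (assumed scheme)."""
    rng = random.Random(seed)
    n, ensemble = len(X), []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        tree = build_tree([X[i] for i in idx], [y[i] for i in idx], 0, max_depth)
        in_bag = set(idx)
        # Weight = accuracy on out-of-bag points (whole set if none are out of bag)
        oob = [i for i in range(n) if i not in in_bag] or list(range(n))
        weight = sum(predict_tree(tree, X[i]) == y[i] for i in oob) / len(oob)
        ensemble.append((tree, weight))
    return ensemble

def predict_with_confidence(ensemble, x):
    """Weighted majority vote; confidence is the winning label's share of the total weight."""
    votes = Counter()
    for tree, weight in ensemble:
        votes[predict_tree(tree, x)] += weight
    total = sum(votes.values())
    if total == 0:  # every tree had zero weight: fall back to unweighted voting
        votes = Counter(predict_tree(tree, x) for tree, _ in ensemble)
        total = len(ensemble)
    label, score = votes.most_common(1)[0]
    return label, score / total

def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_iter=10, **ensemble_kw):
    """Self-training: pseudo-label confidently predicted points, add them, and retrain."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_iter):
        model = fit_weighted_ensemble(X_lab, y_lab, **ensemble_kw)
        still_unlabeled, added = [], 0
        for x in pool:
            label, conf = predict_with_confidence(model, x)
            if conf >= threshold:  # confident: accept the pseudo-label
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                still_unlabeled.append(x)
        pool = still_unlabeled
        if added == 0 or not pool:  # nothing new was labeled, or nothing is left
            break
    return fit_weighted_ensemble(X_lab, y_lab, **ensemble_kw)

if __name__ == "__main__":
    # Toy 1-D data (illustrative only): small values are class 0, large values class 1
    X_lab = [[0], [1], [2], [7], [8], [9]]
    y_lab = [0, 0, 0, 1, 1, 1]
    X_unlab = [[0.5], [1.5], [7.5], [8.5]]
    model = self_train(X_lab, y_lab, X_unlab)
    print([predict_with_confidence(model, x)[0] for x in ([0.5], [8.5])])
```

Raising `threshold` makes pseudo-labeling more conservative (fewer but more reliable labels per round). Weighting each tree by held-out accuracy lets the more reliable trees dominate the vote, which is one plausible reading of the abstract's claim that a weighted ensemble yields better pseudo-labels.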



Data Availability
Available from the corresponding author on reasonable request.
Code Availability
Available from the corresponding author on reasonable request.
Funding
Not applicable.
Author information
Contributions
Sanjib Kumar Raul: conception, design, proposed model, model analysis, drafting, data collection, experimentation, review, and approval. Rashmi Ranjan Rout: conception, design, model analysis, drafting, review, and approval. D.V.L.N. Somayajulu: conception, design, model analysis, drafting, review, and approval.
Ethics declarations
Conflict of interest
The authors declare that they have no known conflicts of interest.
Ethical Approval
The submitted work is entirely the authors' own; it has not been published elsewhere and is not under consideration for publication elsewhere.
Informed Consent
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Dependable Cyber-Physical Systems and Cyber Security” guest edited by Deepak Puthal and Niranjan K. Ray.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Raul, S.K., Rout, R.R. & Somayajulu, D.V.L.N. Weighted Ensemble Learning for Accident Severity Classification Using Social Media Data. SN COMPUT. SCI. 5, 528 (2024). https://doi.org/10.1007/s42979-024-02870-w
DOI: https://doi.org/10.1007/s42979-024-02870-w