Weighted Ensemble Learning for Accident Severity Classification Using Social Media Data

  • Original Research
  • SN Computer Science

Abstract

With the sharp rise in the use of social media platforms such as Twitter, the research community has shown growing interest in automation tools such as models that classify tweets by accident severity. These tools automatically extract severity information from the content of accident-related tweets. Models that predict accident severity are, moreover, essential for improving the safety of the road traffic system. The main difficulty lies in collecting sufficient labeled data. We propose weighted ensemble-based self-training with decision tree (WESTDT), a semi-supervised method with a dynamic data labeling strategy. Its base classifier is a homogeneous weighted ensemble of decision tree classifiers, used to predict pseudo labels more reliably. We also propose a novel performance measure, called risk factor, that estimates the risk incurred by using the prediction model in the application. The proposed model outperformed the state-of-the-art model, reliable semi-supervised ensemble learning (RESSEL), and the baseline models, decision tree (DT) and self-training with decision tree (STDT), on both the traditional and the proposed measures. On all datasets, the proposed framework improves on all other models in precision, recall, and accuracy by 5–18.3%, 3.9–9.3%, and 6.6%, respectively. These findings can support the development of efficient and sustainable systems for traffic management and safety, and can help government authorities devise prompt, proactive strategies to prevent traffic accidents and enhance road safety.
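The general idea of self-training with a weighted ensemble of decision trees can be illustrated with a minimal sketch. It is not the authors' WESTDT implementation: the abstract does not specify the weighting scheme or the dynamic labeling strategy, so the sketch assumes depth-1 trees (stumps) as ensemble members, member weights equal to accuracy on the labeled set, a fixed confidence threshold for accepting pseudo labels, and a toy one-feature dataset. All function names and parameters are hypothetical.

```python
from collections import Counter

def fit_stump(X, y):
    """Fit a depth-1 decision tree: the single feature/threshold split with fewest errors."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between neighbouring values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no split possible: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]

def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

def fit_ensemble(X, y, n_members):
    """Train each member on a different subset; weight it by accuracy on all labeled data.

    Assumption: accuracy-based weights stand in for the paper's unspecified scheme.
    """
    n = len(X)
    members = []
    for k in range(n_members):
        # Member k skips every n_members-th point, which gives the members some diversity.
        idx = [i for i in range(n) if i % n_members != k] or list(range(n))
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        acc = sum(predict_stump(stump, r) == lab for r, lab in zip(X, y)) / n
        members.append((stump, acc))
    return members

def predict_weighted(members, row):
    """Return the label with the largest total weight and its share of the total weight."""
    votes = Counter()
    for stump, weight in members:
        votes[predict_stump(stump, row)] += weight
    label, score = votes.most_common(1)[0]
    total = sum(votes.values())
    return label, (score / total if total else 0.0)

def self_train(X_lab, y_lab, X_unlab, n_members=3, threshold=0.8, max_rounds=5):
    """Self-training: move confidently predicted unlabeled rows into the labeled set, refit."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    members = fit_ensemble(X_lab, y_lab, n_members)
    for _ in range(max_rounds):
        if not pool:
            break
        remaining, added = [], 0
        for row in pool:
            label, confidence = predict_weighted(members, row)
            if confidence >= threshold:  # accept the pseudo label
                X_lab.append(row)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(row)
        pool = remaining
        if added == 0:
            break
        members = fit_ensemble(X_lab, y_lab, n_members)
    return members

# Toy data: one numeric feature per tweet (purely illustrative).
X_labeled = [[1.0], [2.0], [8.0], [9.0]]
y_labeled = ["minor", "minor", "severe", "severe"]
X_unlabeled = [[1.5], [2.5], [7.5], [8.5]]

members = self_train(X_labeled, y_labeled, X_unlabeled)
print(predict_weighted(members, [0.5]))
print(predict_weighted(members, [9.5]))
```

In this sketch the weighted vote serves two purposes: it yields the pseudo label and, through the winning label's share of the total weight, a confidence score that decides whether the unlabeled row is added to the training set.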

Data Availability

On reasonable request from the corresponding author.

Code Availability

On reasonable request from the corresponding author.

Funding

Not applicable.

Author information

Contributions

Sanjib Kumar Raul: Conception, Design, Proposed Model, Model Analysis, Drafting, Data Collection, Experimentation, Review, and Approval. Rashmi Ranjan Rout: Conception, Design, Model Analysis, Drafting, Review, and Approval. D.V.L.N. Somayajulu: Conception, Design, Model Analysis, Drafting, Review, and Approval.

Corresponding author

Correspondence to Sanjib Kumar Raul.

Ethics declarations

Conflict of interest

The authors declare that they have no known conflict of interest.

Ethical Approval

The submitted work is entirely the authors' own and has not been published or considered for publication elsewhere.

Informed Consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Dependable Cyber-Physical Systems and Cyber Security” guest edited by Deepak Puthal and Niranjan K. Ray.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Raul, S.K., Rout, R.R. & Somayajulu, D.V.L.N. Weighted Ensemble Learning for Accident Severity Classification Using Social Media Data. SN COMPUT. SCI. 5, 528 (2024). https://doi.org/10.1007/s42979-024-02870-w

  • DOI: https://doi.org/10.1007/s42979-024-02870-w
