research-article

Handling Data Imbalance In Linear Modelling of Fatality Rate of Auto Collision

Authors:
Shengkun Xie
Global Management Studies, TRSM, Toronto Metropolitan University, Canada
ORCID: 0000-0002-9533-2096

Jin Zhang
Mathematics and Statistics Department, University of Guelph, Canada
ORCID: 0009-0002-0053-9789

Anna T. Lawniczak
Mathematics and Statistics Department, University of Guelph, Canada
ORCID: 0000-0002-2984-0877

ICISS '23: Proceedings of the 2023 6th International Conference on Information Science and Systems, August 2023, Pages 83–89
https://doi.org/10.1145/3625156.3625169

Published: 21 November 2023

ABSTRACT

Learning from imbalanced data remains an active research area. Applying techniques for handling imbalanced data can significantly improve the predictive performance of machine learning and statistical models and mitigate bias, leading to more reliable results. Data used to predict the fatality rate of car accidents are drawn from various sources, including information at the person, vehicle, and collision levels. These data are typically imbalanced, and studying them is highly desirable for improving road safety. Predicting a fatal event is also crucial for better management and allocation of limited health resources. This study explores the impact of imbalanced-data handling techniques on linear statistical models. It illustrates the significant improvement in specificity obtained when imbalanced data are appropriately managed. The findings provide valuable guidelines for health resource management, illuminate the influence of data imbalance on prediction accuracy, and offer insights for improving the prediction of auto collision fatalities.
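To make the role of imbalance concrete, the following is a minimal sketch and not the authors' method. It uses plain random oversampling as a stand-in for the resampling techniques the abstract alludes to (the author tags mention SMOTE, which generates synthetic points rather than duplicating existing ones). The toy data, the 5/95 class split, the function names, and the choice of label 1 as the positive class are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class is as large as the largest one (simple random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical toy data: 5 records of class 1 and 95 of class 0.
labels = [1] * 5 + [0] * 95
rows = [[float(i)] for i in range(len(labels))]

# A classifier that always predicts the majority class looks accurate
# overall but never detects the minority class.
always_majority = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_majority)) / len(labels)
sens, spec = sensitivity_specificity(labels, always_majority)

# After oversampling, both classes have the same number of training rows.
bal_rows, bal_labels = random_oversample(rows, labels)
balanced_counts = Counter(bal_labels)
```

In this toy setting the always-majority classifier reaches 95% accuracy while one of the two class-wise rates is zero, which is why class-wise measures such as sensitivity and specificity, rather than accuracy alone, are the natural yardstick for imbalance handling. Which class counts as "positive", and hence which of the two rates improves, depends on how the labels are coded in a given study.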



Published in

ICISS '23: Proceedings of the 2023 6th International Conference on Information Science and Systems
August 2023
301 pages
ISBN: 9798400708206
DOI: 10.1145/3625156

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Fatality Rate Prediction
Imbalanced Data
Machine Learning
SMOTE
Qualifiers
- research-article
- Research
- Refereed limited
