DOI: 10.1145/3625156.3625169
research-article

Handling Data Imbalance In Linear Modelling of Fatality Rate of Auto Collision

Published: 21 November 2023

ABSTRACT

Learning from imbalanced data is an active research area. Applying techniques for handling imbalanced data can significantly improve the prediction performance of machine learning and statistical models and mitigate bias, leading to more reliable results. Data used to predict the fatality rate of car accidents are derived from various sources, including information at the person, vehicle, and collision levels. These data are typically imbalanced, and studying them is highly desirable for improving road safety. Predicting a fatal event is also crucial for better management and allocation of limited health resources. This study explores the impact of imbalanced data handling techniques on linear statistical models. It illustrates the significant improvement in specificity obtained when imbalanced data are appropriately managed. The findings provide valuable guidelines for health resource management, illuminating the influence of data imbalance on prediction accuracy and offering insights for improving the prediction of auto collision fatalities.
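To make the abstract's point about imbalance concrete, the following is a minimal sketch, not the study's method or data. It uses made-up toy labels (95 of one class, 5 of the other), and the helper names `sensitivity_specificity` and `random_oversample` are hypothetical. Random oversampling is only one common handling technique and is assumed here for illustration; the abstract does not say which techniques the study uses, nor which class it treats as positive.

```python
import random
from collections import Counter


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    Which class counts as "positive" is an assumption of this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes match
    the size of the largest class. Original rows are kept unchanged."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy, made-up labels: 95 majority-class records (0), 5 rare-class records (1).
y_true = [0] * 95 + [1] * 5

# A trivial predictor that always outputs the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sens, spec = sensitivity_specificity(y_true, y_pred, positive=1)

# Rebalance the toy training set before fitting any model.
rows = [[i] for i in range(100)]  # placeholder feature rows
bal_rows, bal_labels = random_oversample(rows, y_true)
```

On this toy data the always-majority predictor reaches high accuracy while the rate computed on the rare class is zero. Whether that rate is called sensitivity or specificity depends on which class is coded as positive. After oversampling, both classes have the same number of rows, so a model fitted to the rebalanced set can no longer score well by ignoring the rare class.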

          • Published in

            ICISS '23: Proceedings of the 2023 6th International Conference on Information Science and Systems
            August 2023
            301 pages
            ISBN:9798400708206
            DOI:10.1145/3625156

            Copyright © 2023 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Qualifiers

            • research-article
            • Research
            • Refereed limited