Predicting Components of a Target Value Versus Predicting the Target Value Directly

  • Conference paper
Deep Learning Theory and Applications (DeLTA 2024)

Abstract

In many regression problems, one can predict the components of a target value and then combine those predictions to obtain the target value. The alternative is to predict the target value directly. A simple example is automobile insurance claims. The traditional approach is to compute Severity (the average value of claims made) and Frequency (the number of claims made per year); their product is the average amount paid annually to the customer (henceforth called the claim rate). Alternatively, one can derive the claim rate for each customer and use it as the target value. Intuitively, one would expect the latter approach (predicting the target value directly) to provide better results but, in fact, the former approach is better. We investigate and illustrate the difference in performance between these two approaches, which we call the component predictor (predicting the components) and the composite predictor (predicting the target value directly), using ten machine learning algorithms.
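
The two approaches described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' method: the records, the risk groups, and the helper names `claim_rate` and `group_means` are hypothetical, and a simple per-group average stands in for the machine learning algorithms used in the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical policy records: (risk_group, claims_per_year, average_claim_value).
# All numbers are invented for illustration; they are not from the paper.
records = [
    ("A", 1.0, 1000.0),
    ("A", 3.0, 2000.0),
    ("B", 2.0, 500.0),
    ("B", 2.0, 1500.0),
]


def claim_rate(frequency, severity):
    """Average amount paid per year: Frequency times Severity."""
    return frequency * severity


def group_means(rows, value_of):
    """Stand-in 'model': predict the mean of value_of(row) within each risk group."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[0]].append(value_of(row))
    return {group: mean(values) for group, values in buckets.items()}


# Component predictor: predict Frequency and Severity separately, then multiply.
freq_hat = group_means(records, lambda r: r[1])
sev_hat = group_means(records, lambda r: r[2])
component = {g: claim_rate(freq_hat[g], sev_hat[g]) for g in freq_hat}

# Composite predictor: compute each customer's claim rate, then predict it directly.
composite = group_means(records, lambda r: claim_rate(r[1], r[2]))

for group in sorted(component):
    print(group, component[group], composite[group])
```

In this toy data the two predictors agree for group B but differ for group A, because the product of two averages is generally not the average of the products. The sketch says nothing about which predictor is more accurate; that is the empirical question the paper examines.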

Author information

Corresponding author

Correspondence to Shellyann Sooklal.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sooklal, S., Hosein, P. (2024). Predicting Components of a Target Value Versus Predicting the Target Value Directly. In: Fred, A., Hadjali, A., Gusikhin, O., Sansone, C. (eds) Deep Learning Theory and Applications. DeLTA 2024. Communications in Computer and Information Science, vol 2172. Springer, Cham. https://doi.org/10.1007/978-3-031-66705-3_24

  • DOI: https://doi.org/10.1007/978-3-031-66705-3_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-66704-6

  • Online ISBN: 978-3-031-66705-3

  • eBook Packages: Computer Science, Computer Science (R0)
