Predicting Components of a Target Value Versus Predicting the Target Value Directly

Sooklal, Shellyann; Hosein, Patrick

doi:10.1007/978-3-031-66705-3_24

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2172)

Included in the following conference series:

International Conference on Deep Learning Theory and Applications

Abstract

In many regression problems, one can predict the components of a target value and then combine those predictions to obtain the target value prediction. The alternative is to predict the target value directly. A simple example is automobile insurance claims. The traditional approach is to compute Severity (the average value of claims made) and Frequency (the number of claims made per year); their product gives the average amount paid annually to the customer (henceforth called the claim rate). Alternatively, one can derive the claim rate for each customer and use it as the target value. Intuitively, one would expect the latter approach (predicting the target value directly) to give better results but, in fact, the former approach is better. We investigate the difference in performance between these two approaches, which we call the component predictor (predicting the components) and the composite predictor (predicting the target value directly), and demonstrate it using ten Machine Learning algorithms.
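
The two approaches described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data is synthetic, the two customer features are hypothetical, and plain least-squares regression stands in for the ten algorithms mentioned in the abstract. The helper names (`fit_linear`, `predict_linear`, `mae`) are invented for this sketch.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply coefficients produced by fit_linear to new feature rows."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


# Synthetic, illustrative data (an assumption; NOT the paper's data set).
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(0.0, 1.0, size=(n, 2))                      # hypothetical features
frequency = 0.2 + 0.5 * X[:, 0] + rng.normal(0, 0.05, n)    # claims per year
severity = 1000 + 3000 * X[:, 1] + rng.normal(0, 200, n)    # average value per claim
claim_rate = frequency * severity                           # average annual payout

train, test = slice(0, 1500), slice(1500, n)

# Component predictor: model Frequency and Severity separately, then multiply.
freq_coef = fit_linear(X[train], frequency[train])
sev_coef = fit_linear(X[train], severity[train])
component_pred = predict_linear(freq_coef, X[test]) * predict_linear(sev_coef, X[test])

# Composite predictor: model the claim rate directly.
rate_coef = fit_linear(X[train], claim_rate[train])
composite_pred = predict_linear(rate_coef, X[test])

mae_component = mae(claim_rate[test], component_pred)
mae_composite = mae(claim_rate[test], composite_pred)
print(f"component MAE: {mae_component:.1f}")
print(f"composite MAE: {mae_composite:.1f}")
```

Because the synthetic claim rate is built as an exact product of two linear components, this setup is favorable to the component route by construction; it shows how the comparison is set up rather than providing evidence for the paper's result.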

Author information

Authors and Affiliations

The University of the West Indies, St. Augustine, Trinidad and Tobago
Shellyann Sooklal & Patrick Hosein

Authors

Shellyann Sooklal
Patrick Hosein

Corresponding author

Correspondence to Shellyann Sooklal.

Editor information

Editors and Affiliations

Instituto de Telecomunicações and University of Lisbon, Lisbon, Portugal
Ana Fred
LIAS, LIAS/ENSMA, Poitiers, France
Allel Hadjali
Ford Motor Company, Dearborn, MI, USA
Oleg Gusikhin
University of Naples Federico II, Naples, Italy
Carlo Sansone

Cite this paper

Sooklal, S., Hosein, P. (2024). Predicting Components of a Target Value Versus Predicting the Target Value Directly. In: Fred, A., Hadjali, A., Gusikhin, O., Sansone, C. (eds) Deep Learning Theory and Applications. DeLTA 2024. Communications in Computer and Information Science, vol 2172. Springer, Cham. https://doi.org/10.1007/978-3-031-66705-3_24

DOI: https://doi.org/10.1007/978-3-031-66705-3_24
Published: 21 August 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-66704-6
Online ISBN: 978-3-031-66705-3
eBook Packages: Computer Science, Computer Science (R0)
