Abstract
Revenue in the insurance sector is rising steadily. This growth is primarily linked to the increasing number of policies sold. While a substantial body of work focuses on detecting insurance fraud (e.g., fraud related to car accidents), an open question remains: is it possible to detect incorrect data in sales systems? Such erroneous data can result in financial losses. It may be caused by mistakes made by salespeople, but it may also be the result of fraud. This work focuses on detecting anomalies in car insurance contracts, using a dataset obtained from an actual insurance company based in Poland. The dataset is thoroughly analysed, including preprocessing and feature selection. Next, a number of anomaly detection algorithms (specifically, clustering algorithms, dynamic classifier selection, and gradient boosted decision trees) are applied to it, and their performance is compared. Furthermore, a scenario in which the size of the dataset increases is considered. The results show that machine learning, broadly understood, has realistic potential to facilitate anomaly detection during insurance policy sales.
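The abstract does not say how the clustering algorithms are turned into anomaly detectors. As an illustration only, the sketch below shows one common approach, which is assumed here rather than taken from the paper. It clusters the records with k-means, scores each record by its distance to the nearest centroid of a sufficiently large cluster, and flags the highest-scoring fraction. The two-feature toy data, the parameters `contamination` and `min_cluster_frac`, and all function names are hypothetical.

```python
# Hypothetical illustration of clustering-based anomaly scoring;
# not the pipeline used in the paper.
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]


def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0):
    """Plain Lloyd's k-means with random initial centroids; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        # Move each centroid to the mean of its members; an empty cluster keeps its old centroid.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(col) / len(members) for col in zip(*members))
                       if members else centroids[j])
        if new == centroids:  # converged: assignments no longer change the centroids
            break
        centroids = new
    return centroids, assign(points, centroids)


def anomaly_scores(points: List[Point], k: int = 2, min_cluster_frac: float = 0.1,
                   seed: int = 0) -> List[float]:
    """Distance of each record to the nearest centroid of a sufficiently large cluster.

    Clusters holding fewer than min_cluster_frac of the records are not used as
    references for normal behaviour, so an isolated record cannot hide in a
    cluster of its own.
    """
    centroids, labels = kmeans(points, k, seed=seed)
    min_size = max(1, math.ceil(min_cluster_frac * len(points)))
    sizes = [labels.count(j) for j in range(k)]
    normal = [c for c, s in zip(centroids, sizes) if s >= min_size] or centroids
    return [min(dist(p, c) for c in normal) for p in points]


def flag_anomalies(points: List[Point], contamination: float = 0.05, **kwargs) -> List[int]:
    """Indices of the highest-scoring `contamination` fraction of records."""
    scores = anomaly_scores(points, **kwargs)
    n_flag = max(1, round(contamination * len(points)))
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_flag])


# Hypothetical contracts described by two normalised features (e.g. premium and insured value).
typical_a = [(0.1 * (i % 3), 0.1 * (i // 3)) for i in range(9)]
typical_b = [(10 + 0.1 * (i % 3), 10 + 0.1 * (i // 3)) for i in range(9)]
contracts = typical_a + typical_b + [(5.0, 30.0)]  # the last record is a planted anomaly
print(flag_anomalies(contracts))  # expected: [18]
```

Ignoring tiny clusters is a deliberate design choice. With plain distance-to-centroid scoring, k-means can place a lone outlier in a cluster of its own and give it a score of zero. In practice, a library implementation such as scikit-learn's `KMeans` would normally replace the hand-written loop.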
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Piesio, M., Ganzha, M., Paprzycki, M. (2020). Applying Machine Learning to Anomaly Detection in Car Insurance Sales. In: Bellatreche, L., Goyal, V., Fujita, H., Mondal, A., Reddy, P.K. (eds) Big Data Analytics. BDA 2020. Lecture Notes in Computer Science, vol 12581. Springer, Cham. https://doi.org/10.1007/978-3-030-66665-1_17
DOI: https://doi.org/10.1007/978-3-030-66665-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66664-4
Online ISBN: 978-3-030-66665-1
eBook Packages: Computer Science, Computer Science (R0)