Abstract
Connectivity and automation are increasingly part of today's cars. To provide automation, many sensors are integrated into cars to collect physical readings. In the automobile industry, the resulting datasets can be used to predict whether a car will need a repair soon. Such predictions allow drivers and retailers to take action early. However, prediction in real use cases poses a new challenge: misclassified instances do not all incur the same cost. For example, the cost of failing to predict a needed tire change is usually higher than the cost of predicting a tire change even though the car could still drive thousands of kilometers. To tackle this problem, we introduce a new example-dependent cost-sensitive prediction model that extends the well-established idea of logistic regression. Our model allows different costs for misclassified instances and obtains predictions that lead to a lower overall cost. Our method consistently outperforms the state of the art in example-dependent cost-sensitive logistic regression on various datasets. Applying our method to vehicle data from a large European car manufacturer, we show cost savings of about 10%.
Notes
- 1.
Note that we do not have to distinguish explicitly between \({c_{i}^{FP}}\) and \({c_{i}^{FN}}\): if \(y_i=0\), then \({c_{i}^{FP}} = c_i\); if \(y_i=1\), then \({c_{i}^{FN}}=c_i\). For a single instance, \({c_{i}^{FP}}\) and \({c_{i}^{FN}}\) can never occur together.
- 2.
The case \(y_i=0\) is equivalent, only mirrored. Without loss of generality, we consider only \(y_i=1\) in the following.
- 3.
More precisely, the average loss for correctly classified instances would be \(2\cdot T_{log}\).
- 4.
- 5.
- 6.
Due to nondisclosure agreements, we unfortunately cannot provide more details on the dataset. The two other datasets studied in this work are publicly available.
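The convention in Note 1 can be illustrated with a minimal sketch: because a single instance can only ever be a false positive or a false negative, one cost vector \(c_i\) is enough to evaluate the total cost of a set of predictions. The function name, the toy data, and the assumption that correctly classified instances incur zero cost are illustrative choices and are not taken from the paper.

```python
import numpy as np

def total_misclassification_cost(y_true, y_pred, cost):
    """Sum the per-instance cost c_i over all misclassified instances.

    Assumption (not from the paper): correct predictions cost nothing.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cost = np.asarray(cost, dtype=float)
    # c_i is the false-positive cost when y_i = 0 and the false-negative
    # cost when y_i = 1, so no separate FP/FN vectors are needed.
    misclassified = y_true != y_pred
    return float(cost[misclassified].sum())

# Hypothetical toy data: a missed repair (y = 1) is costlier than a false alarm.
y_true = [1, 0, 1, 0]
y_pred = [0, 0, 1, 1]
cost = [500.0, 80.0, 500.0, 80.0]

total = total_misclassification_cost(y_true, y_pred, cost)
print(total)
```

Here the first instance is a false negative and the last a false positive, so only their two costs enter the sum.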
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Günnemann, N., Pfeffer, J. (2017). Cost Matters: A New Example-Dependent Cost-Sensitive Logistic Regression Model. In: Kim, J., Shim, K., Cao, L., Lee, JG., Lin, X., Moon, YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science(), vol 10234. Springer, Cham. https://doi.org/10.1007/978-3-319-57454-7_17
DOI: https://doi.org/10.1007/978-3-319-57454-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57453-0
Online ISBN: 978-3-319-57454-7
eBook Packages: Computer Science, Computer Science (R0)