Abstract
Predictive classification is a part of data mining and of many related data-intensive research activities. In applications deriving from business intelligence, potentially valuable data from large databases often cannot be used without restriction. Privacy constraints may prevent the data modeler from using all of the existing feature variables when building classification models. In certain situations, pre-processing the original data can yield intermediate datasets that hide private or commercially sensitive information but still retain enough useful information for building competitive classification models. To this end, we propose combining unsupervised clustering with supervised Support Vector Machines. On a real-life instance of credit client scoring, we then evaluate our approach against the unrestricted use of all data features.
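The abstract does not spell out how clustering and Support Vector Machines are combined, so the following is only an illustrative sketch under stated assumptions, not the authors' procedure. It assumes that each class is clustered separately with k-means, that only the cluster centroids (never the individual records) are passed to the modeler, and that a simple linear SVM trained by subgradient descent stands in for the SVM. The synthetic two-class data and all names (`kmeans`, `train_linear_svm`, `make_blobs`, `K`) are hypothetical.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster became empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM: batch subgradient descent on the L2-regularised hinge loss.
    Labels must be -1 or +1. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            # Only points inside the margin contribute to the hinge subgradient.
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                gw = [g - yi * xj / n for g, xj in zip(gw, xi)]
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def accuracy(model, X, y):
    w, b = model
    hits = 0
    for xi, yi in zip(X, y):
        pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1
        hits += pred == yi
    return hits / len(X)

def make_blobs(n_per_class, rng):
    """Hypothetical stand-in data: two Gaussian blobs, one per class."""
    X, y = [], []
    for label, centre in ((-1, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
            y.append(label)
    return X, y

rng = random.Random(42)
train_X, train_y = make_blobs(200, rng)
test_X, test_y = make_blobs(100, rng)

# Assumed pre-processing step: replace the records of each class by K centroids.
K = 5
centroid_X, centroid_y = [], []
for label in (-1, 1):
    members = [x for x, t in zip(train_X, train_y) if t == label]
    for c in kmeans(members, K):
        centroid_X.append(c)
        centroid_y.append(label)

# Compare a model trained on centroids only with one trained on all records.
acc_centroids = accuracy(train_linear_svm(centroid_X, centroid_y), test_X, test_y)
acc_full = accuracy(train_linear_svm(train_X, train_y), test_X, test_y)
print(f"centroids only: {acc_centroids:.3f}   all records: {acc_full:.3f}")
```

In this sketch the classifier sees ten centroids instead of four hundred records, which mirrors the comparison named in the abstract between a restricted intermediate dataset and unrestricted use of the data. Whether centroids alone give adequate privacy protection depends on cluster sizes and the data at hand, and is not settled by this example.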
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Schebesch, K., Stecking, R. (2014). Clustering for Data Privacy and Classification Tasks. In: Huisman, D., Louwerse, I., Wagelmans, A. (eds) Operations Research Proceedings 2013. Operations Research Proceedings. Springer, Cham. https://doi.org/10.1007/978-3-319-07001-8_54
DOI: https://doi.org/10.1007/978-3-319-07001-8_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07000-1
Online ISBN: 978-3-319-07001-8
eBook Packages: Business and Economics; Business and Management (R0)