Clustering for Data Privacy and Classification Tasks

Schebesch, Klaus B.; Stecking, Ralf

doi:10.1007/978-3-319-07001-8_54

Part of the book series: Operations Research Proceedings (ORP)

Abstract

Predictive classification is a part of data mining and of many related data-intensive research activities. In applications deriving from business intelligence, potentially valuable data from large databases often cannot be used in an unrestricted way. Privacy constraints may not allow the data modeler to use all of the existing feature variables in building the classification models. In certain situations, pre-processing the original data can lead to intermediate datasets, which hide private or commercially sensitive information but still contain enough useful information for building competitive classification models. To this end, we propose to combine unsupervised clustering with supervised Support Vector Machines. We then evaluate the approach on a real-life credit client scoring problem, comparing it against unrestricted use of all data features.
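The abstract does not say which clustering method, kernel, or number of clusters is used, so the following is only a minimal sketch of the general idea under stated assumptions: each class is clustered separately with plain k-means, only the labelled centroids are passed on to the modeler, and a linear Support Vector Machine (trained here by batch subgradient descent on the hinge loss) is fitted once on the centroids and once on all records for comparison. The synthetic two-feature data and all function names (`kmeans`, `anonymise_by_clustering`, `train_linear_svm`, `make_toy_data`) are hypothetical illustrations, not the authors' data or implementation.

```python
import random

def kmeans(points, k, iters=25, seed=0):
    """Plain Lloyd's k-means on a list of feature vectors; returns k centroids."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[nearest].append(p)
        for j, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Batch subgradient descent on the L2-regularised hinge loss; labels are -1/+1."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:  # margin violated
                grad_w = [g - yi * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def accuracy(model, X, y):
    """Fraction of records whose predicted sign matches the label."""
    w, b = model
    hits = sum(1 for xi, yi in zip(X, y)
               if (1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1) == yi)
    return hits / len(X)

def anonymise_by_clustering(X, y, k):
    """Assumed pre-processing: replace each class's records by k labelled centroids."""
    Xc, yc = [], []
    for label in (-1, 1):
        members = [xi for xi, yi in zip(X, y) if yi == label]
        for c in kmeans(members, k):
            Xc.append(c)
            yc.append(label)
    return Xc, yc

def make_toy_data(n_per_class, seed):
    """Synthetic stand-in for credit client data (assumption, not the authors' dataset)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mu in ((-1, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mu, 0.5), rng.gauss(mu, 0.5)])
            y.append(label)
    return X, y

X_train, y_train = make_toy_data(100, seed=1)
X_test, y_test = make_toy_data(100, seed=2)

full_model = train_linear_svm(X_train, y_train)             # unrestricted use of all records
X_c, y_c = anonymise_by_clustering(X_train, y_train, k=5)   # only centroids reach the modeler
cluster_model = train_linear_svm(X_c, y_c)

print("accuracy, all records:   ", accuracy(full_model, X_test, y_test))
print("accuracy, centroids only:", accuracy(cluster_model, X_test, y_test))
```

In this sketch the modeler never sees an individual record, only ten centroids. Whether such aggregates hide enough sensitive information, and how much accuracy is lost on realistic data, depends on the dataset and the number of clusters, which is the kind of comparison the abstract describes.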

Author information

Authors and Affiliations

Department of Economics, “Vasile Goldiş” Western University of Arad, 310086, Arad, Romania
Klaus B. Schebesch
Department of Informatics, “Vasile Goldiş” Western University of Arad, 310086, Arad, Romania
Klaus B. Schebesch
Department of Economics, Carl von Ossietzky University of Oldenburg, 26111, Oldenburg, Germany
Ralf Stecking

Corresponding author

Correspondence to Klaus B. Schebesch.

Editor information

Editors and Affiliations

Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, The Netherlands
Dennis Huisman
Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, The Netherlands
Ilse Louwerse
Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, The Netherlands
Albert P.M. Wagelmans

Cite this paper

Schebesch, K.B., Stecking, R. (2014). Clustering for Data Privacy and Classification Tasks. In: Huisman, D., Louwerse, I., Wagelmans, A. (eds) Operations Research Proceedings 2013. Operations Research Proceedings. Springer, Cham. https://doi.org/10.1007/978-3-319-07001-8_54

DOI: https://doi.org/10.1007/978-3-319-07001-8_54
Published: 10 July 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07000-1
Online ISBN: 978-3-319-07001-8
eBook Packages: Business and Economics, Business and Management (R0)
