Clustering for Data Privacy and Classification Tasks

  • Conference paper
Operations Research Proceedings 2013

Part of the book series: Operations Research Proceedings (ORP)

Abstract

Predictive classification is a part of data mining and of many related data-intensive research activities. In applications deriving from business intelligence, potentially valuable data from large databases often cannot be used in an unrestricted way. Privacy constraints may not allow the data modeler to use all of the existing feature variables in building the classification models. In certain situations, pre-processing the original data can lead to intermediate datasets that hide private or commercially sensitive information but still contain enough useful information for building competitive classification models. To this end, we propose to combine unsupervised clustering with supervised support vector machines. For an instance of real-life credit client scoring, we then evaluate our approach against the case of unrestricted use of all data features.
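The general idea of training on a clustered, intermediate dataset and comparing against unrestricted use of the data can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not name the clustering algorithm, the kernel, or the number of clusters, so k-means, an RBF kernel, 20 clusters per class, the synthetic dataset, and the helper name `cluster_prototypes` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def cluster_prototypes(X, y, n_clusters, seed=0):
    """Hypothetical helper: replace the records of each class by k-means centers."""
    protos, labels = [], []
    for cls in np.unique(y):
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        km.fit(X[y == cls])
        protos.append(km.cluster_centers_)
        labels.append(np.full(n_clusters, cls))
    return np.vstack(protos), np.concatenate(labels)


# Synthetic stand-in for a credit scoring table (assumption, not the paper's data).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Data owner side: only cluster centers and their class labels are handed over.
X_proto, y_proto = cluster_prototypes(X_train, y_train, n_clusters=20)

# Modeler side: one SVM on the intermediate dataset, one on all original records.
svm_proto = SVC(kernel="rbf", gamma="scale").fit(X_proto, y_proto)
svm_full = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

acc_proto = svm_proto.score(X_test, y_test)
acc_full = svm_full.score(X_test, y_test)
print(f"prototypes released: {len(X_proto)} of {len(X_train)} records")
print(f"accuracy on cluster centers: {acc_proto:.3f}")
print(f"accuracy on full data:       {acc_full:.3f}")
```

Clustering each class separately keeps the class label of every center unambiguous. Whether cluster centers hide enough information for a given privacy requirement depends on the data and the number of clusters, and this sketch does not assess that.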

Author information

Corresponding author

Correspondence to Klaus B. Schebesch.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Schebesch, K. B., Stecking, R. (2014). Clustering for Data Privacy and Classification Tasks. In: Huisman, D., Louwerse, I., Wagelmans, A. (eds) Operations Research Proceedings 2013. Operations Research Proceedings. Springer, Cham. https://doi.org/10.1007/978-3-319-07001-8_54
