
Microaggregation for Categorical Variables: A Median Based Approach

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3050)

Abstract

Microaggregation is a masking procedure used to protect confidential data prior to their public release. This technique, which relies on clustering and aggregation, has so far been used only for numerical data. In this work we introduce a microaggregation procedure for categorical variables. We describe the new masking method and analyse its results using indices found in the literature. The method is compared with Top and Bottom Coding, Global Recoding, Rank Swapping and PRAM.
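The abstract only outlines the method. As a rough illustration of the general idea, records are partitioned into groups of at least k, and each value is replaced by a median prototype of its group, so that the released value is still a valid category. The minimal Python sketch below covers a single ordinal variable. The grouping rule (sort by category code, then cut into consecutive blocks of k) and the function names are assumptions made for illustration. The paper's own clustering and aggregation steps may differ, and nominal variables, which have no natural order, would need a different prototype.

```python
from typing import List, Sequence


def median_category(values: Sequence[int]) -> int:
    """Lower median of ordinal category codes, so the result is always an existing category."""
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]


def microaggregate_ordinal(codes: Sequence[int], k: int) -> List[int]:
    """Replace each value by the median category of its group of at least k records.

    Hypothetical fixed-size grouping (an assumption, not the paper's algorithm):
    records are sorted by category code and cut into consecutive groups of k;
    the leftover records (fewer than k) are merged into the last group.
    """
    n = len(codes)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= number of records")
    # Record indices ordered by their category code (stable sort).
    order = sorted(range(n), key=lambda i: codes[i])
    n_groups = n // k
    masked = [0] * n
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder so that every group has >= k members.
        end = n if g == n_groups - 1 else start + k
        members = order[start:end]
        prototype = median_category([codes[i] for i in members])
        for i in members:
            masked[i] = prototype
    return masked
```

For instance, with education levels coded 0 (primary) to 4 (doctorate), `microaggregate_ordinal([4, 0, 2, 1, 3, 2, 0], 3)` returns `[2, 0, 2, 0, 2, 2, 0]`: every released value is an existing category shared by at least three records. Taking the lower median instead of an arithmetic mean keeps the prototype on the ordinal scale, which is the reason a median is a natural aggregation operator for categorical data.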




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Torra, V. (2004). Microaggregation for Categorical Variables: A Median Based Approach. In: Domingo-Ferrer, J., Torra, V. (eds) Privacy in Statistical Databases. PSD 2004. Lecture Notes in Computer Science, vol 3050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25955-8_13


  • DOI: https://doi.org/10.1007/978-3-540-25955-8_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22118-0

  • Online ISBN: 978-3-540-25955-8

  • eBook Packages: Springer Book Archive
