Abstract
Microaggregation is a masking procedure used to protect confidential data before their public release. The technique, which relies on clustering and aggregation, is used solely for numerical data. In this work we introduce a microaggregation procedure for categorical variables. We describe the new masking method and analyse its results according to indices found in the literature. The method is compared with Top and Bottom Coding, Global Recoding, Rank Swapping and PRAM.
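To make the clustering-then-aggregation idea concrete, the following is a minimal sketch of microaggregation applied to a single ordinal categorical variable. It is not the procedure defined in the paper: the function name `microaggregate_ordinal`, the fixed group size `k`, the sort-and-chunk grouping, and the use of the lower median as the group representative are all assumptions made for illustration only.

```python
from statistics import median_low


def microaggregate_ordinal(values, order, k=3):
    """Mask an ordinal categorical column (illustrative sketch only).

    values: list of category labels, one per record.
    order:  list of all labels from lowest to highest (assumed known).
    k:      minimum group size (hypothetical parameter).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rank = {label: i for i, label in enumerate(order)}

    # Clustering step (assumed): sort record indices by category rank,
    # then cut the sorted sequence into consecutive groups of k records.
    idx = sorted(range(len(values)), key=lambda i: rank[values[i]])
    groups = [idx[s:s + k] for s in range(0, len(idx), k)]

    # A short trailing group is merged into the previous one so that
    # every group holds at least k records whenever that is possible.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)

    # Aggregation step (assumed): replace each value by the group's lower
    # median, which is always one of the categories present in the group.
    masked = list(values)
    for group in groups:
        m = median_low([rank[values[i]] for i in group])
        for i in group:
            masked[i] = order[m]
    return masked


if __name__ == "__main__":
    order = ["low", "medium", "high"]
    values = ["high", "low", "medium", "low", "high", "medium", "low"]
    print(microaggregate_ordinal(values, order, k=3))
```

The lower median is used here because an ordinal scale has no arithmetic mean; for a nominal variable with no ordering, a different representative (for example the most frequent category) would be needed.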
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Torra, V. (2004). Microaggregation for Categorical Variables: A Median Based Approach. In: Domingo-Ferrer, J., Torra, V. (eds) Privacy in Statistical Databases. PSD 2004. Lecture Notes in Computer Science, vol 3050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25955-8_13
DOI: https://doi.org/10.1007/978-3-540-25955-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22118-0
Online ISBN: 978-3-540-25955-8
eBook Packages: Springer Book Archive