Local and global recoding methods for anonymizing set-valued data

Terrovitis, Manolis; Mamoulis, Nikos; Kalnis, Panos

doi:10.1007/s00778-010-0192-8

Regular Paper
Published: 10 June 2010

The VLDB Journal, volume 20, pages 83–106 (2011)

Abstract

In this paper, we study the problem of protecting privacy in the publication of set-valued data. Consider a collection of supermarket transactions that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer that could serve as links to his or her identity, the published data remain open to privacy attacks from adversaries who have partial knowledge of the set of items in a transaction. Unlike most previous works, we do not classify data as sensitive or non-sensitive; instead, we treat every item as both a potential quasi-identifier and potentially sensitive, depending on the knowledge of the adversary. We define a new version of the k-anonymity guarantee, k^m-anonymity, to limit the effects of data dimensionality, and we propose efficient algorithms to transform the database. Our anonymization model relies on generalization rather than on suppression, which is the most common practice in related work on such data. We develop an algorithm that finds the optimal solution, but at a cost that makes it inapplicable to large, realistic problems. We then propose a greedy heuristic that performs generalizations in an Apriori-style, level-wise fashion. The heuristic scales much better and in most cases finds a solution close to the optimal. Finally, we investigate techniques that partition the database and perform anonymization locally, aiming to reduce memory consumption and further improve scalability. A thorough experimental evaluation with real datasets shows that a vertical partitioning approach achieves excellent results in practice.
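
The abstract does not spell out the k^m-anonymity condition. As it is usually stated in this line of work, a database is k^m-anonymous when every combination of at most m items that occurs in some transaction occurs in at least k transactions. The Python sketch below checks that condition by brute force and applies one global generalization step. It is an illustration under that assumed definition, not the algorithms proposed in the paper; the function names (`violating_itemsets`, `is_km_anonymous`, `generalize`), the toy transactions, and the two-item hierarchy are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def violating_itemsets(transactions, k, m):
    """Return itemsets of size <= m that appear in at least one
    but fewer than k transactions (assumed k^m-anonymity violations)."""
    support = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, m + 1):
            for combo in combinations(items, size):
                support[combo] += 1
    return {combo: n for combo, n in support.items() if n < k}


def is_km_anonymous(transactions, k, m):
    """True when no itemset of size <= m has support below k."""
    return not violating_itemsets(transactions, k, m)


def generalize(transactions, mapping):
    """Global recoding: replace each item by its generalized value
    everywhere; items missing from the mapping are kept as-is."""
    return [frozenset(mapping.get(i, i) for i in t) for t in transactions]


# Hypothetical toy database of four transactions.
db = [
    {"a1", "b1", "b2"},
    {"a2", "b1"},
    {"a2", "b1", "b2"},
    {"a1", "a2", "b2"},
]

print(is_km_anonymous(db, k=2, m=2))
print(sorted(violating_itemsets(db, k=2, m=2)))

# Hypothetical hierarchy: a1 and a2 are both generalized to A.
db_general = generalize(db, {"a1": "A", "a2": "A"})
print(is_km_anonymous(db_general, k=2, m=2))
```

Enumerating every itemset of size up to m is exponential in m and is only practical for toy inputs; it is meant to make the guarantee concrete, not to suggest how the level-wise heuristic or the partitioning methods are implemented.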

Author information

Authors and Affiliations

Institute for the Management of Information Systems (IMIS), Research Center “Athena”, Athens, Greece
Manolis Terrovitis
Department of Computer Science, University of Hong Kong, Hong Kong, China
Nikos Mamoulis
Division of Mathematical and Computer Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Panos Kalnis

Corresponding author

Correspondence to Nikos Mamoulis.

About this article

Cite this article

Terrovitis, M., Mamoulis, N. & Kalnis, P. Local and global recoding methods for anonymizing set-valued data. The VLDB Journal 20, 83–106 (2011). https://doi.org/10.1007/s00778-010-0192-8

Received: 26 May 2009
Revised: 02 March 2010
Accepted: 16 April 2010
Published: 10 June 2010
Issue Date: February 2011
DOI: https://doi.org/10.1007/s00778-010-0192-8
