Abstract
The problem of publishing personal data without giving up privacy is becoming increasingly important. An interesting formalization recently proposed is k-anonymity. This approach requires that the rows of a table be clustered in sets of size at least k and that all the rows in a cluster become the same tuple after the suppression of some entries. The natural optimization problem, where the goal is to minimize the number of suppressed entries, is known to be NP-hard when the values are over a ternary alphabet, k=3, and the row length is unbounded. In this paper we give a lower bound on the approximation factor that any polynomial-time algorithm can achieve on two restrictions of the problem, namely (i) when the record values are over a binary alphabet and k=3, and (ii) when the records have length at most 8 and k=4, showing that these restrictions of the problem are APX-hard.
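As an illustration of the objective described above, the following minimal sketch computes the number of suppressed entries for a given clustering of a binary table. It is not the paper's construction: the function names (`suppression_cost`, `anonymize`, `anonymization_cost`), the string encoding of rows, and the use of `*` as the suppression symbol are all assumptions made for this example. It only evaluates a clustering that is already given; finding the clustering of minimum cost is the hard part.

```python
from typing import List, Sequence


def suppression_cost(cluster: Sequence[str]) -> int:
    """Entries that must be suppressed so all rows of one cluster coincide."""
    # A column has to be suppressed in every row of the cluster
    # as soon as two rows disagree on it.
    differing = sum(1 for column in zip(*cluster) if len(set(column)) > 1)
    return differing * len(cluster)


def anonymize(cluster: Sequence[str]) -> List[str]:
    """Return the cluster with disagreeing columns replaced by '*' (assumed symbol)."""
    merged = "".join(
        "*" if len(set(column)) > 1 else column[0] for column in zip(*cluster)
    )
    return [merged] * len(cluster)


def anonymization_cost(clusters: List[List[str]], k: int) -> int:
    """Total suppressed entries of a clustering; every cluster needs >= k rows."""
    if any(len(cluster) < k for cluster in clusters):
        raise ValueError("every cluster must contain at least k rows")
    return sum(suppression_cost(cluster) for cluster in clusters)


# Hypothetical binary table with six rows of length 4, split into two clusters (k = 3).
clusters = [
    ["0101", "0111", "0100"],  # rows disagree on the last two columns
    ["1100", "1100", "1000"],  # rows disagree on the second column only
]
total = anonymization_cost(clusters, k=3)
```

Here the first cluster becomes three copies of `01**` and the second three copies of `1*00`, so `total` counts 6 + 3 suppressed entries.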
Bonizzoni, P., Della Vedova, G. & Dondi, R. Anonymizing binary and small tables is hard to approximate. J Comb Optim 22, 97–119 (2011). https://doi.org/10.1007/s10878-009-9277-y
DOI: https://doi.org/10.1007/s10878-009-9277-y