Abstract
We study the problem of abstracting a table of data about individuals so that no selection query can identify fewer than k individuals. We show that it is impossible to achieve arbitrarily good polynomial-time approximations for a number of natural variations of the generalization technique, unless P = NP, even when the table has only a single quasi-identifying attribute that represents a geographic or unordered attribute:
- Zip-codes: nodes of a planar graph generalized into connected subgraphs
- GPS coordinates: points in R² generalized into non-overlapping rectangles
- Unordered data: text labels that can be grouped arbitrarily.
These hard single-attribute instances of generalization problems contrast with the previously known NP-hard instances, which require the number of attributes to be proportional to the number of individual records (the rows of the table). In addition to impossibility results, we provide approximation algorithms for these difficult single-attribute generalization problems; these algorithms also apply to multiple-attribute instances in which only one attribute is quasi-identifying. Incidentally, the generalization problem for unordered data can be viewed as a novel type of bin packing problem, min-max bin covering, which may be of independent interest.
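To make the min-max bin covering view concrete, the following minimal sketch groups unordered labels so that every group covers at least k individuals and then reports the size of the largest group, which is the quantity to be minimized. It is a simple greedy heuristic written for illustration, not the approximation algorithm from the paper; the function names `greedy_bin_cover` and `max_bin_size`, and the sample counts, are assumptions introduced here.

```python
from typing import Dict, List


def greedy_bin_cover(counts: Dict[str, int], k: int) -> List[List[str]]:
    """Group labels so that each group covers at least k individuals.

    Illustrative greedy heuristic only (hypothetical helper, not the
    paper's algorithm): it guarantees the covering constraint but not
    an optimal maximum group size.
    """
    if sum(counts.values()) < k:
        raise ValueError("fewer than k individuals in total; no valid grouping")

    bins: List[List[str]] = []
    current: List[str] = []
    current_total = 0

    # Visit labels from largest to smallest count (ties broken by label).
    for label, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        current.append(label)
        current_total += n
        if current_total >= k:  # the group now covers k individuals: close it
            bins.append(current)
            current, current_total = [], 0

    # Leftover labels cover fewer than k individuals on their own,
    # so merge them into the currently smallest closed group.
    if current:
        smallest = min(bins, key=lambda b: sum(counts[l] for l in b))
        smallest.extend(current)
    return bins


def max_bin_size(bins: List[List[str]], counts: Dict[str, int]) -> int:
    """Objective value: number of individuals in the largest group."""
    return max(sum(counts[l] for l in b) for b in bins)


if __name__ == "__main__":
    # Hypothetical table: number of individuals carrying each text label.
    counts = {"A": 5, "B": 3, "C": 2, "D": 2, "E": 1}
    groups = greedy_bin_cover(counts, k=4)
    print(groups, max_bin_size(groups, counts))
```

On this sample the heuristic yields a largest group of 8 individuals, whereas the grouping {A}, {B, E}, {C, D} achieves 5, which illustrates why the choice of grouping matters for the min-max objective.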
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Du, W., Eppstein, D., Goodrich, M.T., Lueker, G.S. (2009). On the Approximability of Geometric and Geographic Generalization and the Min-Max Bin Covering Problem. In: Dehne, F., Gavrilova, M., Sack, J.-R., Tóth, C.D. (eds) Algorithms and Data Structures. WADS 2009. Lecture Notes in Computer Science, vol 5664. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03367-4_22
DOI: https://doi.org/10.1007/978-3-642-03367-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03366-7
Online ISBN: 978-3-642-03367-4
eBook Packages: Computer Science, Computer Science (R0)