Adversarial-knowledge dimensions in data privacy

Chen, Bee-Chung; LeFevre, Kristen; Ramakrishnan, Raghu

doi:10.1007/s00778-008-0118-x


Regular Paper
Published: 20 November 2008

Volume 18, pages 429–467 (2009)

The VLDB Journal


Abstract

Privacy is an important issue in data publishing. Many organizations distribute non-aggregate personal data for research, and they must take steps to ensure that an adversary cannot predict sensitive information about individuals with high confidence. The problem is further complicated by the fact that, in addition to the published data, the adversary may have access to other resources (e.g., public records and social networks relating individuals), which we call adversarial knowledge. A robust privacy framework should allow publishing organizations to analyze data privacy not only along data dimensions (data that the publishing organization has), but also along adversarial-knowledge dimensions (information not in the data). In this paper, we first describe a general framework for reasoning about privacy in the presence of adversarial knowledge. Within this framework, we propose a novel multidimensional approach to quantifying adversarial knowledge. This approach allows the publishing organization to investigate privacy threats and enforce privacy requirements in the presence of various types and amounts of adversarial knowledge. Our main technical contributions include a multidimensional privacy criterion that is more intuitive and flexible than previous approaches to modeling background knowledge. In addition, we identify an important congregation property of the adversarial-knowledge dimensions. Based on this property, we provide algorithms for measuring disclosure and sanitizing data that improve computational efficiency by several orders of magnitude over the best-known techniques.
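As a rough illustration of what measuring disclosure along a single adversarial-knowledge dimension can look like, the sketch below computes worst-case disclosure for a bucketized release. It rests on two assumptions that this abstract does not state: every assignment of a bucket's sensitive values to the bucket's members is equally likely (a random-worlds model), and the adversary knows `ell` sensitive values that the target individual does not have. The function names `worst_case_disclosure` and `max_safe_ell` and the one-dimensional knowledge model are hypothetical; this is not the paper's multidimensional criterion or its congregation-based algorithms.

```python
from collections import Counter
from fractions import Fraction


def worst_case_disclosure(bucket_values, ell):
    """Hypothetical single-dimension measure (assumption, not the paper's).

    Under random worlds, a bucket member has value s with probability
    n_s / |B|. If the adversary rules out values s_1..s_ell for the target,
    the posterior for any remaining s is n_s / (|B| - sum of n_{s_i}).
    Assumes bucket_values is non-empty.
    """
    counts = sorted(Counter(bucket_values).values(), reverse=True)
    total = len(bucket_values)
    # Worst case: guess the most frequent value and rule out the next
    # ell most frequent values (fewer if the bucket has fewer distinct values).
    ruled_out = sum(counts[1:ell + 1])
    return Fraction(counts[0], total - ruled_out)


def max_safe_ell(buckets, threshold):
    """Largest ell whose worst-case disclosure over all buckets stays strictly
    below threshold, or None if even ell = 0 violates it.

    Ruling out more values can only raise the posterior, so disclosure is
    non-decreasing in ell and the scan can stop at the first violation.
    """
    best = None
    max_ell = max(len(set(b)) for b in buckets) - 1
    for ell in range(max_ell + 1):
        if max(worst_case_disclosure(b, ell) for b in buckets) < threshold:
            best = ell
        else:
            break
    return best


buckets = [
    ["flu", "flu", "cancer", "hiv"],
    ["flu", "cancer", "cancer", "hiv", "ulcer"],
]
print(max_safe_ell(buckets, 0.7))  # prints 1
```

With these two illustrative buckets, the first bucket's worst-case disclosure rises from 1/2 (`ell = 0`) to 2/3 (`ell = 1`) to 1 (`ell = 2`), so a 0.7 threshold tolerates at most one ruled-out value. The framework in this paper considers several knowledge dimensions jointly, which this single-dimension sketch does not attempt.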



Author information

Authors and Affiliations

Yahoo! Inc., 701 First Avenue, Sunnyvale, CA, 94089, USA
Bee-Chung Chen & Raghu Ramakrishnan
Electrical Engineering and Computer Science, University of Michigan, 2260 Hayward Ave., Ann Arbor, MI, 48109, USA
Kristen LeFevre


Corresponding author

Correspondence to Bee-Chung Chen.


About this article

Cite this article

Chen, BC., LeFevre, K. & Ramakrishnan, R. Adversarial-knowledge dimensions in data privacy. The VLDB Journal 18, 429–467 (2009). https://doi.org/10.1007/s00778-008-0118-x


Received: 06 February 2008
Revised: 01 October 2008
Accepted: 02 October 2008
Published: 20 November 2008
Issue Date: April 2009
DOI: https://doi.org/10.1007/s00778-008-0118-x
