Adversarial-knowledge dimensions in data privacy

  • Regular Paper
  • The VLDB Journal

Abstract

Privacy is an important issue in data publishing. Many organizations distribute non-aggregate personal data for research, and they must take steps to ensure that an adversary cannot predict sensitive information pertaining to individuals with high confidence. This problem is further complicated by the fact that, in addition to the published data, the adversary may also have access to other resources (e.g., public records and social networks that relate individuals), which we call adversarial knowledge. A robust privacy framework should allow publishing organizations to analyze data privacy in terms of not only data dimensions (data that a publishing organization has), but also adversarial-knowledge dimensions (information not in the data). In this paper, we first describe a general framework for reasoning about privacy in the presence of adversarial knowledge. Within this framework, we propose a novel multidimensional approach to quantifying adversarial knowledge. This approach allows the publishing organization to investigate privacy threats and enforce privacy requirements in the presence of various types and amounts of adversarial knowledge. Our main technical contributions include a multidimensional privacy criterion that is more intuitive and flexible than previous approaches to modeling background knowledge. In addition, we identify an important congregation property of the adversarial-knowledge dimensions. Based on this property, we provide algorithms for measuring disclosure and sanitizing data that improve computational efficiency by several orders of magnitude over the best known techniques.
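To make the idea of measuring disclosure under a bounded amount of adversarial knowledge concrete, the following is a minimal sketch under a deliberately simplified model; it is not the paper's criterion or algorithm. It assumes that records are published in groups, that the adversary believes the target individual is equally likely to be any record in the group, and that the adversary's only extra knowledge is the ability to rule out up to `num_ruled_out` sensitive values for the target. The function name, parameter names, and sample data are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def worst_case_confidence(group_values, num_ruled_out):
    """Highest confidence an adversary can reach about the target's
    sensitive value within one published group.

    Assumed model (an illustration, not the paper's definition): the target
    is equally likely to be any record in the group, and the adversary can
    rule out up to `num_ruled_out` distinct sensitive values for the target.
    """
    counts = Counter(group_values)
    best = Fraction(0)
    for value, value_count in counts.items():
        # To maximize confidence in `value`, the adversary rules out the
        # most frequent competing values first.
        competitors = sorted(
            (c for v, c in counts.items() if v != value), reverse=True
        )
        remaining = value_count + sum(competitors[num_ruled_out:])
        best = max(best, Fraction(value_count, remaining))
    return best


# Hypothetical group of six published records.
group = ["flu", "flu", "cancer", "cold", "cold", "cold"]
for amount in range(3):
    print(amount, worst_case_confidence(group, amount))
```

In this toy setting, a privacy requirement could be phrased as a threshold on the returned confidence for a chosen amount of knowledge; the paper's framework treats several kinds of adversarial knowledge as separate dimensions, which this one-dimensional sketch does not attempt to capture.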

Author information

Corresponding author

Correspondence to Bee-Chung Chen.

About this article

Cite this article

Chen, BC., LeFevre, K. & Ramakrishnan, R. Adversarial-knowledge dimensions in data privacy. The VLDB Journal 18, 429–467 (2009). https://doi.org/10.1007/s00778-008-0118-x

  • DOI: https://doi.org/10.1007/s00778-008-0118-x
