Abstract
Privacy is an important issue in data publishing. Many organizations distribute non-aggregate personal data for research, and they must take steps to ensure that an adversary cannot predict sensitive information pertaining to individuals with high confidence. The problem is further complicated by the fact that, in addition to the published data, the adversary may have access to other resources (e.g., public records and social networks relating individuals), which we call adversarial knowledge. A robust privacy framework should allow publishing organizations to analyze data privacy not only along data dimensions (the data that a publishing organization has), but also along adversarial-knowledge dimensions (information not in the data). In this paper, we first describe a general framework for reasoning about privacy in the presence of adversarial knowledge. Within this framework, we propose a novel multidimensional approach to quantifying adversarial knowledge. This approach allows the publishing organization to investigate privacy threats and enforce privacy requirements in the presence of various types and amounts of adversarial knowledge. Our main technical contributions include a multidimensional privacy criterion that is more intuitive and flexible than previous approaches to modeling background knowledge. In addition, we identify an important congregation property of the adversarial-knowledge dimensions. Based on this property, we provide algorithms for measuring disclosure and sanitizing data that improve computational efficiency by several orders of magnitude over the best known techniques.
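The abstract does not spell out the criterion itself. Purely as an illustration of what "measuring disclosure" in the presence of adversarial knowledge can mean, the following Python sketch computes the worst-case confidence with which an adversary can predict a target's sensitive value from a bucketized release. Its assumptions are ours, not necessarily the paper's definitions. The release is a list of buckets of sensitive values. Every assignment of a bucket's values to its members is equally likely. Only one kind of knowledge is modeled: negated statements about the target of the form "the target does not have value e". All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from typing import Hashable, Iterable, List


def bucket_disclosure(sensitive_values: Iterable[Hashable], num_negations: int = 0) -> Fraction:
    """Worst-case confidence that a target in this bucket has some sensitive value.

    Illustrative assumption: all assignments of the bucket's values to its members
    are equally likely, so P(target has v) = count(v) / n. Knowing the target lacks
    every value in a set E gives P(target has s | not E) = count(s) / (n - count(E)).
    """
    counts = sorted(Counter(sensitive_values).values(), reverse=True)
    if not counts:
        raise ValueError("bucket must be non-empty")
    if num_negations < 0:
        raise ValueError("num_negations must be non-negative")
    n = sum(counts)
    # Adversary's best strategy: guess the most frequent value and spend the
    # negated statements ruling out the next most frequent values.
    eliminated = sum(counts[1:1 + num_negations])
    return Fraction(counts[0], n - eliminated)


def release_disclosure(buckets: List[List[Hashable]], num_negations: int = 0) -> Fraction:
    """Maximum worst-case disclosure over all buckets of the release."""
    return max(bucket_disclosure(b, num_negations) for b in buckets)


def is_safe(buckets: List[List[Hashable]], threshold: float, num_negations: int = 0) -> bool:
    """True if no bucket lets the adversary reach confidence >= threshold."""
    return release_disclosure(buckets, num_negations) < threshold


buckets = [["flu", "flu", "cancer", "hiv"], ["flu", "cold", "cold"]]
print(release_disclosure(buckets))                   # no adversarial knowledge
print(release_disclosure(buckets, num_negations=1))  # one negated statement
```

With no extra knowledge the worst bucket yields confidence 2/3. A single negated statement about the target already pushes the second bucket to certainty, which is why a privacy check that looks only at the data dimensions can be misleading.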
Cite this article
Chen, BC., LeFevre, K. & Ramakrishnan, R. Adversarial-knowledge dimensions in data privacy. The VLDB Journal 18, 429–467 (2009). https://doi.org/10.1007/s00778-008-0118-x