Abstract
Private data often come in the form of associations between entities, such as customers and products bought from a pharmacy, which are naturally represented in the form of a large, sparse bipartite graph. As with tabular data, it is desirable to be able to publish anonymized versions of such data, to allow others to perform ad hoc analysis of aggregate graph properties. However, existing tabular anonymization techniques do not give useful or meaningful results when applied to graphs: small changes or masking of the edge structure can radically change aggregate graph properties. We introduce a new family of anonymizations for bipartite graph data, called (k, ℓ)-groupings. These groupings preserve the underlying graph structure perfectly, and instead anonymize the mapping from entities to nodes of the graph. We identify a class of “safe” (k, ℓ)-groupings that have provable guarantees to resist a variety of attacks, and show how to find such safe groupings. We perform experiments on real bipartite graph data to study the utility of the anonymized version, and the impact of publishing alternate groupings of the same graph data. Our experiments demonstrate that (k, ℓ)-groupings offer strong tradeoffs between privacy and utility.
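To make the idea concrete, the following is a minimal sketch of what a released grouping might look like, not the authors' algorithm. It assumes that a (k, ℓ)-grouping partitions one side of the graph into groups of at least k entities and the other side into groups of at least ℓ entities; the abstract does not state this definition. The edge structure is published exactly, over opaque node identifiers, and each group reveals only which set of entities corresponds to which set of nodes. The function names `chunk` and `publish_grouping`, the dictionary layout, and the consecutive chunking strategy are all hypothetical. The sketch does not check the "safe" condition mentioned above.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split items into groups of at least `size` members.

    Leftover items are appended to the last group, so no group is too small.
    """
    if size < 1 or len(items) < size:
        raise ValueError("need at least `size` items to form one group")
    n_groups = len(items) // size
    groups = [items[i * size:(i + 1) * size] for i in range(n_groups)]
    groups[-1] = groups[-1] + items[n_groups * size:]
    return groups


def publish_grouping(edges: List[Edge], customers: List[str],
                     products: List[str], k: int, l: int,
                     rng: random.Random) -> Dict[str, list]:
    """Return the exact edge structure over opaque node ids, plus groups.

    Assumption (hypothetical layout): each group lists its node ids and its
    entity names separately, both sorted, so the pairing inside a group is
    hidden.
    """
    def mask(entities: List[str], prefix: str, size: int):
        order = list(entities)
        rng.shuffle(order)  # random id assignment hides the pairing
        ids = {e: f"{prefix}{i}" for i, e in enumerate(order)}
        groups = [
            {"nodes": sorted(ids[e] for e in g), "entities": sorted(g)}
            for g in chunk(order, size)
        ]
        return ids, groups

    c_ids, c_groups = mask(customers, "c", k)
    p_ids, p_groups = mask(products, "p", l)
    # Every original edge is kept; only its endpoints are renamed.
    masked_edges = sorted((c_ids[c], p_ids[p]) for c, p in edges)
    return {"edges": masked_edges,
            "customer_groups": c_groups,
            "product_groups": p_groups}


if __name__ == "__main__":
    customers = ["alice", "bob", "carol", "dave"]
    products = ["aspirin", "insulin", "statin", "zinc"]
    edges = [("alice", "insulin"), ("alice", "aspirin"), ("bob", "aspirin"),
             ("carol", "statin"), ("dave", "zinc")]
    released = publish_grouping(edges, customers, products, 2, 2,
                                random.Random())
    print(released)
```

Because the edges are only relabelled, aggregate structural properties such as the degree distribution are unchanged in the release; what an observer loses is the ability to tell which entity within a group sits behind which node.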
Additional information
T. Yu and Q. Zhang were partially sponsored by the NSF through grants IIS-0430166 and CNS-0747247.
About this article
Cite this article
Cormode, G., Srivastava, D., Yu, T. et al. Anonymizing bipartite graph data using safe groupings. The VLDB Journal 19, 115–139 (2010). https://doi.org/10.1007/s00778-009-0167-9
DOI: https://doi.org/10.1007/s00778-009-0167-9