DOI: 10.1145/1971690.1971695
research-article

PCTA: privacy-constrained clustering-based transaction data anonymization

Published: 25 March 2011

ABSTRACT

Transaction data about individuals are increasingly collected to support a plethora of applications, ranging from marketing to biomedical studies. Many organizations are required to publish these data, but doing so may result in privacy breaches if an attacker exploits potentially identifying information to link individuals to their records in the published data. Algorithms that prevent this threat by transforming transaction data prior to release have been proposed recently, but they incur significant information loss because they cannot accommodate the range of privacy requirements that data owners often have. To address this issue, we propose a novel clustering-based framework for anonymizing transaction data. Our framework provides the basis for designing algorithms that explore a larger solution space than existing methods, which allows publishing data with less information loss, and that can satisfy a wide range of privacy requirements. Based on this framework, we develop PCTA, a generalization-based algorithm that constructs anonymizations incurring a small amount of information loss under many different privacy requirements. Experiments with benchmark datasets verify that PCTA significantly outperforms the current state-of-the-art algorithms in terms of data utility, while being comparable in terms of efficiency.
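
To make the notions of item generalization and a privacy requirement more concrete, the following is a minimal illustrative sketch. It is not PCTA and does not reproduce the paper's clustering procedure. It assumes that a privacy constraint is a set of items an attacker may know, and that the constraint counts as protected when its generalized form is supported by at least k transactions or by none. The function names, the `mapping` representation of generalized items, and the toy data are all hypothetical.

```python
from typing import Dict, List, Set

Transaction = Set[str]


def generalize(transactions: List[Transaction],
               mapping: Dict[str, str]) -> List[Transaction]:
    """Replace each item by its generalized item.

    Items that do not appear in `mapping` are left unchanged.
    """
    return [{mapping.get(item, item) for item in t} for t in transactions]


def support(itemset: Set[str], transactions: List[Transaction]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def satisfies(constraint: Set[str],
              transactions: List[Transaction],
              mapping: Dict[str, str],
              k: int) -> bool:
    """Assumed protection rule: after generalization, the constraint is
    supported by at least k transactions or by none at all."""
    generalized_constraint = {mapping.get(item, item) for item in constraint}
    s = support(generalized_constraint, generalize(transactions, mapping))
    return s == 0 or s >= k


# Toy data (hypothetical): four transactions over items a, b, c.
transactions = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a"}]

# Without generalization, item b appears in only 2 transactions.
no_generalization: Dict[str, str] = {}

# Merging b and c into one generalized item raises the support to 3.
merge_b_c = {"b": "(b,c)", "c": "(b,c)"}

before = satisfies({"b"}, transactions, no_generalization, k=3)
after = satisfies({"b"}, transactions, merge_b_c, k=3)
```

In this toy case the constraint on item `b` fails for k = 3 until `b` and `c` are merged, at the cost of no longer distinguishing the two items. That cost is the kind of information loss the abstract refers to.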


Published in

PAIS '11: Proceedings of the 4th International Workshop on Privacy and Anonymity in the Information Society
March 2011
62 pages
ISBN: 9781450306119
DOI: 10.1145/1971690

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
