COAT: COnstraint-based anonymization of transactions

  • Regular Paper
Knowledge and Information Systems

Abstract

Publishing transactional data about individuals in an anonymous form is increasingly required by organizations. Recent approaches ensure that potentially identifying information cannot be used to link published transactions to individuals’ identities. However, these approaches are inadequate to anonymize data that is both protected and practically useful in applications because they incorporate coarse privacy requirements, do not integrate utility requirements, and tend to explore a small portion of the solution space. In this paper, we propose the first approach for anonymizing transactional data under application-specific privacy and utility requirements. We model such requirements as constraints, investigate how these constraints can be specified, and propose COnstraint-based Anonymization of Transactions, an algorithm that anonymizes transactions using a flexible anonymization scheme to meet the specified constraints. Experiments with benchmark datasets verify that COAT significantly outperforms the current state-of-the-art algorithm in terms of data utility, while being comparable in terms of efficiency. Our approach is also shown to be effective in preserving both privacy and utility in a real-world scenario that requires disseminating patients’ information.

Author information

Corresponding author

Correspondence to Grigorios Loukides.

About this article

Cite this article

Loukides, G., Gkoulalas-Divanis, A. & Malin, B. COAT: COnstraint-based anonymization of transactions. Knowl Inf Syst 28, 251–282 (2011). https://doi.org/10.1007/s10115-010-0354-4

  • DOI: https://doi.org/10.1007/s10115-010-0354-4
