Abstract
A particularly challenging problem for data anonymization is dealing with transactional data. Most anonymization methods assume homogeneous, independent and identically distributed (i.i.d.) data; “flattening” transactional data to satisfy this model results in wide, sparse data that does not anonymize well with traditional techniques. While there have been some approaches for generalization-based anonymization, bucketization techniques (e.g., anatomy) pose new challenges. In particular, bucketization provides the opportunity to learn correlations between data items, but also a risk of identifying individuals because of dependencies inferred from such correlations. We present a method that balances these issues, retaining the ability to discover correlations in the data, while hiding dependencies that would enable correlations to be used to link specific values to individuals. We introduce a correlation anonymization constraint that ensures correlations do not allow data to be linked to a specific individual, and an elastic safe grouping algorithm that meets this constraint while preserving data correlations. We evaluate the utility loss on a transactional rental dataset.
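The Python sketch below illustrates the bucketization trade-off. It shows generic anatomy-style bucketization, not the paper's elastic safe grouping algorithm. Transactional records are published as two tables: an identifier table that maps each individual to a group ID, and a transaction table that holds each item set, intact, under the same group ID. Keeping item sets intact preserves item co-occurrences for analysis. However, the naive in-order grouping used here can still leak. If every transaction in a group contains an item, every member of that group is linked to that item with certainty. The function names `bucketize` and `item_link_probability`, the fixed `group_size`, and the toy records are hypothetical.

```python
def bucketize(records, group_size):
    """Anatomy-style bucketization of transactional records (hypothetical helper).

    records: list of (identifier, items) pairs, where items is an iterable of items.
    Returns (id_table, txn_table):
      id_table  -> list of (identifier, group_id)
      txn_table -> list of (group_id, sorted tuple of items)
    Groups are formed naively in input order; no safety constraint is enforced.
    """
    id_table, txn_table = [], []
    for index, (identifier, items) in enumerate(records):
        group_id = index // group_size
        id_table.append((identifier, group_id))
        # Each transaction stays intact, so item co-occurrences are preserved.
        txn_table.append((group_id, tuple(sorted(items))))
    # Sorting decouples row order in txn_table from row order in id_table,
    # so only the group ID links an individual to a transaction.
    txn_table.sort()
    return id_table, txn_table


def item_link_probability(id_table, txn_table, identifier, item):
    """Fraction of the individual's group transactions that contain item.

    Assuming each group member is equally likely to own each transaction in
    the group, this is the probability an adversary assigns to
    "identifier's transaction contains item".
    """
    group_id = dict(id_table)[identifier]
    group_txns = [items for gid, items in txn_table if gid == group_id]
    return sum(item in items for items in group_txns) / len(group_txns)


if __name__ == "__main__":
    records = [
        ("u1", {"a", "b"}),
        ("u2", {"a", "c"}),
        ("u3", {"b"}),
        ("u4", {"c", "d"}),
    ]
    ids, txns = bucketize(records, group_size=2)
    print(item_link_probability(ids, txns, "u1", "a"))  # 1.0: every txn in u1's group has "a"
    print(item_link_probability(ids, txns, "u3", "b"))  # 0.5
```

In this toy data, `u1` and `u2` share a group in which both transactions contain `a`. The individual-to-transaction link is hidden, but the correlation still ties `a` to `u1`. This is the kind of dependency that a correlation anonymization constraint is meant to rule out.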
Notes
We assume that User ID and VIN (vehicle identification number) are independent and identically distributed (i.i.d.).
Some data is anonymized or suppressed in order to meet the constraint. This is in keeping with privacy models that use partial suppression, replacing an individual's values with a * to preserve privacy as in [34, 42], or that use encryption, as in the model of [36], where some data is left encrypted and only "safe" data is revealed.
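The Python sketch below shows partial suppression with `*`. It is a hypothetical illustration, not the paper's constraint or algorithm. Within one bucket, it replaces with `*` every item that appears in more than a chosen fraction `max_share` of the bucket's transactions. As a result, no surviving item can be linked to a group member with probability above `max_share`. The function name, the frequency rule, and the threshold are assumptions made for illustration.

```python
from collections import Counter


def suppress_unsafe_items(group_txns, max_share):
    """Replace over-frequent items in one bucket with '*' (hypothetical rule).

    group_txns: list of item tuples belonging to a single bucket.
    max_share: largest allowed fraction of the bucket's transactions that
               may contain the same item, e.g. 0.5.
    """
    n = len(group_txns)
    # Count each item at most once per transaction.
    counts = Counter(item for txn in group_txns for item in set(txn))
    unsafe = {item for item, count in counts.items() if count / n > max_share}
    # Suppress unsafe items but keep the rest of each transaction intact.
    return [tuple("*" if item in unsafe else item for item in txn) for txn in group_txns]


if __name__ == "__main__":
    bucket = [("a", "b"), ("a", "c")]
    print(suppress_unsafe_items(bucket, max_share=0.5))  # [('*', 'b'), ('*', 'c')]
```

The `*` placeholder still reveals how many items each transaction had. This toy rule does not account for that leakage.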
References
Abadi M, Chu A, Goodfellow I, McMahan HB, Mironov I, Talwar K, Li Z (2016) Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, pp 308–318
Aldous DJ (1985) Exchangeability and related topics. In: École d’été de probabilités de Saint-Flour, XIII—1983, volume 1117 of Lecture Notes in Math. Springer, Berlin, pp 1–198
Andrés ME, Bordenabe NE, Chatzikokolakis K, Palamidessi C (2013) Geo-indistinguishability: differential privacy for location-based systems. In: Proceedings of the 2013 ACM SIGSAC conference on computer and communications security, pp 901–914
Anjum A, Raschia G (2017) BangA: an efficient and flexible generalization-based algorithm for privacy preserving data publication. Computers 6(1):1
Biskup J, Preuß M, Wiese L (2011) On the inference-proofness of database fragmentation satisfying confidentiality constraints. In: Proceedings of the 14th information security conference, Xi'an, China, Oct 26–29
Bouna BA, Clifton C, Malluhi QM (2013) Using safety constraint for transactional dataset anonymization. In: DBSec, pp 164–178
Bouna BA, Clifton C, Malluhi QM (2015) Efficient sanitization of unsafe data correlations. In: Proceedings of the workshops of the EDBT/ICDT 2015 joint conference (EDBT/ICDT), Brussels, Belgium, March 27th, 2015, pp 278–285
Bouna B, Clifton C, Malluhi QM (2015) Anonymizing transactional datasets. J Comput Secur 23(1):89–106
Centers for Medicare & Medicaid Services (1996) The Health Insurance Portability and Accountability Act of 1996 (HIPAA). http://www.cms.hhs.gov/hipaa/
Chicha E, Bouna BA, Nassar M, Chbeir R (2018) Cloud-based differentially private image classification. Wirel Netw 99:1–8
Chicha E, Bouna BA, Nassar M, Chbeir R, Haraty RA, Oussalah M, Benslimane D, Alraja MN (2021) A user-centric mechanism for sequentially releasing graph datasets under blowfish privacy. ACM Trans Internet Technol TOIT 21(1):1–25
Ciriani V, De Capitani di Vimercati S, Foresti S, Jajodia S, Paraboschi S, Samarati P (2010) Combining fragmentation and encryption to protect privacy in data storage. ACM Trans Inf Syst Secur 13:22:1–22:33
Dai C, Ghinita G, Bertino E, Byun J-W, Li N (2009) Tiamat: a tool for interactive analysis of microdata anonymization techniques. Proc VLDB Endow 2(2):1618–1621
Domingo-Ferrer J, Soria-Comas J (2015) From t-closeness to differential privacy and vice versa in data anonymization. Knowl Based Syst 74:151–158
Dwork C (2008) Differential privacy: a survey of results. In: International conference on theory and applications of models of computation. Springer, pp 1–19
Dwork C, Roth A (2014) The algorithmic foundations of differential privacy. Found Trends Theor Comput Sci 9(3–4):211–407
Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. In: Proceedings of the third conference on theory of cryptography, TCC’06. Springer, Berlin, Heidelberg, pp 265–284
Fatemeh ANY, Azadeh S (2018) Bottom-up sequential anonymization in the presence of adversary knowledge. Inf Sci 405:316–335
Friedman A, Schuster A (2010) Data mining with differential privacy. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp 493–502
Fung BCM, Wang K, Chen R, Yu PS (2010) Privacy-preserving data publishing: a survey of recent developments. ACM Comput Surv (Csur) 42(4):1–53
Gong Q, Luo J, Yang M, Ni W, Li X-B (2017) Anonymizing 1:m microdata with high utility. Knowl Based Syst 115(Supplement C):15–26
Gong Q, Yang M, Chen Z, Wu W, Luo J (2017) A framework for utility enhanced incomplete microdata anonymization. Clust Comput 20(2):1749–1764
He X, Machanavajjhala A, Ding B (2014) Blowfish privacy: tuning privacy-utility trade-offs using policies. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data. ACM, pp 1447–1458
Hundepool A, Willenborg LCRJ (1996) μ- and τ-ARGUS: software for statistical disclosure control. In: Third international seminar on statistical confidentiality
Kantarcioglu M, Inan A, Kuzu M (2010) Anonymization toolbox
Kifer D (2009) Attacks on privacy and de Finetti's theorem. In: SIGMOD conference, pp 127–138
LeFevre K, DeWitt DJ, Ramakrishnan R (2005) Incognito: efficient full-domain k-anonymity. In: Proceedings of the 2005 ACM SIGMOD international conference on management of data, pp 49–60
LeFevre K, DeWitt DJ, Ramakrishnan R (2006) Workload-aware anonymization. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 277–286
Li T, Li N, Zhang J, Molloy I (2012) Slicing: a new approach for privacy preserving data publishing. IEEE Trans Knowl Data Eng 24(3):561–574
Li B, Liu Y, Han X, Zhang J (2018) Cross-bucket generalization for information and privacy preservation. IEEE Trans Knowl Data Eng 30(3):449–459
Li T, Li N (2008) Injector: mining background knowledge for data anonymization. In: ICDE, pp 446–455
Li N, Li T, Venkatasubramanian S (2007) t-closeness: privacy beyond k-anonymity and l-diversity. In: ICDE, pp 106–115
Li N, Qardaji W, Su D (2012) On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy. In: Proceedings of the 7th ACM symposium on information, computer and communications security, ASIACCS ’12. ACM, New York, pp 32–33
Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M (2006) ℓ-diversity: privacy beyond k-anonymity. In: Proceedings of the 22nd IEEE international conference on data engineering (ICDE 2006), Atlanta, Georgia
Nassar M, Chicha E, Bouna BA, Chbeir R (2020) VIP blowfish privacy in communication graphs. In: ICETE (2), pp 459–467
Nergiz AE, Clifton C (2011) Query processing in private data outsourcing using anonymization. In: The 25th IFIP WG 11.3 conference on data and applications security and privacy (DBSEC-11), Richmond, Virginia
Nergiz ME, Atzori M, Clifton C (2007) Hiding the presence of individuals from shared databases. In: Proceedings of the 2007 ACM SIGMOD international conference on management of data, pp 665–676
Prasser F, Kohlmayer F, Lautenschläger R, Kuhn KA (2014) ARX: a comprehensive tool for anonymizing biomedical data. In: AMIA annual symposium proceedings, vol 2014. American Medical Informatics Association, p 984
Ressel P (1985) De Finetti-type theorems: an analytical approach. Ann Probab 13(3):898–922
Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027
Soria-Comas J, Domingo-Ferrer J (2013) Differential privacy via t-closeness in data publishing. In: Eleventh annual international conference on privacy, security and trust, PST 2013, Tarragona, Catalonia, Spain, July 10–12, 2013, pp 27–35
Sweeney L (2002) k-anonymity: a model for protecting privacy. Int J Uncertain Fuzzin Knowl Based Syst 10(5):557–570
Wang H, Liu R (2015) Hiding outliers into crowd: privacy-preserving data publishing with outliers. Data Knowl Eng 100(Part A):94–115
Wang K, Wang P, Fu AW, Wong RC-W (2016) Generalized bucketization scheme for flexible privacy settings. Inf Sci 348:377–393
Wong RC-W, Fu AW-C, Wang K, Yu PS, Pei J (2011) Can the utility of anonymized data be used for privacy breaches? ACM Trans Knowl Discov Data 5(3):16:1–16:24
Xiao X, Tao Y (2006) Anatomy: simple and effective privacy preservation. In: Proceedings of 32nd international conference on very large data bases (VLDB 2006), Seoul, Korea
Acknowledgements
The authors would like to acknowledge the National Council for Scientific Research of Lebanon (CNRS-L) and Univ. Pau & Pays Adour, UPPA-E2S, LIUPPA, for granting a doctoral fellowship to Elie Chicha.
About this article
Cite this article
Chicha, E., Al Bouna, B., Wünsche, K. et al. Exposing safe correlations in transactional datasets. SOCA 15, 289–307 (2021). https://doi.org/10.1007/s11761-021-00325-1
DOI: https://doi.org/10.1007/s11761-021-00325-1