Exposing safe correlations in transactional datasets

  • Special Issue Paper
  • Published in Service Oriented Computing and Applications

Abstract

A particularly challenging problem for data anonymization is dealing with transactional data. Most anonymization methods assume homogeneous, independent and identically distributed (i.i.d.) data; “flattening” transactional data to satisfy this model results in wide, sparse data that does not anonymize well with traditional techniques. While there have been some approaches for generalization-based anonymization, bucketization techniques (e.g., anatomy) pose new challenges. In particular, bucketization provides the opportunity to learn correlations between data items, but also a risk of identifying individuals because of dependencies inferred from such correlations. We present a method that balances these issues, retaining the ability to discover correlations in the data, while hiding dependencies that would enable correlations to be used to link specific values to individuals. We introduce a correlation anonymization constraint that ensures correlations do not allow data to be linked to a specific individual, and an elastic safe grouping algorithm that meets this constraint while preserving data correlations. We evaluate the utility loss on a transactional rental dataset.
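The bucketization (anatomy) model the abstract refers to can be illustrated with a minimal sketch, assuming hypothetical records and field names (the paper's actual rental dataset and elastic safe grouping algorithm differ): quasi-identifiers and sensitive values are published in two tables linked only by a group id, so a sensitive value cannot be tied to one specific record within its group.

```python
import random

# Hypothetical records: (quasi-identifiers, sensitive value).
# Names and values are illustrative, not from the paper's dataset.
records = [
    ({"age": 34, "zip": "75011"}, "van"),
    ({"age": 36, "zip": "75012"}, "sedan"),
    ({"age": 51, "zip": "64000"}, "truck"),
    ({"age": 49, "zip": "64100"}, "sedan"),
]

def anatomize(records, group_size=2):
    """Anatomy-style bucketization: publish quasi-identifiers and
    sensitive values in two separate tables joined only by a group id,
    so a sensitive value cannot be linked to one record in its group."""
    qid_table, sens_table = [], []
    for start in range(0, len(records), group_size):
        group = records[start:start + group_size]
        gid = start // group_size
        for qid, _ in group:
            qid_table.append({**qid, "gid": gid})
        # Shuffle sensitive values within the group to break the
        # row-level linkage before publishing.
        values = [s for _, s in group]
        random.shuffle(values)
        for s in values:
            sens_table.append({"gid": gid, "sensitive": s})
    return qid_table, sens_table

qid_table, sens_table = anatomize(records)
```

Because each group publishes its full multiset of sensitive values, correlations between items remain learnable across buckets; the risk the paper addresses is that such learned correlations can become dependencies that re-link a value to an individual.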


Notes

  1. We assume User ID and Vin Number are independent and identically distributed (i.i.d.).

  2. Some data is anonymized/suppressed in order to meet the constraint; this is in keeping with privacy models that use partial suppression, replacing individuals’ values with a * to preserve privacy as in [34, 42], or with encryption as in the model in [36], where some data is left encrypted and only “safe” data is revealed.

  3. https://github.com/ElieChicha/ESC/blob/main/sourcedata.xlsx.
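The partial suppression mentioned in Note 2 can be sketched as follows; this is a minimal illustration with hypothetical field names, not the paper's algorithm.

```python
def suppress(record, unsafe_fields):
    """Partial suppression in the style of [34, 42]: replace an
    individual's values in unsafe fields with '*' so the remaining
    'safe' values can still be released."""
    return {k: ("*" if k in unsafe_fields else v) for k, v in record.items()}

# Hypothetical row; the original record is left unmodified.
row = {"user_id": "u17", "vin": "1HGCM82633A", "rental_days": 5}
print(suppress(row, {"user_id", "vin"}))
# {'user_id': '*', 'vin': '*', 'rental_days': 5}
```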

References

  1. Abadi M, Chu A, Goodfellow I, McMahan HB, Mironov I, Talwar K, Li Z (2016) Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, pp 308–318

  2. Aldous DJ (1985) Exchangeability and related topics. In: École d’été de probabilités de Saint-Flour, XIII—1983, volume 1117 of Lecture Notes in Math. Springer, Berlin, pp 1–198

  3. Andrés ME, Bordenabe NE, Chatzikokolakis K, Palamidessi C (2013) Geo-indistinguishability: differential privacy for location-based systems. In: Proceedings of the 2013 ACM SIGSAC conference on computer and communications security, pp 901–914

  4. Anjum A, Raschia G (2017) Banga: an efficient and flexible generalization-based algorithm for privacy preserving data publication. Computers 6(1):1

  5. Biskup J, Preuß M, Wiese L (2011) On the inference-proofness of database fragmentation satisfying confidentiality constraints. In: Proceedings of the 14th information security conference, Xi'an, China, Oct 26–29

  6. Bouna BA, Clifton C, Malluhi QM (2013) Using safety constraint for transactional dataset anonymization. In: DBSec, pp 164–178

  7. Bouna BA, Clifton C, Malluhi QM (2015) Efficient sanitization of unsafe data correlations. In: Proceedings of the workshops of the EDBT/ICDT 2015 joint conference (EDBT/ICDT), Brussels, Belgium, March 27th, 2015, pp 278–285

  8. Bouna B, Clifton C, Malluhi QM (2015) Anonymizing transactional datasets. J Comput Secur 23(1):89–106

  9. Centers for Medicare & Medicaid Services (1996) The Health Insurance Portability and Accountability Act of 1996 (HIPAA). http://www.cms.hhs.gov/hipaa/

  10. Chicha E, Bouna BA, Nassar M, Chbeir R (2018) Cloud-based differentially private image classification. Wirel Netw 99:1–8

  11. Chicha E, Bouna BA, Nassar M, Chbeir R, Haraty RA, Oussalah M, Benslimane D, Alraja MN (2021) A user-centric mechanism for sequentially releasing graph datasets under blowfish privacy. ACM Trans Internet Technol TOIT 21(1):1–25

  12. Ciriani V, De Capitani di Vimercati S, Foresti S, Jajodia S, Paraboschi S, Samarati P (2010) Combining fragmentation and encryption to protect privacy in data storage. ACM Trans Inf Syst Secur 13:22:1–22:33

  13. Dai C, Ghinita G, Bertino E, Byun J-W, Li N (2009) Tiamat: a tool for interactive analysis of microdata anonymization techniques. Proc VLDB Endow 2(2):1618–1621

  14. Domingo-Ferrer J, Soria-Comas J (2015) From t-closeness to differential privacy and vice versa in data anonymization. Knowl Based Syst 74:151–158

  15. Dwork C (2008) Differential privacy: a survey of results. In: International conference on theory and applications of models of computation. Springer, pp. 1–19

  16. Dwork C, Roth A et al (2014) The algorithmic foundations of differential privacy. Found Trends Theor Comput Sci 9(3–4):211–407

  17. Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. In: Proceedings of the third conference on theory of cryptography, TCC’06. Springer, Berlin, Heidelberg, pp 265–284

  18. Fatemeh ANY, Azadeh S (2018) Bottom-up sequential anonymization in the presence of adversary knowledge. Inf Sci 405:316–335

  19. Friedman A, Schuster A (2010) Data mining with differential privacy. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp 493–502

  20. Fung BCM, Wang K, Chen R, Yu PS (2010) Privacy-preserving data publishing: a survey of recent developments. ACM Comput Surv (Csur) 42(4):1–53

  21. Gong Q, Luo J, Yang M, Ni W, Li X-B (2017) Anonymizing 1:m microdata with high utility. Knowl Based Syst 115(Supplement C):15–26

  22. Gong Q, Yang M, Chen Z, Wenjia W, Luo J (2017) A framework for utility enhanced incomplete microdata anonymization. Clust Comput 20(2):1749–1764

  23. He X, Machanavajjhala A, Ding B (2014) Blowfish privacy: tuning privacy-utility trade-offs using policies. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data. ACM, pp 1447–1458

  24. Hundepool A, Willenborg LCRJ (1996) \(\mu \)- and \(\tau \)-Argus: software for statistical disclosure control. In: Third international seminar on statistical confidentiality

  25. Kantarcioglu M, Inan A, Kuzu M (2010) Anonymization toolbox

  26. Kifer D (2009) Attacks on privacy and de Finetti's theorem. In: SIGMOD conference, pp 127–138

  27. LeFevre K, DeWitt DJ, Ramakrishnan R (2005) Incognito: efficient full-domain k-anonymity. In: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pp 49–60

  28. LeFevre K, DeWitt DJ, Ramakrishnan R (2006) Workload-aware anonymization. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 277–286

  29. Li T, Li N, Zhang J, Molloy I (2012) Slicing: a new approach for privacy preserving data publishing. IEEE Trans Knowl Data Eng 24(3):561–574

  30. Li B, Liu Y, Han X, Zhang J (2018) Cross-bucket generalization for information and privacy preservation. IEEE Trans Knowl Data Eng 30(3):449–459

  31. Li T, Li N (2008) Injector: mining background knowledge for data anonymization. In: ICDE, pp 446–455

  32. Li N, Li T, Venkatasubramanian S (2007) t-closeness: privacy beyond k-anonymity and l-diversity. In: ICDE, pp 106–115

  33. Li N, Qardaji W, Su D (2012) On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy. In: Proceedings of the 7th ACM symposium on information, computer and communications security, ASIACCS ’12. ACM, New York, pp 32–33

  34. Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M (2006) \(l\)-diversity: privacy beyond \(k\)-anonymity. In: Proceedings of the 22nd IEEE international conference on data engineering (ICDE 2006), Atlanta Georgia

  35. Nassar M, Chicha E, Bouna BA, Chbeir R (2020) Vip blowfish privacy in communication graphs. In: ICETE (2), pp 459–467

  36. Nergiz AE, Clifton C (2011) Query processing in private data outsourcing using anonymization. In: The 25th IFIP WG 11.3 conference on data and applications security and privacy (DBSEC-11), Richmond, Virginia

  37. Nergiz ME, Atzori M, Clifton C (2007) Hiding the presence of individuals from shared databases. In: Proceedings of the 2007 ACM SIGMOD international conference on management of data, pp 665–676

  38. Prasser F, Kohlmayer F, Lautenschläger R, Kuhn KA (2014) ARX: a comprehensive tool for anonymizing biomedical data. In: AMIA annual symposium proceedings, vol 2014. American Medical Informatics Association, p 984

  39. Ressel P (1985) De Finetti-type theorems: an analytical approach. Ann Probab 13(3):898–922

  40. Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027

  41. Soria-Comas J, Domingo-Ferrer J (2013) Differential privacy via t-closeness in data publishing. In: Eleventh annual international conference on privacy, security and trust, PST 2013, 10–12 July, 2013, Tarragona, Catalonia, Spain, July 10–12, 2013, pp 27–35

  42. Sweeney L (2002) k-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5):557–570

  43. Wang H, Liu R (2015) Hiding outliers into crowd: privacy-preserving data publishing with outliers. Data Knowl Eng 100(Part A):94–115

  44. Wang K, Wang P, Fu AW-C, Wong RC-W (2016) Generalized bucketization scheme for flexible privacy settings. Inf Sci 348:377–393

  45. Wong RC-W, Fu AW-C, Wang K, Yu PS, Pei J (2011) Can the utility of anonymized data be used for privacy breaches? ACM Trans Knowl Discov Data 5(3):16:1–16:24

  46. Xiao X, Tao Y (2006) Anatomy: simple and effective privacy preservation. In: Proceedings of 32nd international conference on very large data bases (VLDB 2006), Seoul, Korea

Acknowledgements

The authors would like to acknowledge the National Council for Scientific Research of Lebanon (CNRS-L) and Univ. Pau & Pays Adour, UPPA-E2S, LIUPPA, for granting a doctoral fellowship to Elie Chicha.

Corresponding author

Correspondence to Elie Chicha.

About this article

Cite this article

Chicha, E., Al Bouna, B., Wünsche, K. et al. Exposing safe correlations in transactional datasets. SOCA 15, 289–307 (2021). https://doi.org/10.1007/s11761-021-00325-1
