Abstract
Data de-identification and differential privacy are two possible approaches to providing data security and user privacy. Data de-identification is the process of removing personally identifiable information from individual records to create anonymized databases; it has long been used in industry to sanitize data before it is outsourced for data-mining purposes. Differential privacy instead protects sensitive data by adding an appropriate level of noise either to the output of a query or to the underlying database, so that the presence or absence of any single record does not significantly alter the query output. Recent work in the literature has highlighted the risk of re-identifying individuals in de-identified data sets. In this paper, we provide a comprehensive comparison of these two privacy-preserving strategies. Our results show that differentially private trained models produce highly accurate outputs while preserving data privacy, making them a reliable alternative to data de-identification.
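To make the noise-addition idea concrete, the following is a minimal sketch (not taken from the paper) of the classic Laplace mechanism for a counting query: the true count has sensitivity 1, so adding Laplace noise with scale 1/ε yields an ε-differentially private answer. The function name and toy data are illustrative assumptions.

```python
import random

def laplace_count(data, predicate, epsilon):
    """Illustrative epsilon-DP count: true count plus Laplace noise
    with scale 1/epsilon (sensitivity of a counting query is 1)."""
    true_count = sum(1 for x in data if predicate(x))
    # The difference of two Exponential(rate=epsilon) draws is
    # Laplace-distributed with mean 0 and scale 1/epsilon.
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# Toy example: privately count records with age > 40.
ages = [23, 45, 31, 52, 67, 29, 41]
private_count = laplace_count(ages, lambda a: a > 40, epsilon=1.0)
```

Smaller ε means larger noise and stronger privacy; repeated queries consume the privacy budget additively, which is why production systems (e.g. the IBM diffprivlib library cited below) track cumulative ε.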
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Rashid, F., Miri, A. (2020). An Emerging Strategy for Privacy Preserving Databases: Differential Privacy. In: Moallem, A. (eds) HCI for Cybersecurity, Privacy and Trust. HCII 2020. Lecture Notes in Computer Science(), vol 12210. Springer, Cham. https://doi.org/10.1007/978-3-030-50309-3_32
Print ISBN: 978-3-030-50308-6
Online ISBN: 978-3-030-50309-3