Privacy-Preserving Healthcare Data Modeling Based on Sensitivity and Utility

Saha, Sayantani; Mallick, Shuchismita; Neogy, Sarmistha

doi:10.1007/s42979-022-01372-x

Original Research
Published: 15 September 2022

Volume 3, article number 482 (2022)

SN Computer Science

Sayantani Saha¹,
Shuchismita Mallick¹ &
Sarmistha Neogy²

Abstract

Huge amounts of data are produced and processed by modern data-centric applications. Managing this data securely while preserving its privacy is challenging, because the data itself holds sensitive values alongside other application data. Protecting sensitive data is especially difficult because its sensitivity cannot be quantified directly. Here, we formulate a metric, the sensitivity-score, to calculate the sensitivity value of each data attribute in a dataset. Sensitive attributes are segregated carefully to prevent possible data linkage attacks by legitimate users of the application data. The micro-data format is good for maintaining the privacy of sensitive data; however, the utility of the data then decreases exponentially. In this paper, we therefore aim to model the data so that a balance between privacy and utility is maintained. The entire dataset is segregated into micro-data format, with attributes grouped according to their sensitivity values. A decision-tree-based classifier is used to label the attributes of a sample healthcare dataset as sensitive or non-sensitive. Experiments are also conducted to compare the utility and privacy of the proposed method with those of existing data partitioning algorithms.
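The abstract does not give the sensitivity-score formula or the classifier's features, so the following is only a minimal sketch, under stated assumptions, of how attribute-level sensitivity scoring and sensitivity-based segregation could fit together. The hypothetical `HARM_WEIGHT` table, the `alpha` mix of harm weight and normalized Shannon entropy, and the fixed `threshold` are illustrative choices, not the authors' method; in particular, the fixed threshold stands in for the decision-tree classifier the paper uses. The `partition` helper, which splits each record vertically into a sensitive and a non-sensitive fragment sharing a surrogate row id `rid`, is likewise an assumption.

```python
import math
from collections import Counter

# Hypothetical harm weights in [0, 1] for disclosing each attribute.
# Illustrative assumptions only; they are not taken from the paper.
HARM_WEIGHT = {"name": 1.0, "age": 0.3, "zip": 0.4, "diagnosis": 0.9, "ward": 0.1}


def normalized_entropy(values):
    """Shannon entropy of a column divided by log2(n), where n is the row count.

    Returns 1.0 when every value is distinct (identifier-like column)
    and 0.0 when all values are equal.
    """
    n = len(values)
    if n <= 1:
        return 0.0
    counts = Counter(values)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(n)


def sensitivity_score(column, values, alpha=0.5):
    """Hypothetical score: alpha * harm weight + (1 - alpha) * identifying power."""
    weight = HARM_WEIGHT.get(column, 0.0)
    return alpha * weight + (1 - alpha) * normalized_entropy(values)


def label_attributes(records, threshold=0.5, alpha=0.5):
    """Label each column 'sensitive' if its score reaches the threshold.

    The fixed threshold stands in for the paper's decision-tree classifier.
    """
    labels = {}
    for col in records[0]:
        score = sensitivity_score(col, [row[col] for row in records], alpha)
        labels[col] = "sensitive" if score >= threshold else "non-sensitive"
    return labels


def partition(records, labels):
    """Split each record vertically into a sensitive and a non-sensitive fragment.

    Both fragments keep a surrogate row id ('rid') so an authorized party can
    re-join them; the fragments are meant to be stored separately.
    """
    sensitive, public = [], []
    for rid, row in enumerate(records):
        s_row, p_row = {"rid": rid}, {"rid": rid}
        for col, value in row.items():
            target = s_row if labels[col] == "sensitive" else p_row
            target[col] = value
        sensitive.append(s_row)
        public.append(p_row)
    return sensitive, public


# Made-up sample rows for illustration.
SAMPLE_RECORDS = [
    {"name": "A", "age": 34, "zip": "700032", "diagnosis": "flu", "ward": "W1"},
    {"name": "B", "age": 34, "zip": "700032", "diagnosis": "diabetes", "ward": "W1"},
    {"name": "C", "age": 51, "zip": "700091", "diagnosis": "flu", "ward": "W2"},
    {"name": "D", "age": 51, "zip": "700091", "diagnosis": "asthma", "ward": "W2"},
]

labels = label_attributes(SAMPLE_RECORDS)
sensitive_part, public_part = partition(SAMPLE_RECORDS, labels)
print(labels)
print(sensitive_part[0], public_part[0])
```

With these made-up weights and four sample rows, `name` and `diagnosis` land in the sensitive fragment while `age`, `zip` and `ward` stay in the non-sensitive one. Changing `alpha` or `threshold` shifts that boundary; a trained classifier, such as the decision tree mentioned in the abstract, would learn it from labeled attributes instead of having it fixed by hand.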

Author information

Authors and Affiliations

Department of Information Technology, MAKAUT, Kolkata, West Bengal, India
Sayantani Saha & Shuchismita Mallick
Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
Sarmistha Neogy

Corresponding author

Correspondence to Sayantani Saha.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Social Data Science: Research Challenges and Future Directions” guest edited by Sarbani Roy, Chandreyee Chowdhury, and Samiran Chattopadhyay.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Saha, S., Mallick, S. & Neogy, S. Privacy-Preserving Healthcare Data Modeling Based on Sensitivity and Utility. SN COMPUT. SCI. 3, 482 (2022). https://doi.org/10.1007/s42979-022-01372-x

Received: 05 December 2021
Accepted: 15 August 2022
Published: 15 September 2022
DOI: https://doi.org/10.1007/s42979-022-01372-x

Keywords

Part of a collection:

Social Data Science: Research Challenges and Future Directions