Abstract
Recent data-centric applications produce and process huge amounts of data. Managing this data securely while maintaining its privacy is challenging, because sensitive data is stored alongside other application data. Protecting sensitive data is particularly difficult because its sensitivity cannot be quantified directly. Here, we formulate a metric, the sensitivity-score, to calculate the sensitivity value of the data attributes in a dataset. Sensitive attributes are segregated carefully to avoid possible data linkage attacks by legitimate users of the application data. The micro-data format is well suited to maintaining the privacy of sensitive data; however, the utility of the data decreases exponentially. In this paper, we therefore model the data so that a balance between privacy and utility is maintained. The entire dataset is segregated into micro-data format, with attributes grouped according to their sensitivity value. A Decision Tree-based classifier is used to label the attributes of a sample healthcare dataset as sensitive or not. Experiments are also conducted to compare the utility and privacy of the proposed method with those of other existing data partitioning algorithms.
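The segregation step described above can be illustrated with a minimal sketch. The abstract does not give the sensitivity-score formula, so the scores, the threshold, the attribute names, and the function `split_by_sensitivity` below are all hypothetical placeholders rather than the authors' method; the sketch only shows the general idea of splitting records into a sensitive fragment and a general fragment by attribute score.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def split_by_sensitivity(
    records: List[Record],
    scores: Dict[str, float],
    threshold: float,
) -> Tuple[List[Record], List[Record]]:
    """Vertically partition records into (sensitive, general) fragments.

    An attribute goes to the sensitive fragment when its score is at or
    above the threshold. Attributes without a score are treated as
    non-sensitive (an assumption made only for this sketch).
    """
    sensitive_cols = {col for col, s in scores.items() if s >= threshold}
    sensitive: List[Record] = []
    general: List[Record] = []
    for rec in records:
        sensitive.append({k: v for k, v in rec.items() if k in sensitive_cols})
        general.append({k: v for k, v in rec.items() if k not in sensitive_cols})
    return sensitive, general


# Hypothetical healthcare-style rows and placeholder scores.
rows = [
    {"age": 34, "zip": "70001", "diagnosis": "asthma"},
    {"age": 51, "zip": "70002", "diagnosis": "diabetes"},
]
placeholder_scores = {"age": 0.2, "zip": 0.4, "diagnosis": 0.9}

sensitive_part, general_part = split_by_sensitivity(rows, placeholder_scores, 0.5)
```

In a real pipeline the placeholder scores would come from the sensitivity-score metric, and the sensitive/non-sensitive decision from the trained classifier; how the fragments are linked or stored is outside the scope of this sketch.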






Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
This article is part of the topical collection “Social Data Science: Research Challenges and Future Directions” guest edited by Sarbani Roy, Chandreyee Chowdhury, and Samiran Chattopadhyay.
About this article
Cite this article
Saha, S., Mallick, S. & Neogy, S. Privacy-Preserving Healthcare Data Modeling Based on Sensitivity and Utility. SN COMPUT. SCI. 3, 482 (2022). https://doi.org/10.1007/s42979-022-01372-x
DOI: https://doi.org/10.1007/s42979-022-01372-x