Privacy-Preserving Healthcare Data Modeling Based on Sensitivity and Utility

  • Original Research
SN Computer Science

Abstract

Recent data-centric applications produce and process huge amounts of data. Managing this data securely while maintaining its privacy is challenging, because the data store holds sensitive data alongside other application data. Protecting sensitive data is especially difficult because sensitivity cannot be quantified directly. Here, we formulate a metric, the sensitivity-score, to calculate the sensitivity value of the data attributes in a dataset. Sensitive attributes are segregated carefully to avoid possible data linkage attacks by legitimate users of the application data. The micro-data format is well suited to maintaining the privacy of sensitive data; however, the utility of the data then decreases exponentially. In this paper, we therefore model the data in such a way that a balance between privacy and utility is maintained. The entire dataset is segregated into micro-data format, with attributes grouped according to their sensitivity value. A Decision Tree-based classifier is used to label the attributes of a sample healthcare dataset as Sensitive or not. Experiments are also conducted to compare the utility and the privacy of the proposed method with those of other existing data partitioning algorithms.
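
As a rough illustration of segregating attributes by sensitivity value, the following Python sketch splits a small table into a sensitive and a non-sensitive fragment. It is not the paper's method: the abstract does not give the sensitivity-score formula, so the scores, the threshold, the attribute names, and the function `split_by_sensitivity` are all hypothetical. The sketch also leaves row order intact, so it does nothing by itself to prevent linkage between the two fragments.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def split_by_sensitivity(
    records: List[Record],
    scores: Dict[str, float],
    threshold: float,
) -> Tuple[List[Record], List[Record]]:
    """Vertically partition records into (sensitive, non_sensitive) fragments.

    `scores` maps each attribute name to a hypothetical sensitivity value;
    attributes scoring at or above `threshold` go to the sensitive fragment.
    """
    sensitive_attrs = [a for a, s in scores.items() if s >= threshold]
    other_attrs = [a for a, s in scores.items() if s < threshold]
    sensitive = [{a: r[a] for a in sensitive_attrs} for r in records]
    non_sensitive = [{a: r[a] for a in other_attrs} for r in records]
    return sensitive, non_sensitive


# Made-up sample data and scores, for illustration only.
patients = [
    {"age": 34, "zip": "70001", "diagnosis": "asthma"},
    {"age": 58, "zip": "70002", "diagnosis": "diabetes"},
]
example_scores = {"age": 0.2, "zip": 0.4, "diagnosis": 0.9}

sensitive_part, public_part = split_by_sensitivity(patients, example_scores, 0.5)
print(sensitive_part)
print(public_part)
```

Lowering the threshold moves more attributes into the sensitive fragment, which trades utility of the remaining fragment for privacy; this is the kind of balance the abstract refers to.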

Author information

Corresponding author

Correspondence to Sayantani Saha.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Social Data Science: Research Challenges and Future Directions” guest edited by Sarbani Roy, Chandreyee Chowdhury, and Samiran Chattopadhyay.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Saha, S., Mallick, S. & Neogy, S. Privacy-Preserving Healthcare Data Modeling Based on Sensitivity and Utility. SN COMPUT. SCI. 3, 482 (2022). https://doi.org/10.1007/s42979-022-01372-x

  • DOI: https://doi.org/10.1007/s42979-022-01372-x