
Analysis and Protection of Public Medical Dataset: From Privacy Perspective

  • Conference paper
Health Information Science (HIS 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14305)


Abstract

High-quality medical treatment is unattainable without protecting patients’ medical records and other sensitive information. As medical systems become widely digitized and networked, patient privacy has become one of the most critical challenges in the medical industry. Health data encompass a wide range of personal information, including medical records, treatment records, genetic data, and demographic information. In this paper, we review existing methods for keeping patients’ health records private and compare their advantages and limitations. We then analyze a public medical dataset from a privacy-protection perspective using the k-anonymity and l-diversity models, and compare how different quasi-identifier attributes affect privacy protection. Furthermore, we conduct experiments to investigate the trade-off between privacy and utility. Based on the results, we provide data owners with guidance on which attributes to include when publishing medical data and how to select appropriate privacy-preserving techniques for that publication.
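To make the two privacy models concrete, here is a minimal Python sketch. It is not the authors’ implementation: it assumes a toy table with hypothetical attributes `age`, `zip`, and `gender` as quasi-identifiers and `diagnosis` as the sensitive attribute, all values invented. It measures k-anonymity (every quasi-identifier combination is shared by at least k records) and the *distinct* variant of l-diversity (every such group holds at least l distinct sensitive values; entropy and recursive variants also exist). It then applies simple, hypothetical generalization rules and reports the discernibility metric (sum of squared group sizes) as one common proxy for utility loss.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same values on every quasi-identifier."""
    classes = defaultdict(list)
    for rec in records:
        key = tuple(rec[attr] for attr in quasi_identifiers)
        classes[key].append(rec)
    return list(classes.values())


def k_anonymity(records, quasi_identifiers):
    """Largest k the table satisfies: the size of its smallest equivalence class."""
    return min(len(group) for group in equivalence_classes(records, quasi_identifiers))


def distinct_l_diversity(records, quasi_identifiers, sensitive):
    """Largest l under distinct l-diversity: the fewest distinct sensitive
    values found in any equivalence class."""
    return min(
        len({rec[sensitive] for rec in group})
        for group in equivalence_classes(records, quasi_identifiers)
    )


def discernibility(records, quasi_identifiers):
    """Discernibility metric: sum of squared class sizes (higher means less utility)."""
    return sum(len(group) ** 2 for group in equivalence_classes(records, quasi_identifiers))


def generalize_age(age, width=10):
    """Hypothetical generalization rule: exact age -> range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code, keep=3):
    """Hypothetical generalization rule: keep a ZIP prefix and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


# Toy microdata; attribute names and values are invented for illustration only.
records = [
    {"age": 34, "zip": "12203", "gender": "F", "diagnosis": "asthma"},
    {"age": 37, "zip": "12208", "gender": "F", "diagnosis": "diabetes"},
    {"age": 31, "zip": "12205", "gender": "M", "diagnosis": "asthma"},
    {"age": 52, "zip": "10001", "gender": "M", "diagnosis": "hypertension"},
    {"age": 58, "zip": "10003", "gender": "M", "diagnosis": "asthma"},
    {"age": 55, "zip": "10002", "gender": "F", "diagnosis": "hypertension"},
]

# Coarsen age and ZIP; the sensitive attribute is left unchanged.
generalized = [
    {**rec, "age": generalize_age(rec["age"]), "zip": generalize_zip(rec["zip"])}
    for rec in records
]

if __name__ == "__main__":
    scenarios = [
        ("raw, QIs = age/zip/gender", records, ["age", "zip", "gender"]),
        ("generalized, QIs = age/zip/gender", generalized, ["age", "zip", "gender"]),
        ("generalized, gender not released", generalized, ["age", "zip"]),
    ]
    for label, table, qis in scenarios:
        k = k_anonymity(table, qis)
        l_div = distinct_l_diversity(table, qis, "diagnosis")
        dm = discernibility(table, qis)
        print(f"{label}: k={k}, l={l_div}, DM={dm}")
```

On this toy data the raw table gives k=1 and l=1. Generalization alone still leaves k=1, because gender continues to isolate individual records. Once gender is also withheld, k rises to 3 and l to 2, while the discernibility metric grows from 6 to 18. The privacy gain is paid for with coarser, less distinguishable records, which is why the choice of quasi-identifier attributes matters.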


Notes

  1. https://health.data.ny.gov/Health/Hospital-Inpatient-Discharges-SPARCS-De-Identified/82xm-y6g8


Author information


Correspondence to Yong-Feng Ge.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Jahan, S., Ge, YF., Kabir, E., Wang, H. (2023). Analysis and Protection of Public Medical Dataset: From Privacy Perspective. In: Li, Y., Huang, Z., Sharma, M., Chen, L., Zhou, R. (eds) Health Information Science. HIS 2023. Lecture Notes in Computer Science, vol 14305. Springer, Singapore. https://doi.org/10.1007/978-981-99-7108-4_7


  • DOI: https://doi.org/10.1007/978-981-99-7108-4_7


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7107-7

  • Online ISBN: 978-981-99-7108-4

  • eBook Packages: Computer Science, Computer Science (R0)
