Practical Issues on Privacy-Preserving Health Data Mining

  • Conference paper
Emerging Technologies in Knowledge Discovery and Data Mining (PAKDD 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4819)

Abstract

Privacy-preserving data mining techniques could encourage health data custodians to provide accurate information for mining by ensuring that the data mining procedures and results cannot, with any reasonable degree of certainty, violate data privacy. We outline privacy-preserving data mining techniques and systems from the literature and from industry. They range from privacy-preserving data publishing and privacy-preserving (distributed) computation to the privacy-preserving release of data mining results. We discuss the strengths and weaknesses of each and indicate that there is not yet a perfect technical solution. We also present and discuss a possible development framework for privacy-preserving health data mining systems.

Editor information

Takashi Washio Zhi-Hua Zhou Joshua Zhexue Huang Xiaohua Hu Jinyan Li Chao Xie Jieyue He Deqing Zou Kuan-Ching Li Mário M. Freire

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jin, H. (2007). Practical Issues on Privacy-Preserving Health Data Mining. In: Washio, T., et al. Emerging Technologies in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77018-3_8

  • DOI: https://doi.org/10.1007/978-3-540-77018-3_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77016-9

  • Online ISBN: 978-3-540-77018-3

  • eBook Packages: Computer Science, Computer Science (R0)
