Practical Issues on Privacy-Preserving Health Data Mining

Jin, Huidong (Warren)

doi:10.1007/978-3-540-77018-3_8

Huidong (Warren) Jin^1,2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4819)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Privacy-preserving data mining techniques could encourage health data custodians to provide accurate information for mining by ensuring that the data mining procedures and results cannot, with any reasonable degree of certainty, violate data privacy. We outline privacy-preserving data mining techniques and systems in the literature and in industry. They range from privacy-preserving data publishing and privacy-preserving (distributed) computation to the privacy-preserving release of data mining results. We discuss their respective strengths and weaknesses, and indicate that there is no perfect technical solution yet. We also present and discuss a possible development framework for privacy-preserving health data mining systems.

Author information

Authors and Affiliations

NICTA, Locked Bag 8001, Canberra ACT, 2601, Australia
Huidong (Warren) Jin
RSISE, the Australian National University, Canberra ACT, 2601, Australia
Huidong (Warren) Jin

Editor information

Takashi Washio, Zhi-Hua Zhou, Joshua Zhexue Huang, Xiaohua Hu, Jinyan Li, Chao Xie, Jieyue He, Deqing Zou, Kuan-Ching Li, Mário M. Freire

About this paper

Cite this paper

Jin, H. (2007). Practical Issues on Privacy-Preserving Health Data Mining. In: Washio, T., et al. Emerging Technologies in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77018-3_8

DOI: https://doi.org/10.1007/978-3-540-77018-3_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77016-9
Online ISBN: 978-3-540-77018-3
eBook Packages: Computer Science, Computer Science (R0)
