Abstract
Privacy-preserving data mining techniques could encourage health data custodians to provide accurate information for mining by ensuring that the data mining procedures and results cannot, with any reasonable degree of certainty, violate data privacy. We outline privacy-preserving data mining techniques and systems from the literature and from industry. They range from privacy-preserving data publishing and privacy-preserving (distributed) computation to the privacy-preserving release of data mining results. We discuss the strengths and weaknesses of each and indicate that there is no perfect technical solution yet. We also present and discuss a possible development framework for privacy-preserving health data mining systems.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jin, H. (2007). Practical Issues on Privacy-Preserving Health Data Mining. In: Washio, T., et al. Emerging Technologies in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4819. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77018-3_8
DOI: https://doi.org/10.1007/978-3-540-77018-3_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77016-9
Online ISBN: 978-3-540-77018-3
eBook Packages: Computer Science, Computer Science (R0)