Abstract
With the proliferation of healthcare data, cloud mining technology for E-health services and applications has become an active research topic. However, these rapidly evolving cloud mining technologies, and their deployment in healthcare systems, also pose potential threats to the privacy of patient data. To address the privacy problem in cloud mining, this paper proposes a semi-supervised privacy-preserving clustering algorithm. Using a small amount of supervised information, the method first learns a Large Margin Nearest Cluster metric by convex optimization. Then, according to the trained metric, the method applies a multiplicative perturbation to the original data, which changes the shape of the data distribution and thus protects private information while maintaining high data usability. Experimental results on the brain fiber dataset provided by the 2009 PBC demonstrate that the proposed method not only protects data privacy against security attacks but also improves clustering purity.
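The abstract does not give the exact form of the perturbation, so the following is only a minimal sketch of one common way a learned metric can drive a multiplicative perturbation, not the authors' implementation. It assumes a symmetric positive semidefinite metric matrix `M` has already been learned (the Large Margin Nearest Cluster training step is not shown), factors it as `M = L.T @ L`, and multiplies each record by `L`. The function names `metric_to_transform`, `perturb`, and `purity` are hypothetical and chosen for illustration.

```python
import numpy as np

def metric_to_transform(M):
    """Factor a symmetric PSD metric M as L.T @ L via eigendecomposition.

    Assumption: M was learned elsewhere; this sketch does not train it.
    """
    eigvals, eigvecs = np.linalg.eigh(M)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    # L = diag(sqrt(eigvals)) @ eigvecs.T, so that L.T @ L == M
    return np.sqrt(eigvals)[:, None] * eigvecs.T

def perturb(X, M):
    """Multiplicative perturbation: map each row x of X to L @ x."""
    L = metric_to_transform(M)
    return X @ L.T

def purity(labels_true, labels_pred):
    """Clustering purity: share of points that belong to the majority
    true class of the cluster they were assigned to."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    total = 0
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]
        _, counts = np.unique(members, return_counts=True)
        total += counts.max()
    return total / labels_true.size
```

Under this assumed factorization, Euclidean distances between perturbed records equal distances under `M` between the original records, so a standard clustering algorithm can be run on the perturbed data alone. The sketch illustrates only the geometry; it does not by itself show resistance to any particular attack.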
Additional information
This work is supported in part by the National Natural Science Foundation of China (61173066, 61472399), and the Beijing Natural Science Foundation (4112056, 4122078, 4144085).
About this article
Cite this article
Huang, M., Chen, Y., Chen, BW. et al. A semi-supervised privacy-preserving clustering algorithm for healthcare. Peer-to-Peer Netw. Appl. 9, 864–875 (2016). https://doi.org/10.1007/s12083-015-0356-9
DOI: https://doi.org/10.1007/s12083-015-0356-9