
A semi-supervised privacy-preserving clustering algorithm for healthcare

Peer-to-Peer Networking and Applications

Abstract

With the proliferation of healthcare data, cloud mining technology for e-health services and applications has become a hot research topic. On the other hand, these rapidly evolving cloud mining technologies and their deployment in healthcare systems also pose potential threats to patients' data privacy. To address the privacy problem in cloud mining, this paper proposes a semi-supervised privacy-preserving clustering algorithm. Using a small amount of supervised information, the method first learns a Large Margin Nearest Cluster metric by convex optimization. Then, according to the learned metric, the method applies a multiplicative perturbation to the original data, which changes the shape of the data distribution and thus protects private information while preserving high data usability. Experimental results on the brain fiber dataset provided by the 2009 PBC demonstrate that the proposed method not only protects data privacy against security attacks but also improves clustering purity.
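The two ideas named in the abstract, a metric-driven multiplicative perturbation and clustering purity, can be illustrated with a minimal sketch. This is not the authors' implementation: the metric-learning step is omitted, the positive semidefinite matrix `M` is assumed to be given, and the helper names `perturb_with_metric` and `purity` are hypothetical. The sketch factors `M = L^T L` and multiplies each record by `L`, so that Euclidean distances between perturbed records equal the distances under `M` between the original records.

```python
import numpy as np

def perturb_with_metric(X, M):
    """Multiplicatively perturb the rows of X using a PSD metric M (assumed given).

    Hypothetical helper: factors M = L^T L and returns X @ L^T, so that
    ||L x - L y||^2 == (x - y)^T M (x - y) for any two rows x, y.
    """
    # Eigendecomposition of the symmetric matrix M; clip tiny negative
    # eigenvalues that can appear from floating-point round-off.
    w, V = np.linalg.eigh(M)
    L = np.sqrt(np.clip(w, 0.0, None))[:, None] * V.T  # L = diag(sqrt(w)) V^T
    return X @ L.T

def purity(labels_true, labels_pred):
    """Fraction of points that fall in the majority true class of their cluster."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    total = 0
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]
        _, counts = np.unique(members, return_counts=True)
        total += counts.max()  # size of the dominant class in cluster c
    return total / labels_true.size
```

In such a setup, a clustering algorithm would be run on the perturbed records rather than on the originals, and `purity` would compare the resulting cluster assignments against known class labels.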






Correspondence to Wen Ji.

Additional information

This work is supported in part by the National Natural Science Foundation of China (61173066, 61472399), and the Beijing Natural Science Foundation (4112056, 4122078, 4144085).


Huang, M., Chen, Y., Chen, BW. et al. A semi-supervised privacy-preserving clustering algorithm for healthcare. Peer-to-Peer Netw. Appl. 9, 864–875 (2016). https://doi.org/10.1007/s12083-015-0356-9
