A Privacy Preserving and Safety-Aware Semi-supervised Model for Dissecting Cancer Samples

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10449)

Abstract

Research in cancer genomics has proliferated with the advent of microarray technologies. These technologies make it possible to monitor thousands of genes in parallel, providing insight into disease subtypes and gene functions. Gene expression data obtained from microarray chips are typified by few samples and a large number of genes. Supervised classifiers such as support vector machines (SVM) have been deployed for prediction tasks. However, the scarcity of labeled data has prompted a shift to semi-supervised learning, in particular the transductive SVM (TSVM). Analysis of gene expression data using TSVM revealed that the performance of the model degrades in the presence of unlabeled data. We address this issue with a representative sampling strategy that ensures the safety of the classifier even in the presence of unlabeled data. We also address the issue of privacy violation when the classifier is shipped to other medical institutes for analysis of shared data. We propose a safety-aware and privacy-preserving TSVM for classifying cancer subtypes. We also compare the performance of TSVM with that of SVM and analyze the accuracy loss of the proposed TSVM.
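
For orientation, the textbook soft-margin TSVM objective can be written as follows. This is the standard formulation, given here as an assumption about notation only; the abstract does not state which TSVM variant or solver the authors use. The symbols are illustrative: (x_i, y_i) are the n labeled samples, x*_j are the k unlabeled samples with unknown labels y*_j, and C and C* are regularization weights for the labeled and unlabeled slack terms.

```latex
\begin{aligned}
\min_{\mathbf{w},\, b,\, y_j^{*},\, \xi_i,\, \xi_j^{*}} \quad
  & \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
    + C \sum_{i=1}^{n} \xi_i
    + C^{*} \sum_{j=1}^{k} \xi_j^{*} \\
\text{s.t.} \quad
  & y_i\bigl(\mathbf{w}^{\top}\mathbf{x}_i + b\bigr) \ge 1 - \xi_i,
    \quad \xi_i \ge 0, \quad i = 1,\dots,n, \\
  & y_j^{*}\bigl(\mathbf{w}^{\top}\mathbf{x}_j^{*} + b\bigr) \ge 1 - \xi_j^{*},
    \quad \xi_j^{*} \ge 0, \quad y_j^{*} \in \{-1,+1\}, \quad j = 1,\dots,k.
\end{aligned}
```

Because the labels y*_j are themselves optimization variables, the problem is non-convex. This is one commonly cited reason why adding unlabeled samples can hurt a TSVM rather than help it, which is the kind of degradation the abstract's safety-aware strategy targets.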

Acknowledgments

The research was financially supported by the Department of Information Technology, Government of Kerala, and the facilities were provided by the Indian Institute of Information Technology and Management - Kerala.

Author information

Corresponding author

Correspondence to P. S. Deepthi.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Deepthi, P.S., Thampi, S.M. (2017). A Privacy Preserving and Safety-Aware Semi-supervised Model for Dissecting Cancer Samples. In: Nguyen, N., Papadopoulos, G., Jędrzejowicz, P., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2017. Lecture Notes in Computer Science, vol 10449. Springer, Cham. https://doi.org/10.1007/978-3-319-67077-5_13

  • DOI: https://doi.org/10.1007/978-3-319-67077-5_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67076-8

  • Online ISBN: 978-3-319-67077-5

  • eBook Packages: Computer Science, Computer Science (R0)