Abstract
Individual privacy in the genomic era is a growing concern as more individuals have their genomes sequenced or genotyped. Genetic privacy can be infringed even without access to raw genotypes or sequencing data: studies have reported that summary statistics from genome-wide association studies (GWAS) can be exploited to threaten individual privacy. In this study, we show that even differentially private GWAS statistics still carry a risk of leaking individual privacy. Specifically, we constructed a Bayesian network by mining public GWAS statistics and evaluated two attacks, a trait inference attack and an identity inference attack, that infringe the privacy not only of GWAS participants but also of regular individuals. We used both simulated data and real human genetic data from the 1000 Genomes Project to evaluate our methods. Our results demonstrate that unexpected privacy breaches can occur and that attackers can derive identity information and other private information by using these algorithms. Hence, more methodological research is needed to understand the infringement and protection of genetic privacy.
Acknowledgements
This work was supported in part by the US National Science Foundation (DGE-1523115 and IIS-1502273 to XW, and DGE-1523154 and IIS-1502172 to XS).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, Y., Wen, J., Wu, X., Shi, X. (2016). Infringement of Individual Privacy via Mining Differentially Private GWAS Statistics. In: Wang, Y., Yu, G., Zhang, Y., Han, Z., Wang, G. (eds) Big Data Computing and Communications. BigCom 2016. Lecture Notes in Computer Science, vol 9784. Springer, Cham. https://doi.org/10.1007/978-3-319-42553-5_30
DOI: https://doi.org/10.1007/978-3-319-42553-5_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42552-8
Online ISBN: 978-3-319-42553-5
eBook Packages: Computer Science, Computer Science (R0)