Abstract
As the cost of genome sequencing continues to fall, whole genome sequencing data have become a viable option for improving diagnostic accuracy and supporting personalized medicine. Although massive collections of genomic data have the potential to advance public health and accelerate scientific discoveries, they also raise significant concerns about individual privacy. Like traditional clinical information, a human genome may reveal information about an individual (e.g., identity, ethnic group, disease associations, and predisposition to diseases such as diabetes or cancer). Even more concerning, this information is shared with ancestors and descendants, so one person's loss of privacy may put the privacy of the entire family at risk. Genome privacy is a major challenge for the entire biomedical community, particularly because scientific discoveries depend on data sharing and obfuscating the data is not a good option for protecting privacy. Multiple factors are involved in genome privacy research. The components that can be used to better protect genome privacy include, but are not limited to, legal, ethical and technical aspects, e.g., federal laws, policies and regulations, informed consent policies, data use agreements, secure data repositories, and privacy-preserving data analysis methods. However, genome privacy challenges cannot be addressed by any single component alone. We envision that better privacy protection can be achieved by combining multiple components. The goal of this chapter is to introduce the state of the art in genome privacy research. The chapter begins with an introduction to genome privacy, followed by an overview of its legal, ethical and technical aspects. After formalizing the genome privacy problem, we review existing attack models on genomic data and then discuss techniques for mitigating these attacks. The chapter concludes with a discussion of the challenges and future directions in genome privacy research.
Notes
1. Shuang Wang and Xiaoqian Jiang share the first authorship.
Acknowledgements
This work was funded by NHGRI (K99HG008175), NLM (R00LM011392, R21LM012060), and NHLBI (U54HL108460).
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Wang, S., Jiang, X., Fox, D., Ohno-Machado, L. (2015). Preserving Genome Privacy in Research Studies. In: Gkoulalas-Divanis, A., Loukides, G. (eds) Medical Data Privacy Handbook. Springer, Cham. https://doi.org/10.1007/978-3-319-23633-9_16
DOI: https://doi.org/10.1007/978-3-319-23633-9_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23632-2
Online ISBN: 978-3-319-23633-9
eBook Packages: Computer Science, Computer Science (R0)