ABSTRACT
Recent advances in DNA sequencing technologies have put ubiquitous availability of fully sequenced human genomes within reach. It is no longer hard to imagine the day when everyone will have the means to obtain and store their own DNA sequence. Widespread and affordable availability of fully sequenced genomes immediately opens up important opportunities in a number of health-related fields. In particular, common genomic applications and tests performed in vitro today will soon be conducted computationally, using digitized genomes. New applications will be developed as genome-enabled medicine becomes increasingly preventive and personalized. However, this progress also prompts significant privacy challenges associated with potential loss, theft, or misuse of genomic data. In this paper, we begin to address genomic privacy by focusing on three important applications: Paternity Tests, Personalized Medicine, and Genetic Compatibility Tests. After carefully analyzing these applications and their privacy requirements, we propose a set of efficient techniques based on private set operations. These techniques allow us to implement in silico, in a secure fashion, some operations that are currently performed via in vitro methods. Experimental results demonstrate that the proposed techniques are both feasible and practical today.
Index Terms
- Countering GATTACA: efficient and secure testing of fully-sequenced human genomes