Abstract
A number of databases around the world currently host a wealth of genomic data that is invaluable to researchers conducting a variety of genomic studies. However, patients who volunteer their genomic data run the risk of privacy invasion. In this work, we give a cryptographic solution to this problem: to maintain patient privacy, we propose encrypting all genomic data in the database. To allow meaningful computation on the encrypted data, we propose using a homomorphic encryption scheme.
Specifically, we take basic genomic algorithms which are commonly used in genetic association studies and show how they can be made to work on encrypted genotype and phenotype data. In particular, we consider the Pearson Goodness-of-Fit test, the \(D'\) and \(r^2\)-measures of linkage disequilibrium, the Expectation Maximization (EM) algorithm for haplotyping, and the Cochran-Armitage Test for Trend. We also provide performance numbers for running these algorithms on encrypted data.
Adriana López-Alt—Research conducted while visiting Microsoft Research.
Notes
- 1.
The running time is linear in the population size for a fixed parameter set. For larger population sizes, parameters need to be increased and performance degrades, but not by a large factor (see Table 1 for a comparison of the running times for two typical parameter sets).
- 2.
1 degree of freedom = 3 genotypes \(-\) 2 alleles.
- 3.
Common choices for the set of weights \({{\varvec{w}}} = (w_0, w_1, w_2)\) are: \({{\varvec{w}}} = (0,1,2)\) for the additive (co-dominant) model, \({{\varvec{w}}} = (0,1,1)\) for the dominant model (\(A\) is dominant over \(a\)), and \({{\varvec{w}}} = (0,0,1)\) for the recessive model (\(A\) is recessive to allele \(a\)).
- 4.
For a bi-allelic gene with alleles \(A\) and \(a\), the value 0 corresponds to the genotype \(AA\), the value 1 corresponds to the genotype \(Aa\) and the value 2 corresponds to the genotype \(aa\).
- 5.
An arithmetic circuit over \(\mathbb {F}_t\) has addition and multiplication gates modulo \(t\).
- 6.
- 7.
The only modification we make to the scheme of López-Alt and Naehrig is removing a step called “relinearization” or “key switching”, needed to make decryption independent of the function that was homomorphically evaluated. In our implementation, decryption depends on the number of homomorphic multiplications that were performed. We make this change for efficiency reasons, as relinearization is very costly.
- 8.
Informally, a function has degree \(D\) if it can be represented as a (possibly multivariate) polynomial of degree \(D\). See Sect. 4.4 for more details.
- 9.
Recall from Sect. 3 that we cannot perform homomorphic divisions.
- 10.
Admittedly, the size of the parameters needed does depend on the magnitude of the genotype and phenotype counts, which can be as large as the size of the population sample. This is because the size of the message encrypted at any given time (i.e. the size of the counts and all the intermediate values in the computation) cannot grow too large relative to the modulus \(q\). Therefore, larger population sizes (and therefore larger counts) require a larger modulus \(q\), which in turn requires a larger dimension \(n\) for security. However, for a fixed parameter set, it is possible to compute an upper bound on the size of the population sample and the homomorphic computations detailed in this work do work correctly for any population sample with size smaller than the given bound.
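Notes 2, 4 and 9 can be illustrated together with a small plaintext sketch. It counts genotypes under the 0/1/2 encoding of Note 4 and evaluates the Pearson goodness-of-fit statistic for Hardy-Weinberg equilibrium (1 degree of freedom, as in Note 2) using only additions, subtractions and multiplications, so that the single division can be postponed until after decryption (Note 9). The function names and the numerator/denominator split are assumptions made for this illustration; the code runs on ordinary Python integers, performs no encryption, and is not the authors' implementation.

```python
def genotype_counts(genotypes):
    """Count genotypes encoded as 0 (AA), 1 (Aa), 2 (aa)."""
    counts = [0, 0, 0]
    for g in genotypes:
        counts[g] += 1
    return counts


def hwe_chi_square_parts(n_AA, n_Aa, n_aa):
    """Return (numerator, denominator) of the Pearson statistic.

    Only +, - and * are used, mimicking an arithmetic circuit.
    The denominator is 0 if one of the alleles is absent.
    """
    n = n_AA + n_Aa + n_aa
    a = 2 * n_AA + n_Aa          # number of copies of allele A
    b = 2 * n_aa + n_Aa          # number of copies of allele a
    d = 4 * n_AA * n_aa - n_Aa * n_Aa
    alpha = d * d
    beta1 = 2 * a * a
    beta2 = a * b
    beta3 = 2 * b * b
    numerator = alpha * (beta2 * beta3 + beta1 * beta3 + beta1 * beta2)
    denominator = 2 * n * beta1 * beta2 * beta3
    return numerator, denominator


def hwe_chi_square_direct(n_AA, n_Aa, n_aa):
    """Textbook form with divisions, used here only as a cross-check."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    q = 1 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_AA, n_Aa, n_aa]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    num, den = hwe_chi_square_parts(30, 50, 20)
    # The one division happens here, outside the "circuit".
    print(num / den, hwe_chi_square_direct(30, 50, 20))
```

In an encrypted setting the numerator and denominator would be computed on ciphertexts and divided only after decryption. As Note 10 points out, the intermediate values grow with the population size, so the parameters would have to be chosen large enough to hold them.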
References
Ayday, E., De Cristofaro, E., Hubaux, J.-P., Tsudik, G.: The Chills and Thrills of Whole Genome Sequencing. Technical report (2013). http://infoscience.epfl.ch/record/186866/files/survey.pdf
Ayday, E., Raisaro, J.L., Hubaux, J.-P.: Personal use of the genomic data: Privacy vs. storage cost. In: Proceedings of IEEE Global Communications Conference, Exhibition and Industry Forum (Globecom) (2013)
Blanton, M., Atallah, M.J., Frikken, K.B., Malluhi, Q.: Secure and efficient outsourcing of sequence comparisons. In: Foresti, S., Yung, M., Martinelli, F. (eds.) ESORICS 2012. LNCS, vol. 7459, pp. 505–522. Springer, Heidelberg (2012)
Bosma, W., Cannon, J., Playoust, C.: The Magma algebra system. I. The user language. J. Symbolic Comput. 24(3–4), 235–265 (1997). Computational algebra and number theory (London, 1993)
Brakerski, Z., Gentry, C., Vaikuntanathan, V.: Fully homomorphic encryption without bootstrapping. In: ITCS (2012)
Bos, J.W., Lauter, K., Loftus, J., Naehrig, M.: Improved security for a ring-based fully homomorphic encryption scheme. In: Stam, M. (ed.) IMACC 2013. LNCS, vol. 8308, pp. 45–64. Springer, Heidelberg (2013)
Bos, J.W., Lauter, K., Naehrig, M.: Private predictive analysis on encrypted medical data. J. Biomed. Inform. 50, 234–243 (2014). MSR-TR-2013-81
Brakerski, Z.: Fully homomorphic encryption without modulus switching from classical GapSVP. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 868–886. Springer, Heidelberg (2012)
Brakerski, Z., Vaikuntanathan, V.: Efficient fully homomorphic encryption from (standard) LWE. In: Ostrovsky, R. (ed.) FOCS, pp. 97–106. IEEE (2011)
Brakerski, Z., Vaikuntanathan, V.: Fully homomorphic encryption from Ring-LWE and security for key dependent messages. In: Rogaway, P. (ed.) CRYPTO 2011. LNCS, vol. 6841, pp. 505–524. Springer, Heidelberg (2011)
Brakerski, Z., Vaikuntanathan, V.: Lattice-based FHE as secure as PKE. In: Naor, M. (ed.) ITCS, pp. 1–12. ACM (2014)
Database of Genotypes and Phenotypes (dbGaP). http://www.ncbi.nlm.nih.gov/gap/
De Cristofaro, E., Faber, S., Tsudik, G.: Secure genomic testing with size-and position-hiding private substring matching. In: Proceedings of the 2013 ACM Workshop on Privacy in the Electronic Society (WPES 2013). ACM (2013)
European Bioinformatics Institute. http://www.ebi.ac.uk/ (Accessed 30 October 2013)
Fienberg, S.E., Slavkovic, A., Uhler, C.: Privacy preserving GWAS data sharing. In: 2011 IEEE 11th International Conference on Data Mining Workshops (ICDMW), pp. 628–635. IEEE (2011)
Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. IACR Cryptology ePrint Archive 2012, 144 (2012)
Gentry, C.: Fully homomorphic encryption using ideal lattices. In: Mitzenmacher, M. (ed.) STOC, pp. 169–178. ACM (2009)
Graepel, T., Lauter, K., Naehrig, M.: ML confidential: machine learning on encrypted data. In: Kwon, T., Lee, M.-K., Kwon, D. (eds.) ICISC 2012. LNCS, vol. 7839, pp. 1–21. Springer, Heidelberg (2013)
Creating a global alliance to enable responsible sharing of genomic and clinical data, White Paper (2013). http://www.broadinstitute.org/files/news/pdfs/GAWhitePaperJune3.pdf
Gymrek, M., McGuire, A.L., Golan, D., Halperin, E., Erlich, Y.: Identifying personal genomes by surname inference. Science 339(6117), 321–324 (2013)
Gentry, C., Sahai, A., Waters, B.: Homomorphic encryption from learning with errors: conceptually-simpler, asymptotically-faster, attribute-based. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013, Part I. LNCS, vol. 8042, pp. 75–92. Springer, Heidelberg (2013)
Humbert, M., Ayday, E., Hubaux, J.-P., Telenti, A.: Addressing the concerns of the Lacks family: quantification of kin genomic privacy. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, pp. 1141–1152. ACM (2013)
International cancer genome consortium (ICGC). http://www.icgc.org
International rare diseases research consortium (IRDiRC). http://www.irdirc.org
DNA Data Bank Of Japan. http://www.ddbj.nig.ac.jp/
Johnson, A., Shmatikov, V.: Privacy-preserving data exploration in genome-wide association studies. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1079–1087. ACM (2013)
Lepoint, T., Naehrig, M.: A comparison of the homomorphic encryption schemes \({\sf {FV}}\) and \({\sf {YASHE}}\). In: Pointcheval, D., Vergnaud, D. (eds.) AFRICACRYPT. LNCS, vol. 8469, pp. 318–335. Springer, Heidelberg (2014)
López-Alt, A., Naehrig, M.: Large integer plaintexts in ring-based fully homomorphic encryption. In preparation (2014)
Lauter, K., Naehrig, M., Vaikuntanathan, V.: Can homomorphic encryption be practical? In: Proceedings of the 3rd ACM Cloud Computing Security Workshop, pp. 113–124. ACM (2011)
López-Alt, A., Tromer, E., Vaikuntanathan, V.: On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption. In: Karloff, H.J., Pitassi, T. (eds.) STOC, pp. 1219–1234. ACM (2012)
McCarty, C.A., Chisholm, R.L., Chute, C.G., Kullo, I.J., Jarvik, G.P., Larson, E.B., Li, R., Masys, D.R., Ritchie, M.D., Roden, D.M., et al.: The eMERGE network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Med. Genomics 4(1), 13 (2011)
Park, M.Y., Hastie, T.: Penalized logistic regression for detecting gene interactions. Biostatistics 9(1), 30–50 (2008)
Stehlé, D., Steinfeld, R.: Making NTRU as secure as worst-case problems over ideal lattices. In: Paterson, K.G. (ed.) EUROCRYPT 2011. LNCS, vol. 6632, pp. 27–47. Springer, Heidelberg (2011)
A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073. http://www.1000genomes.org
van Dijk, M., Gentry, C., Halevi, S., Vaikuntanathan, V.: Fully homomorphic encryption over the integers. In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp. 24–43. Springer, Heidelberg (2010)
Wang, R., Li, Y.F., Wang, X.F., Tang, H., Zhou, X.: Learning your identity and disease from research papers: Information leaks in genome wide association study. In: Proceedings of the 16th ACM Conference on Computer and Communications Security, CCS 2009, pp. 534–544. ACM, New York (2009)
Yasuda, M., Shimoyama, T., Kogure, J., Yokoyama, K., Koshiba, T.: Secure pattern matching using somewhat homomorphic encryption. In: Proceedings of the 2013 ACM Cloud Computing Security Workshop, pp. 65–76. ACM (2013)
Acknowledgments
We thank Tancrède Lepoint for suggesting the encoding in Sect. 4.1.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Lauter, K., López-Alt, A., Naehrig, M. (2015). Private Computation on Encrypted Genomic Data. In: Aranha, D., Menezes, A. (eds) Progress in Cryptology - LATINCRYPT 2014. LATINCRYPT 2014. Lecture Notes in Computer Science, vol 8895. Springer, Cham. https://doi.org/10.1007/978-3-319-16295-9_1
DOI: https://doi.org/10.1007/978-3-319-16295-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16294-2
Online ISBN: 978-3-319-16295-9
eBook Packages: Computer Science (R0)