DOI: 10.1145/1653662.1653703
research-article

Privacy-preserving genomic computation through program specialization

Published: 09 November 2009

ABSTRACT

In this paper, we present a new approach to performing important classes of genomic computations (e.g., search for homologous genes) that makes a significant step towards privacy protection in this domain. Our approach leverages a key property of the human genome, namely that the vast majority of it is shared across humans (and hence public), and consequently relatively little of it is sensitive. Based on this observation, we propose a privacy-protection framework that partitions a genomic computation, distributing the part on sensitive data to the data provider and the part on the public data to the user of the data. Such a partition is achieved through program specialization that enables a biocomputing program to perform a concrete execution on public data and a symbolic execution on sensitive data. As a result, the program is simplified into an efficient query program that takes only sensitive genetic data as inputs. We prove the effectiveness of our techniques on a set of dynamic programming algorithms common in genomic computing. We develop a program transformation tool that automatically instruments a legacy program for specialization operations. We also demonstrate that our techniques can greatly facilitate secure multi-party computations on large biocomputing problems.
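
To make the idea of specialization concrete, the following is a minimal sketch, not the paper's tool or algorithm. It assumes a plain edit-distance dynamic program and a genome given as a list in which sensitive positions are marked `None`. The names `specialize`, `_step`, and `edit_distance` are hypothetical. The sketch only executes the public prefix concretely and defers everything from the first sensitive position onward to a residual function; the symbolic execution described in the abstract is more general than this.

```python
from typing import Callable, Dict, List, Optional


def _step(prev: List[int], sym: str, query: str, i: int) -> List[int]:
    """Advance the edit-distance DP by one genome symbol (row i)."""
    cur = [i]
    for j, cq in enumerate(query, start=1):
        cur.append(min(prev[j] + 1,                  # delete genome symbol
                       cur[j - 1] + 1,               # insert query symbol
                       prev[j - 1] + (sym != cq)))   # match or substitute
    return cur


def edit_distance(genome: str, query: str) -> int:
    """Unspecialized reference computation, used only for checking."""
    row = list(range(len(query) + 1))
    for i, sym in enumerate(genome, start=1):
        row = _step(row, sym, query, i)
    return row[-1]


def specialize(query: str,
               genome: List[Optional[str]]) -> Callable[[Dict[int, str]], int]:
    """Run the DP concretely over the public prefix of `genome` and return
    a residual query program that needs only the sensitive symbols.

    `genome[k] is None` marks position k as sensitive (an assumption of
    this sketch, not a format taken from the paper).
    """
    first_sensitive = next(
        (k for k, s in enumerate(genome) if s is None), len(genome))

    # Concrete execution on public data: done once, by the data user.
    row = list(range(len(query) + 1))
    for k in range(first_sensitive):
        row = _step(row, genome[k], query, k + 1)

    def residual(sensitive: Dict[int, str]) -> int:
        # Deferred part: evaluated where the sensitive data lives.
        r = list(row)
        for k in range(first_sensitive, len(genome)):
            sym = genome[k] if genome[k] is not None else sensitive[k]
            r = _step(r, sym, query, k + 1)
        return r[-1]

    return residual
```

In this sketch the data user would call `specialize("ACCT", ["A", "C", None, "T"])` and hand the returned function to the data provider, who supplies the sensitive symbols, for example `{2: "G"}`. The result agrees with running `edit_distance` on the fully known sequence.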


        • Published in

          CCS '09: Proceedings of the 16th ACM conference on Computer and communications security
          November 2009
          664 pages
          ISBN: 9781605588940
          DOI: 10.1145/1653662

          Copyright © 2009 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%
