
Private Genome Data Dissemination

  • Chapter
Medical Data Privacy Handbook

Abstract

With the rapid advances in genome sequencing technology, collecting and analyzing genome data has become easier than ever before. In this context, sharing genome data plays a key role in enabling and facilitating significant medical breakthroughs. However, genome data dissemination raises substantial privacy concerns, and several recently discovered privacy attacks have further exacerbated them. In this chapter, we review some of these privacy attacks on genome data and the current practices for privacy protection, and we discuss existing work on privacy protection strategies for genome data. We also introduce a recent effort to disseminate genome data while satisfying differential privacy, a rigorous privacy model that is widely adopted for privacy protection. The proposed algorithm splits raw genome sequences into blocks, subdivides the blocks in a top-down fashion, and finally adds noise to the counts in order to preserve privacy. It has been shown empirically to retain essential data utility for different genome data analysis tasks.
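The abstract describes the algorithm only at a high level. The following is a minimal sketch of the general idea under assumptions of our own, not the chapter's actual algorithm. It assumes that sequences are equal-length strings over a public alphabet `ACGT`, that each block position gets its own tree whose nodes are block prefixes specialised one symbol at a time, that every node count is perturbed with the Laplace mechanism of Dwork, McSherry, Nissim and Smith, and that the budget `epsilon` is split evenly over all tree levels by sequential composition. The names `split_into_blocks`, `top_down` and `private_block_counts`, the pruning `threshold`, and the even budget split are hypothetical illustrations.

```python
import numpy as np

ALPHABET = "ACGT"  # assumption: the per-position alphabet is public


def split_into_blocks(seqs, block_len):
    """Cut equal-length sequences into consecutive, non-overlapping blocks.
    Returns one list per block position, holding that block of every sequence."""
    length = len(seqs[0])
    return [[s[i:i + block_len] for s in seqs] for i in range(0, length, block_len)]


def top_down(values, prefix, block_len, eps_level, threshold, rng, out):
    """Specialise `prefix` one symbol at a time, publishing Laplace-noised counts.
    `values` holds the block strings that start with `prefix`."""
    noisy = len(values) + rng.laplace(loc=0.0, scale=1.0 / eps_level)
    if noisy < threshold:
        return  # prune; the decision reads only a noisy count (post-processing)
    if len(prefix) == block_len:
        out[prefix] = noisy  # leaf: a fully specified block value
        return
    # Enumerate children over the public alphabet rather than the observed
    # symbols, so the shape of the tree does not reveal which values occur.
    for symbol in ALPHABET:
        child = prefix + symbol
        top_down([v for v in values if v.startswith(child)], child,
                 block_len, eps_level, threshold, rng, out)


def private_block_counts(seqs, block_len, epsilon, threshold=1.0, seed=None):
    """Hypothetical block-wise, top-down release of noisy block-value counts."""
    rng = np.random.default_rng(seed)
    blocks = split_into_blocks(seqs, block_len)
    # Adding or removing one sequence changes at most one node count per level
    # of each block tree (L1 sensitivity 1), so by sequential composition we
    # spend epsilon evenly across every level of every tree.
    total_levels = sum(len(col[0]) + 1 for col in blocks)
    eps_level = epsilon / total_levels
    releases = []
    for col in blocks:
        out = {}
        top_down(col, "", len(col[0]), eps_level, threshold, rng, out)
        releases.append(out)
    return releases
```

For example, `private_block_counts(["ACGTAC", "ACGTTT", "ACGAAC", "TCGTAC"], block_len=3, epsilon=1.0, seed=0)` returns one dictionary per block, mapping each surviving block value to its noisy count. Pruning on noisy counts keeps a length-m block tree from growing to all 4^m leaves. Because the pruning decision only reads values that are already noised, it consumes no extra privacy budget.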



Acknowledgements

This article was funded by iDASH (U54HL108460), NHGRI (K99HG008175), NLM (R00LM011392, R21LM012060), NCBC-linked grant (R01HG007078) and NSERC Discovery Grants (RGPIN-2015-04147).

Author information


Correspondence to Noman Mohammed.



Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Mohammed, N., Wang, S., Chen, R., Jiang, X. (2015). Private Genome Data Dissemination. In: Gkoulalas-Divanis, A., Loukides, G. (eds) Medical Data Privacy Handbook. Springer, Cham. https://doi.org/10.1007/978-3-319-23633-9_17


  • DOI: https://doi.org/10.1007/978-3-319-23633-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23632-2

  • Online ISBN: 978-3-319-23633-9

  • eBook Packages: Computer Science, Computer Science (R0)
