Privacy-Preserving Genomic Data Publishing via Differentially-Private Suffix Tree

  • Conference paper
Security and Privacy in Communication Networks (SecureComm 2019)

Abstract

Privacy-preserving data publishing is a mechanism for sharing data while ensuring that the privacy of individuals is preserved in the published data and that utility is maintained for data mining and analysis. There is a pressing need to share genomic data to advance medical and health research. However, because genomic data is highly sensitive and serves as the ultimate identifier, publishing it while protecting the privacy of the individuals it describes is a major challenge. In this paper, we address this challenge by presenting an approach for privacy-preserving genomic data publishing via a differentially-private suffix tree. The proposed algorithm works top-down: it uses the Laplace mechanism to divide the raw genomic data into disjoint partitions, and then normalizes the partitioning structure to ensure consistency and maintain utility. The output of our algorithm is a differentially-private suffix tree, a data structure well suited to efficient search on genomic data. We experiment on real-life genomic data obtained from the Human Genome Privacy Challenge project, and we show that our approach is efficient, scalable, and achieves high utility with respect to genomic sequence matching count queries.
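
To make the top-down idea concrete, the following is a minimal sketch of noisy partitioning with the Laplace mechanism. It is not the authors' algorithm: it partitions whole sequences by their next symbol (a prefix tree, not a full suffix tree), splits the privacy budget evenly across levels, and omits the normalization step. The names `build_noisy_tree`, `noisy_count`, `height`, and `threshold` are hypothetical and chosen only for illustration.

```python
import random

ALPHABET = "ACGT"  # assumed nucleotide alphabet

def laplace(rng, scale):
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def build_noisy_tree(seqs, epsilon, height, threshold, rng, depth=0):
    """Return {symbol: (noisy_count, subtree)} built top-down.

    Assumption: each level receives epsilon / height of the budget.
    Partitions within a level are disjoint, so adding or removing one
    sequence changes at most one count per level by 1.
    """
    if depth == height:
        return {}
    eps_level = epsilon / height
    node = {}
    for sym in ALPHABET:  # visit every symbol, including empty partitions
        part = [s for s in seqs if len(s) > depth and s[depth] == sym]
        noisy = len(part) + laplace(rng, 1.0 / eps_level)
        if noisy >= threshold:  # pruning only looks at the noisy value
            subtree = build_noisy_tree(part, epsilon, height,
                                       threshold, rng, depth + 1)
            node[sym] = (noisy, subtree)
    return node

def noisy_count(tree, pattern):
    """Answer a prefix-matching count query from the published tree."""
    node, count = tree, 0.0
    for sym in pattern:
        if sym not in node:
            return 0.0
        count, node = node[sym]
    return count

if __name__ == "__main__":
    data = ["ACGT", "ACGA", "ACTT", "GGTA"]  # toy sequences, not real data
    tree = build_noisy_tree(data, epsilon=1.0, height=3,
                            threshold=0.5, rng=random.Random(7))
    print(noisy_count(tree, "AC"))
```

A smaller `epsilon` gives stronger privacy but larger noise, so more true partitions fall below `threshold` and more empty ones survive. A consistency step of the kind the abstract describes would then adjust the noisy counts so that a parent's count agrees with the sum of its children.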

Acknowledgement

This research was partially supported by Forsta, Inc (www.forsta.io).

Author information

Corresponding author

Correspondence to Gaby G. Dagher.

Copyright information

© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Khatri, T., Dagher, G.G., Hou, Y. (2019). Privacy-Preserving Genomic Data Publishing via Differentially-Private Suffix Tree. In: Chen, S., Choo, KK., Fu, X., Lou, W., Mohaisen, A. (eds) Security and Privacy in Communication Networks. SecureComm 2019. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 304. Springer, Cham. https://doi.org/10.1007/978-3-030-37228-6_28

  • DOI: https://doi.org/10.1007/978-3-030-37228-6_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37227-9

  • Online ISBN: 978-3-030-37228-6

  • eBook Packages: Computer Science (R0)
