Abstract
Privacy-preserving data publishing is a mechanism for sharing data such that the privacy of individuals is preserved in the published data while utility is maintained for data mining and analysis. There is a pressing need to share genomic data to advance medical and health research. However, because genomic data is highly sensitive and is the ultimate identifier, publishing it while protecting the privacy of the individuals it describes is a major challenge. In this paper, we address this challenge by presenting an approach for privacy-preserving genomic data publishing via a differentially private suffix tree. The proposed algorithm works top-down: it uses the Laplace mechanism to divide the raw genomic data into disjoint partitions, and then normalizes the partitioning structure to ensure consistency and maintain utility. The output of our algorithm is a differentially private suffix tree, a data structure well suited to efficient search on genomic data. We experiment on real-life genomic data obtained from the Human Genome Privacy Challenge project, and we show that our approach is efficient, scalable, and achieves high utility with respect to genomic sequence matching count queries.
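To make the idea of top-down partitioning with Laplace noise concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes each individual contributes exactly one sequence (so every count has sensitivity 1), splits the privacy budget evenly across levels, and partitions by leading symbols rather than building a suffix tree. The names `laplace`, `noisy_partition`, and `threshold` are hypothetical, and the consistency normalization step is omitted.

```python
import random

ALPHABET = "ACGT"  # assumed nucleotide alphabet


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_partition(sequences, epsilon, depth, threshold, rng):
    """Return {prefix: noisy_count} for prefixes of up to `depth` symbols.

    Partitions at one level are disjoint, so each level spends
    epsilon / depth of the budget; the levels compose sequentially.
    """
    eps_level = epsilon / depth
    result = {}
    frontier = [("", list(sequences))]
    for _ in range(depth):
        next_frontier = []
        for prefix, group in frontier:
            for symbol in ALPHABET:
                child = prefix + symbol
                members = [s for s in group if s.startswith(child)]
                # Count has sensitivity 1, so the noise scale is 1 / eps_level.
                noisy = len(members) + laplace(1.0 / eps_level, rng)
                # Only partitions with a large enough noisy count are expanded.
                if noisy >= threshold:
                    result[child] = noisy
                    next_frontier.append((child, members))
        frontier = next_frontier
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    data = ["ACGT", "ACGA", "TTGA", "ACTT"]
    for prefix, count in noisy_partition(data, 1.0, 2, 0.5, rng).items():
        print(prefix, round(count, 2))
```

A smaller `epsilon` gives stronger privacy but noisier counts, which is why a threshold is used here to avoid expanding partitions whose noisy count is too small.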
Acknowledgement
This research was partially supported by Forsta, Inc (www.forsta.io).
Copyright information
© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Khatri, T., Dagher, G.G., Hou, Y. (2019). Privacy-Preserving Genomic Data Publishing via Differentially-Private Suffix Tree. In: Chen, S., Choo, KK., Fu, X., Lou, W., Mohaisen, A. (eds) Security and Privacy in Communication Networks. SecureComm 2019. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 304. Springer, Cham. https://doi.org/10.1007/978-3-030-37228-6_28
DOI: https://doi.org/10.1007/978-3-030-37228-6_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37227-9
Online ISBN: 978-3-030-37228-6
eBook Packages: Computer Science, Computer Science (R0)