Abstract
The decreasing cost of DNA sequencing has made genetic services widely available, which in turn has made a growing number of individuals' genomes and traits accessible online. These data are notoriously sensitive, and their exposure can lead to the leakage of further sensitive information. In this paper, we formulate the trait and genotype inference problem and develop an efficient inference method based on factor graphs and belief propagation. Using trait/SNP associations available from the GWAS catalog, an adversary can then infer the likely traits and genotypes of victims from the portions of their data that are observed. To protect against such inference attacks, we present privacy and utility metrics and then propose a genomic data-sanitization method that effectively trades off genomic data openness against privacy.
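To make the inference step more concrete, here is a minimal sketch of generic sum-product belief propagation on a toy factor graph. It is an illustration under stated assumptions, not the authors' model. The binary trait `T`, the SNPs `G1` and `G2` (genotypes coded as 0/1/2 copies of a risk allele), the function `belief_propagation`, and all probabilities are hypothetical stand-ins for associations that would come from the GWAS catalog. Observing `G1` updates the belief about both the trait and the unobserved `G2`.

```python
import numpy as np


def belief_propagation(cards, unary, pairwise, n_iters=10):
    """Sum-product BP with unary and pairwise factors (flooding schedule).

    cards:    dict mapping variable name -> number of states
    unary:    dict mapping variable name -> 1D array (prior or evidence)
    pairwise: list of (u, v, table) with table.shape == (cards[u], cards[v])
    Returns a dict of normalized marginal beliefs per variable.
    """
    # Factor-to-variable messages, keyed by (factor index, variable), start uniform.
    msg = {}
    for i, (u, v, _) in enumerate(pairwise):
        msg[(i, u)] = np.ones(cards[u])
        msg[(i, v)] = np.ones(cards[v])

    def incoming(var, skip=None):
        # Unary factor times all factor-to-variable messages except from `skip`.
        m = np.asarray(unary.get(var, np.ones(cards[var])), dtype=float)
        for (i, w), m_in in msg.items():
            if w == var and i != skip:
                m = m * m_in
        return m / m.sum()

    for _ in range(n_iters):
        new = {}
        for i, (u, v, table) in enumerate(pairwise):
            # Message to v sums out the states of u, and vice versa.
            new[(i, v)] = table.T @ incoming(u, skip=i)
            new[(i, u)] = table @ incoming(v, skip=i)
        msg = {k: m / m.sum() for k, m in new.items()}

    # Belief of each variable: its unary factor times all incoming messages.
    return {var: incoming(var) for var in cards}


# Hypothetical numbers for illustration only (not taken from the GWAS catalog).
prior_T = np.array([0.9, 0.1])                 # P(T = absent), P(T = present)
P_G1_given_T = np.array([[0.49, 0.42, 0.09],   # P(G1 | T = absent)
                         [0.25, 0.50, 0.25]])  # P(G1 | T = present)
P_G2_given_T = np.array([[0.64, 0.32, 0.04],   # P(G2 | T = absent)
                         [0.36, 0.48, 0.16]])  # P(G2 | T = present)

cards = {"T": 2, "G1": 3, "G2": 3}
unary = {"T": prior_T, "G1": np.array([0.0, 0.0, 1.0])}  # G1 observed as 2
pairwise = [("T", "G1", P_G1_given_T), ("T", "G2", P_G2_given_T)]

beliefs = belief_propagation(cards, unary, pairwise)
print(round(float(beliefs["T"][1]), 3))  # prints 0.236, up from the 0.1 prior
print(np.round(beliefs["G2"], 3))        # inferred distribution for the unobserved SNP
```

Because this toy graph is a tree, the messages reach the exact posterior marginals after two sweeps. On factor graphs with loops, the same updates yield approximate (loopy) beliefs.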
Acknowledgments
This work is partly supported by the National Science Foundation (NSF) of China under grants 61632010 and 61602129.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
He, Z., Li, Y., Li, J., Yu, J., Gao, H., Wang, J. (2017). Addressing the Threats of Inference Attacks on Traits and Genotypes from Individual Genomic Data. In: Cai, Z., Daescu, O., Li, M. (eds) Bioinformatics Research and Applications. ISBRA 2017. Lecture Notes in Computer Science, vol 10330. Springer, Cham. https://doi.org/10.1007/978-3-319-59575-7_20
DOI: https://doi.org/10.1007/978-3-319-59575-7_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59574-0
Online ISBN: 978-3-319-59575-7
eBook Packages: Computer Science, Computer Science (R0)