Abstract
Following the publication of an attack on genome-wide association studies (GWAS) data proposed by Homer et al., considerable attention has been given to developing methods for releasing GWAS data in a privacy-preserving way. Here, we develop an end-to-end differentially private method for solving regression problems with convex penalty functions and selecting the penalty parameters by cross-validation. In particular, we focus on penalized logistic regression with elastic-net regularization, a method widely used in GWAS analyses to identify disease-causing genes. We show how a differentially private procedure for penalized logistic regression with elastic-net regularization can be applied to the analysis of GWAS data and evaluate our method’s performance.
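For orientation, the two objects named in the abstract can be written out in their standard textbook form. This is a sketch under assumed notation, not necessarily the parameterization used in the chapter: labels y_i in {-1, +1}, genotype feature vectors x_i, coefficient vector \beta, penalty strength \lambda, mixing weight \alpha in [0, 1], and a randomized mechanism \mathcal{M} applied to databases D and D' that differ in one individual's record.

```latex
% Elastic-net penalized logistic regression (standard form; notation assumed)
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{n}\sum_{i=1}^{n} \log\!\bigl(1 + \exp(-y_i\, x_i^{\top}\beta)\bigr)
  \;+\; \lambda\Bigl(\alpha\,\lVert\beta\rVert_1
  + \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Bigr)

% \epsilon-differential privacy (Dwork et al., 2006): for every measurable set S
\Pr\bigl[\mathcal{M}(D) \in S\bigr] \;\le\; e^{\epsilon}\,
  \Pr\bigl[\mathcal{M}(D') \in S\bigr]
```

Setting \alpha = 1 gives the lasso penalty and \alpha = 0 gives the ridge penalty. Both are convex, which is the property the abstract relies on.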
References
Austin, E., Pan, W., Shen, X.: Penalized regression and risk prediction in genome-wide association studies. Statistical Analysis and Data Mining 6(4) (August 2013)
Cho, S., et al.: Elastic-net regularization approaches for genome-wide association studies of rheumatoid arthritis. BMC Proceedings 3(suppl. 7), S25 (2009)
Homer, N., et al.: Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genetics 4(8), e1000167 (2008)
Couzin, J.: Whole-genome data not anonymous, challenging assumptions. Science 321(5894), 1278 (2008)
Zerhouni, E.A., Nabel, E.G.: Protecting aggregate genomic data. Science 322(5898), 44 (2008)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006)
Uhler, C., Slavkovic, A.B., Fienberg, S.E.: Privacy-preserving data sharing for genome-wide association studies. Journal of Privacy and Confidentiality 5(1), 137–166 (2013)
Johnson, A., Shmatikov, V.: Privacy-preserving data exploration in genome-wide association studies. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1079–1087 (2013)
Yu, F., et al.: Scalable Privacy-Preserving Data Sharing Methodology for Genome-Wide Association Studies. Journal of Biomedical Informatics (February 2014)
Kifer, D., Smith, A., Thakurta, A.: Private convex empirical risk minimization and high-dimensional regression. Journal of Machine Learning Research - Proceedings Track 23, 25.1–25.40 (2012)
Chaudhuri, K., Vinterbo, S.A.: A stability-based validation procedure for differentially private machine learning. In: Advances in Neural Information Processing Systems, pp. 1–19 (2013)
Chaudhuri, K., Monteleoni, C., Sarwate, A.D.: Differentially private empirical risk minimization. JMLR 12(7), 1069–1109 (2011)
Laurent, B., Massart, P.: Adaptive estimation of a quadratic functional by model selection. Annals of Statistics 28(5), 1302–1338 (2000)
Wright, F.A., et al.: Simulating association studies: a data-based resampling method for candidate regions or whole genome scans. Bioinformatics 23(19), 2581–2588 (2007)
Malaspinas, A.S., Uhler, C.: Detecting epistasis via Markov bases. Journal of Algebraic Statistics 2(1), 36–53 (2010)
Gómez, E., Gómez-Villegas, M.A., Marín, J.M.: A multivariate generalization of the power exponential family of distributions. Communications in Statistics - Theory and Methods 27(3), 589–600 (1998)
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Yu, F., Rybar, M., Uhler, C., Fienberg, S.E. (2014). Differentially-Private Logistic Regression for Detecting Multiple-SNP Association in GWAS Databases. In: Domingo-Ferrer, J. (eds) Privacy in Statistical Databases. PSD 2014. Lecture Notes in Computer Science, vol 8744. Springer, Cham. https://doi.org/10.1007/978-3-319-11257-2_14
DOI: https://doi.org/10.1007/978-3-319-11257-2_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11256-5
Online ISBN: 978-3-319-11257-2
eBook Packages: Computer Science, Computer Science (R0)