Secure Bayesian model averaging for horizontally partitioned data

Ghosh, Joyee; Reiter, Jerome P.

doi:10.1007/s11222-011-9312-6

Published: 10 January 2012

Volume 23, pages 311–322, (2013)
Statistics and Computing

Abstract

When multiple data owners possess records on different subjects with the same set of attributes—known as horizontally partitioned data—the data owners can improve analyses by concatenating their databases. However, concatenation of data may be infeasible because of confidentiality concerns. In such settings, the data owners can use secure computation techniques to obtain the results of certain analyses on the integrated database without sharing individual records. We present secure computation protocols for Bayesian model averaging and model selection for both linear regression and probit regression. Using simulations based on genuine data, we illustrate the approach for probit regression, and show that it can provide reasonable model selection outputs.
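
The abstract does not spell out the protocols themselves, so the following is only a minimal sketch of one standard building block for horizontally partitioned data: secure summation by additive secret sharing. It is not the authors' protocol. The function names (`make_shares`, `secure_sum`, `local_stats`), the public modulus, the toy data, and the restriction to non-negative integer values are all assumptions made for illustration. Each owner computes local sufficient statistics for a simple linear regression, and only the pooled totals are revealed.

```python
import secrets

# Public modulus (assumption): must exceed every true total being summed.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split an integer into n_parties random shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values, modulus=MODULUS):
    """Total the owners' private non-negative integers via additive secret sharing.

    Owner i splits its value into shares and sends one share to each owner.
    Each owner then publishes only the sum of the shares it received, so no
    single published number reveals any one owner's value.
    """
    n = len(private_values)
    # shares[i][j] is the share that owner i sends to owner j.
    shares = [make_shares(v % modulus, n, modulus) for v in private_values]
    # Owner j adds up everything it received and publishes that partial sum.
    partial = [sum(shares[i][j] for i in range(n)) % modulus for j in range(n)]
    return sum(partial) % modulus


def local_stats(rows):
    """Sufficient statistics one owner computes on its own (x, y) records."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return [n, sx, sy, sxx, sxy]


# Three hypothetical owners holding different subjects, same attributes (x, y).
owners = [
    [(1, 3), (2, 5)],
    [(3, 7), (4, 9)],
    [(5, 11)],
]
per_owner = [local_stats(rows) for rows in owners]

# Pool each statistic with a separate secure summation.
n, sx, sy, sxx, sxy = [secure_sum([s[k] for s in per_owner]) for k in range(5)]

# Least-squares fit on the integrated data, using only the pooled totals.
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
```

In this sketch the pooled totals give the same least-squares fit as concatenating the three databases would. Real-valued data would need a fixed-point encoding before sharing, and a practical deployment would also have to consider collusion among owners, which this toy version does not address.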

Author information

Authors and Affiliations

Department of Statistics and Actuarial Science, The University of Iowa, Iowa City, USA
Joyee Ghosh
Department of Statistical Science, Duke University, Durham, USA
Jerome P. Reiter

Corresponding author

Correspondence to Joyee Ghosh.

About this article

Cite this article

Ghosh, J., Reiter, J.P. Secure Bayesian model averaging for horizontally partitioned data. Stat Comput 23, 311–322 (2013). https://doi.org/10.1007/s11222-011-9312-6

Received: 01 July 2011
Accepted: 28 December 2011
Published: 10 January 2012
Issue Date: May 2013
DOI: https://doi.org/10.1007/s11222-011-9312-6
