Secure Bayesian model averaging for horizontally partitioned data

Statistics and Computing

Abstract

When multiple data owners possess records on different subjects with the same set of attributes—known as horizontally partitioned data—the data owners can improve analyses by concatenating their databases. However, concatenation of data may be infeasible because of confidentiality concerns. In such settings, the data owners can use secure computation techniques to obtain the results of certain analyses on the integrated database without sharing individual records. We present secure computation protocols for Bayesian model averaging and model selection for both linear regression and probit regression. Using simulations based on genuine data, we illustrate the approach for probit regression, and show that it can provide reasonable model selection outputs.
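To make the idea of analysing the integrated database without sharing individual records concrete, the following is a minimal sketch of round-robin secure summation applied to the sufficient statistics of a simple linear regression. It is an illustration under stated assumptions, not the protocol from the article: the function names, the public `MODULUS`, the fixed-point `SCALE`, and the toy records are all hypothetical, and the parties are simulated inside one process rather than over a network. The masking step also uses Python's `random` module for reproducibility; a real deployment would need a cryptographically secure source of randomness.

```python
import random

MODULUS = 2**61 - 1  # assumed public modulus, larger than any true total
SCALE = 10**6        # assumed fixed-point scale for encoding real numbers


def encode(value):
    """Encode a real number as a fixed-point residue modulo MODULUS."""
    return round(value * SCALE) % MODULUS


def decode(residue):
    """Map a residue back to a signed real number."""
    if residue > MODULUS // 2:
        residue -= MODULUS
    return residue / SCALE


def secure_sum(local_values, rng):
    """Simulate round-robin secure summation.

    The first party starts the running total at a random mask, every party
    adds its own encoded value, and the first party removes the mask at the
    end. Each party only ever sees a masked running total.
    """
    mask = rng.randrange(MODULUS)
    running = mask
    for value in local_values:
        running = (running + encode(value)) % MODULUS
    return decode((running - mask) % MODULUS)


def local_stats(records):
    """Sufficient statistics for y = a + b*x computed on one owner's records."""
    n = len(records)
    sx = sum(x for x, _ in records)
    sy = sum(y for _, y in records)
    sxx = sum(x * x for x, _ in records)
    sxy = sum(x * y for x, y in records)
    return [n, sx, sy, sxx, sxy]


def pooled_ols(parties, rng):
    """Least-squares fit on the pooled data using only securely summed totals."""
    stats = [local_stats(records) for records in parties]
    n, sx, sy, sxx, sxy = (
        secure_sum([s[k] for s in stats], rng) for k in range(5)
    )
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Toy horizontally partitioned data: three owners, same attributes (x, y).
parties = [[(0, 1), (1, 3)], [(2, 5)], [(3, 7), (4, 9)]]
intercept, slope = pooled_ols(parties, random.Random(0))
print(intercept, slope)
```

Because the least-squares fit depends on the data only through sums over records, the owners obtain the same estimates as they would from the concatenated database, while exchanging only masked totals.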

Author information

Corresponding author

Correspondence to Joyee Ghosh.

About this article

Cite this article

Ghosh, J., Reiter, J.P. Secure Bayesian model averaging for horizontally partitioned data. Stat Comput 23, 311–322 (2013). https://doi.org/10.1007/s11222-011-9312-6

  • DOI: https://doi.org/10.1007/s11222-011-9312-6
