Comparing two multinomial samples using hierarchical Bayesian models

  • Regular Paper
Progress in Artificial Intelligence

Abstract

Two-sample statistical tests are commonly used to decide whether two samples can be considered to be drawn from the same population. However, such tests run into problems when confronted with extremely large volumes of data: their power becomes so high that they reject the null hypothesis even if the differences found in the data are minimal. Furthermore, they may need to scan the whole sample each time they are applied, which is a serious limitation in, for instance, streaming data contexts. In this paper, we apply a class of Bayesian models that has been successfully used in streaming data contexts to the problem of comparing multinomial populations. The underlying tools are latent variable models with hierarchical power priors. We show how it is possible, by means of a relevant parameter, to decide whether two populations are different or not.
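The sample-size issue raised in the abstract can be illustrated with a classical test. The following is a minimal sketch, not the method proposed in the paper: it applies Pearson's chi-square test of homogeneity to two three-category samples whose proportions differ by one percentage point. The function names, the proportions, and the sample sizes are assumptions chosen for illustration, and the counts are idealized (exactly proportional to the probabilities) rather than randomly drawn.

```python
import math


def chi2_homogeneity_3cat(counts_a, counts_b):
    """Pearson chi-square test of homogeneity for two samples over 3 categories.

    Returns (statistic, p_value). With 3 categories and 2 samples the
    statistic has 2 degrees of freedom, and the chi-square survival
    function for 2 degrees of freedom is exactly exp(-x / 2).
    """
    if len(counts_a) != 3 or len(counts_b) != 3:
        raise ValueError("this sketch only handles exactly 3 categories")
    n_a, n_b = sum(counts_a), sum(counts_b)
    total = n_a + n_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        pooled = a + b
        # Expected counts under the null hypothesis of a common distribution.
        exp_a = n_a * pooled / total
        exp_b = n_b * pooled / total
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat, math.exp(-stat / 2.0)


def idealized_counts(n, probs):
    """Hypothetical helper: counts exactly proportional to probs (no sampling noise)."""
    return [round(n * p) for p in probs]


# Illustrative proportions (assumed): a one-percentage-point shift in two categories.
probs_a = (0.30, 0.30, 0.40)
probs_b = (0.31, 0.30, 0.39)

for n in (1_000, 1_000_000):
    stat, p_value = chi2_homogeneity_3cat(
        idealized_counts(n, probs_a), idealized_counts(n, probs_b)
    )
    print(f"n = {n:>9,d} per sample: statistic = {stat:.3f}, p-value = {p_value:.3g}")
```

With the same underlying proportions, the statistic grows linearly with the sample size, so the test does not reject at 1,000 observations per sample but rejects overwhelmingly at 1,000,000, even though the difference between the populations is unchanged.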

Acknowledgements

This work has been supported by the Spanish Ministry of Economy and Competitiveness through projects TIN2016-77902-C3-3-P and TIN2015-74368-JIN, and has received FEDER funds.

Author information

Corresponding author

Correspondence to A. Salmerón.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Masegosa, A.R., Torres, A., Morales, M. et al. Comparing two multinomial samples using hierarchical Bayesian models. Prog Artif Intell 9, 145–154 (2020). https://doi.org/10.1007/s13748-019-00202-1

  • DOI: https://doi.org/10.1007/s13748-019-00202-1
