Abstract
Two-sample statistical tests are commonly used to decide whether two samples can be considered to be drawn from the same population. However, such tests face problems when confronted with extremely large volumes of data, in which case the power of the test is so high that the null hypothesis is rejected even if the differences found in the data are minimal. Furthermore, the fact that they may need to scan the whole sample each time they are applied is a serious limitation, for instance, in streaming data contexts. In this paper, we apply a class of Bayesian models that have been successfully used in streaming data contexts to the problem of comparing multinomial populations. The underlying tools are latent variable models with hierarchical power priors. We show how it is possible, by means of a relevant parameter, to decide whether two populations are different or not.
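
The large-sample issue described in the abstract can be illustrated with a minimal sketch. It is not the method proposed in the paper: it applies a classical Pearson chi-square test of homogeneity to two multinomial samples over three categories, using idealized counts (exactly proportional to the category probabilities) rather than random draws. The probability vectors `P` and `Q`, the sample sizes and the helper names are assumptions chosen for illustration only. With three categories the test has 2 degrees of freedom, for which the chi-square survival function is exactly exp(-x/2), so no external library is needed.

```python
import math

def chi2_homogeneity_3cat(counts_a, counts_b):
    """Pearson chi-square homogeneity test for two samples over 3 categories.

    Returns (statistic, p_value). With 3 categories the test has 2 degrees
    of freedom, and the chi-square survival function is exp(-x / 2).
    """
    assert len(counts_a) == len(counts_b) == 3
    n_a, n_b = sum(counts_a), sum(counts_b)
    total = n_a + n_b
    stat = 0.0
    for o_a, o_b in zip(counts_a, counts_b):
        pooled = o_a + o_b
        e_a = n_a * pooled / total  # expected count in sample A under H0
        e_b = n_b * pooled / total  # expected count in sample B under H0
        stat += (o_a - e_a) ** 2 / e_a + (o_b - e_b) ** 2 / e_b
    return stat, math.exp(-stat / 2.0)

def idealized_counts(probs, n):
    """Counts exactly proportional to probs (no sampling noise)."""
    return [round(p * n) for p in probs]

# Hypothetical populations that differ only in the third decimal place.
P = (0.500, 0.300, 0.200)
Q = (0.502, 0.299, 0.199)

if __name__ == "__main__":
    for n in (1_000, 10_000_000):
        stat, p_value = chi2_homogeneity_3cat(
            idealized_counts(P, n), idealized_counts(Q, n)
        )
        print(f"n = {n:>10,d}  statistic = {stat:.4f}  p-value = {p_value:.3g}")
```

Because the statistic grows linearly with the sample size when the proportions are held fixed, the same negligible difference that is not rejected at the 5% level with a thousand observations per sample is rejected decisively with ten million.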









Acknowledgements
This work has been supported by the Spanish Ministry of Economy and Competitiveness through projects TIN2016-77902-C3-3-P and TIN2015-74368-JIN, and has received FEDER funds.
Cite this article
Masegosa, A.R., Torres, A., Morales, M. et al. Comparing two multinomial samples using hierarchical Bayesian models. Prog Artif Intell 9, 145–154 (2020). https://doi.org/10.1007/s13748-019-00202-1