Abstract
This paper proposes a Bayesian method for combining the output of multiple base classifiers. The focus is on combination methods that merge the outputs of several, possibly heterogeneous, classifiers with the aim of improving the final accuracy. Our work builds on Dawid and Skene's work [11] on modelling disagreement among human assessors. We also take advantage of the Bayesian Model Averaging (BMA) framework without requiring the ensemble of base classifiers to correspond, in a mutually exclusive and exhaustive way, to all possible data-generating models. This makes our method relevant for combining the outputs of multiple classifiers, each observing and predicting the behavior of an entity through diverse aspects of the underlying environment. The proposed method, called Hierarchical Bayesian Classifier Combination (HBCC), is designed for discrete classifiers and assumes that the individual classifiers are conditionally independent given the true class label. A comparison of HBCC with majority voting on six benchmark classification data sets shows that it generally outperforms majority voting in classification accuracy.
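As a rough illustration of the combination rule implied by this conditional-independence assumption, the sketch below fuses the discrete outputs of K classifiers through per-classifier confusion matrices, in the spirit of Dawid and Skene [11]. It assumes the confusion matrices and class prior are already known, whereas HBCC places priors on these quantities and infers them; all names and shapes are illustrative and are not the paper's notation.

```python
import numpy as np

# Each classifier k is summarised by a row-stochastic confusion matrix
# pi_k, where pi_k[j, l] = P(classifier k outputs label l | true label j).

def combine(outputs, confusions, prior):
    """Posterior over the true label of one instance, assuming the
    classifiers are conditionally independent given the true label.

    outputs    : length-K array of predicted labels, one per classifier
    confusions : K x J x J array of (known) confusion matrices
    prior      : length-J prior over the true class labels
    """
    log_post = np.log(prior)
    for k, c_k in enumerate(outputs):
        # Conditional independence lets us add per-classifier log-likelihoods.
        log_post += np.log(confusions[k, :, c_k])
    log_post -= log_post.max()          # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Toy example: three classifiers, two classes.
prior = np.array([0.5, 0.5])
confusions = np.array([[[0.9, 0.1], [0.2, 0.8]],
                       [[0.7, 0.3], [0.3, 0.7]],
                       [[0.6, 0.4], [0.4, 0.6]]])
print(combine(np.array([0, 0, 1]), confusions, prior))
```

Working in log space keeps the product of many small per-classifier likelihoods numerically stable as the number of classifiers grows.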
Notes
- 1.
The Data Generation Model (DGM) is the model that produced the observed data.
- 2.
For given values of \(t^{*}\) and \(x^{*}\), the distribution \(p(t^{*}|x^{*},h)\) depends only on h and remains constant over all values of \(\mathcal {D}\). This means that the random variable \(t^{*}|x^{*}\) is conditionally independent of \(\mathcal {D}\) given h, which yields \(p(t^{*}|x^{*},h)= p(t^{*}|x^{*},h,\mathcal {D})\); a sketch of the BMA expansion in which this identity is used appears after these notes.
- 3.
Wilcoxon signed-rank test with continuity correction; the null hypothesis was that the mean accuracy of HBCC is equal to the mean accuracy of majority voting. A sketch of such a paired test appears after these notes.
- 4.
- 5.
The computational complexity of majority voting is O(JK), where J and K are, respectively, the number of true class labels and the number of individual classifiers used in the combination; a sketch illustrating this cost appears after these notes.
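The identity in note 2 is the step that lets the Bayesian Model Averaging predictive distribution be written as a mixture over models. The following standard expansion is reconstructed from the surrounding definitions rather than quoted from the paper:

\[ p(t^{*}\mid x^{*},\mathcal{D}) \;=\; \sum_{h} p(t^{*}\mid x^{*},h,\mathcal{D})\, p(h\mid \mathcal{D}) \;=\; \sum_{h} p(t^{*}\mid x^{*},h)\, p(h\mid \mathcal{D}), \]

where the second equality drops \(\mathcal{D}\) by the conditional independence stated in note 2.

For note 3, a minimal sketch of such a paired comparison, assuming the per-data-set accuracies of both methods are available; the numbers below are placeholders, not the paper's results:

```python
from scipy.stats import wilcoxon

# Placeholder per-data-set accuracies -- NOT the paper's results.
hbcc_acc   = [0.910, 0.872, 0.781, 0.953, 0.834, 0.885]
voting_acc = [0.902, 0.861, 0.779, 0.941, 0.815, 0.880]

# Paired Wilcoxon signed-rank test; the continuity correction applies
# whenever the normal approximation to the test statistic is used.
stat, p_value = wilcoxon(hbcc_acc, voting_acc, correction=True)
print(stat, p_value)
```

For note 5, a small sketch of majority voting in which the vote count scans all K outputs once per class, making the O(JK) per-instance cost explicit; the function and variable names are illustrative:

```python
def majority_vote(outputs, num_classes):
    """Majority voting over the labels emitted by K classifiers.

    Counting the votes for each of the J candidate classes by scanning
    the K outputs gives the O(JK) cost per instance quoted in note 5.
    """
    votes = [sum(1 for label in outputs if label == j)  # O(K) per class
             for j in range(num_classes)]               # J classes
    return max(range(num_classes), key=votes.__getitem__)

# Toy example: five classifiers voting over three classes.
print(majority_vote([0, 1, 1, 2, 1], num_classes=3))  # -> 1
```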
References
Ahdesmäki, M., Strimmer, K., et al.: Feature selection in omics prediction problems using cat scores and false nondiscovery rate control. Ann. Appl. Stat. 4(1), 503–519 (2010)
Bauer, E., Kohavi, R.: An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Mach. Learn. 36(1–2), 105–139 (1999)
Bernardo, J.M., Smith, A.F.M.: Bayesian Theory, vol. 405. Wiley, Hoboken (2009)
Bishop, C.M., et al.: Pattern Recognition and Machine Learning, vol. 4. Springer, New York (2006)
Bolstad, W.M.: Introduction to Bayesian Statistics. Wiley, Hoboken (2013)
Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
Breiman, L.: Stacked regressions. Mach. Learn. 24(1), 49–64 (1996)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth, Belmont (1984)
Clarke, B.: Comparing Bayes model averaging and stacking when model approximation error cannot be ignored. J. Mach. Learn. Res. 4, 683–712 (2003)
Dawid, A.P., Skene, A.M.: Maximum likelihood estimation of observer error-rates using the EM algorithm. Appl. Stat. 28, 20–28 (1979)
Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45014-9_1
Domingos, P.: Bayesian averaging of classifiers and the overfitting problem. In: ICML, pp. 223–230 (2000)
Džeroski, S., Ženko, B.: Is combining classifiers with stacking better than selecting the best one? Mach. Learn. 54(3), 255–273 (2004)
Frank, A., Asuncion, A.: UCI machine learning repository (2010)
Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: ICML, pp. 148–156 (1996)
Haitovsky, Y., Smith, A., Liu, Y.: Modelling disagreements among and within raters' assessments from the Bayesian point of view. Draft, presented at the Valencia Meeting (2002)
Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E.: Adaptive mixtures of local experts. Neural Comput. 3(1), 79–87 (1991)
Jordan, M.I., Jacobs, R.A.: Hierarchical mixtures of experts and the EM algorithm. Neural Comput. 6(2), 181–214 (1994)
Kim, H.-C., Ghahramani, Z.: Bayesian classifier combination. In: International Conference on Artificial Intelligence and Statistics, pp. 619–627 (2012)
Lacoste, A., Marchand, M., Laviolette, F., Larochelle, H.: Agnostic Bayesian learning of ensembles. In: Proceedings of the 31st International Conference on Machine Learning, pp. 611–619 (2014)
Maclin, R., Opitz, D.: An empirical evaluation of bagging and boosting. In: AAAI/IAAI, pp. 546–551 (1997)
Minka, T.P.: Bayesian model averaging is not model combination, pp. 1–2 (2000). http://www.stat.cmu.edu/minka/papers/bma.html
Monteith, K., Carroll, J.L., Seppi, K., Martinez, T.: Turning Bayesian model averaging into Bayesian model combination. In: The 2011 International Joint Conference on Neural Networks (IJCNN), pp. 2657–2663. IEEE (2011)
Opitz, D., Maclin, R.: Popular ensemble methods: an empirical study. J. Artif. Intell. Res. 11, 169–198 (1999)
Quinlan, J.R.: Bagging, boosting, and C4.5. In: AAAI/IAAI, vol. 1, pp. 725–730 (1996)
Raykar, V.C., Yu, S., Zhao, L.H., Valadez, G.H., Florin, C., Bogoni, L., Moy, L.: Learning from crowds. J. Mach. Learn. Res. 11, 1297–1322 (2010)
Rokach, L.: Ensemble-based classifiers. Artif. Intell. Rev. 33(1–2), 1–39 (2010)
Schölkopf, B., Smola, A.: Support vector machines. In: Encyclopedia of Biostatistics (1998)
Simpson, E., Roberts, S., Psorakis, I., Smith, A.: Dynamic Bayesian combination of multiple imperfect classifiers. In: Guy, T., Karny, M., Wolpert, D. (eds.) Decision Making and Imperfection, vol. 474, pp. 1–35. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36406-8_1
Ting, K.M., Witten, I.H.: Stacking bagged and dagged models. In: ICML, pp. 367–375. Citeseer (1997)
Todorovski, L., Džeroski, S.: Combining classifiers with meta decision trees. Mach. Learn. 50(3), 223–249 (2003)
Tulyakov, S., Jaeger, S., Govindaraju, V., Doermann, D.: Review of classifier combination methods. In: Marinai, S., Fujisawa, H. (eds.) Machine Learning in Document Analysis and Recognition, vol. 90, pp. 361–386. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-76280-5_14
Wolpert, D.H.: Stacked generalization. Neural Netw. 5(2), 241–259 (1992)
Xu, L., Krzyzak, A., Suen, C.Y.: Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans. Syst. Man Cybern. 22(3), 418–435 (1992)
Yuksel, S.E., Wilson, J.N., Gader, P.D.: Twenty years of mixture of experts. IEEE Trans. Neural Netw. Learn. Syst. 23(8), 1177–1193 (2012)
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Ghasemi Hamed, M., Akbari, A. (2018). Hierarchical Bayesian Classifier Combination. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2018. Lecture Notes in Computer Science, vol. 10934. Springer, Cham. https://doi.org/10.1007/978-3-319-96136-1_10
DOI: https://doi.org/10.1007/978-3-319-96136-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96135-4
Online ISBN: 978-3-319-96136-1
eBook Packages: Computer Science (R0)