Hierarchical Bayesian Classifier Combination

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10934)

Abstract

This paper proposes a Bayesian method for combining the outputs of multiple base classifiers. The focus is on combination methods that merge the outputs of several, possibly heterogeneous, classifiers in order to improve the final accuracy. Our work builds on the model of Dawid and Skene [11] for modelling disagreement among human assessors. We also take advantage of the Bayesian Model Averaging (BMA) framework without requiring the ensemble of base classifiers to correspond, in a mutually exclusive and exhaustive way, to all possible data-generating models. This makes our method suitable for combining the outputs of multiple classifiers, each of which observes and predicts the behavior of an entity through a different aspect of the underlying environment. The proposed method, called Hierarchical Bayesian Classifier Combination (HBCC), is designed for discrete classifiers and assumes that the individual classifiers are conditionally independent given the true class label. A comparison of HBCC with majority voting on six benchmark classification data sets shows that HBCC generally outperforms majority voting in classification accuracy.
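
To make the combination rule concrete, the following sketch illustrates, in Python, the kind of conditionally independent combination described above: each base classifier is summarised by a confusion matrix, and the posterior over the true label is proportional to the class prior multiplied by the confusion-matrix entries for the observed outputs. The function and variable names are illustrative, and the confusion matrices are fixed by hand rather than inferred; this is a minimal sketch of the underlying Dawid and Skene style combination rule, not the paper's hierarchical HBCC model, which places priors on these quantities and learns them from data.

```python
import numpy as np

def combine_outputs(prior, confusions, outputs):
    """Posterior over the true label, assuming conditional independence.

    prior      : (J,) prior probability of each of the J class labels
    confusions : list of K (J, J) arrays; confusions[k][t, c] is the assumed
                 probability that classifier k outputs label c when the true
                 label is t (fixed here purely for illustration)
    outputs    : list of K observed discrete outputs, one per classifier
    """
    log_post = np.log(prior)
    for pi_k, c_k in zip(confusions, outputs):
        # Conditional independence given the true label: multiply likelihoods.
        log_post += np.log(pi_k[:, c_k])
    log_post -= log_post.max()            # avoid underflow before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Toy example: 3 classes, two disagreeing classifiers of different reliability.
prior = np.full(3, 1.0 / 3.0)
pi_reliable = 0.80 * np.eye(3) + 0.10 * (1 - np.eye(3))   # rows sum to 1
pi_noisy    = 0.50 * np.eye(3) + 0.25 * (1 - np.eye(3))
print(combine_outputs(prior, [pi_reliable, pi_noisy], outputs=[0, 1]))
```

In this toy case the combined posterior favours label 0 because the more reliable classifier voted for it, whereas a plain majority vote would treat the two disagreeing votes symmetrically.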


Notes

  1. The Data Generation Model (DGM) is the model that produced the observed data.

  2. For given values of \(t^{*}\) and \(x^{*}\), the distribution \(p(t^{*}|x^{*},h)\) depends only on \(h\) and remains constant for all values of \(\mathcal{D}\). This means that the random variable \(t^{*}|x^{*}\) is conditionally independent of \(\mathcal{D}\) given \(h\), which yields \(p(t^{*}|x^{*},h)=p(t^{*}|x^{*},h,\mathcal{D})\); this is the identity needed to write the BMA predictive distribution as \(p(t^{*}|x^{*},\mathcal{D})=\sum_{h} p(t^{*}|x^{*},h)\,p(h|\mathcal{D})\).

  3. Wilcoxon signed-rank test with continuity correction; the null hypothesis was that the mean accuracy of HBCC equals the mean accuracy of majority voting (the sketch after these notes shows how such a test can be run).

  4. Compare the bold values in Table 4 with the HBCC and majority voting results in Table 3.

  5. The computational complexity of majority voting is O(JK), where J and K are, respectively, the number of class labels and the number of individual classifiers used in the combination; a toy illustration is given in the sketch after these notes.
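
As a companion to notes 3 and 5, the sketch below (Python, using NumPy and SciPy) shows a majority-voting tally in which each vote is accumulated as a length-J score vector, which is one way of realising the O(JK) cost quoted in note 5, followed by a paired Wilcoxon signed-rank test with continuity correction of the kind mentioned in note 3. The per-dataset accuracy values are placeholders for illustration, not the results reported in the paper.

```python
import numpy as np
from scipy.stats import wilcoxon

def majority_vote(outputs, n_labels):
    """Plurality vote over K discrete classifier outputs.

    Each of the K votes is added as a length-J one-hot score vector,
    so the tally costs on the order of J * K additions.
    """
    one_hot = np.eye(n_labels)            # J x J identity, built once
    scores = np.zeros(n_labels)
    for c_k in outputs:
        scores += one_hot[c_k]            # J additions per classifier vote
    return int(scores.argmax())

print(majority_vote([0, 1, 1, 2, 1], n_labels=3))   # prints 1

# Placeholder per-dataset accuracies for HBCC and majority voting (not the
# paper's numbers), paired by data set.
hbcc_acc   = [0.91, 0.88, 0.84, 0.95, 0.79, 0.90]
voting_acc = [0.89, 0.86, 0.85, 0.93, 0.75, 0.88]

# Paired Wilcoxon signed-rank test with continuity correction (note 3).
stat, p_value = wilcoxon(hbcc_acc, voting_acc, correction=True)
print(stat, p_value)
```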

References

  1. Ahdesmäki, M., Strimmer, K.: Feature selection in omics prediction problems using cat scores and false nondiscovery rate control. Ann. Appl. Stat. 4(1), 503–519 (2010)

  2. Bauer, E., Kohavi, R.: An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Mach. Learn. 36(1–2), 105–139 (1999)

  3. Bernardo, J.M., Smith, A.F.M.: Bayesian Theory, vol. 405. Wiley, Hoboken (2009)

  4. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)

  5. Bolstad, W.M.: Introduction to Bayesian Statistics. Wiley, Hoboken (2013)

  6. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)

  7. Breiman, L.: Stacked regressions. Mach. Learn. 24(1), 49–64 (1996)

  8. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  9. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth, Belmont (1984)

  10. Clarke, B.: Comparing Bayes model averaging and stacking when model approximation error cannot be ignored. J. Mach. Learn. Res. 4, 683–712 (2003)

  11. Dawid, A.P., Skene, A.M.: Maximum likelihood estimation of observer error-rates using the EM algorithm. Appl. Stat. 28, 20–28 (1979)

  12. Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45014-9_1

  13. Domingos, P.: Bayesian averaging of classifiers and the overfitting problem. In: ICML, pp. 223–230 (2000)

  14. Džeroski, S., Ženko, B.: Is combining classifiers with stacking better than selecting the best one? Mach. Learn. 54(3), 255–273 (2004)

  15. Frank, A., Asuncion, A.: UCI machine learning repository (2010)

  16. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: ICML, pp. 148–156 (1996)

  17. Haitovsky, Y., Smith, A., Liu, Y.: Modelling disagreements among and within raters' assessments from the Bayesian point of view. Draft, presented at the Valencia Meeting (2002)

  18. Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E.: Adaptive mixtures of local experts. Neural Comput. 3(1), 79–87 (1991)

  19. Jordan, M.I., Jacobs, R.A.: Hierarchical mixtures of experts and the EM algorithm. Neural Comput. 6(2), 181–214 (1994)

  20. Kim, H.-C., Ghahramani, Z.: Bayesian classifier combination. In: International Conference on Artificial Intelligence and Statistics, pp. 619–627 (2012)

  21. Lacoste, A., Marchand, M., Laviolette, F., Larochelle, H.: Agnostic Bayesian learning of ensembles. In: Proceedings of the 31st International Conference on Machine Learning, pp. 611–619 (2014)

  22. Maclin, R., Opitz, D.: An empirical evaluation of bagging and boosting. In: AAAI/IAAI 1997, pp. 546–551 (1997)

  23. Minka, T.P.: Bayesian model averaging is not model combination, pp. 1–2 (2000). http://www.stat.cmu.edu/minka/papers/bma.html

  24. Monteith, K., Carroll, J.L., Seppi, K., Martinez, T.: Turning Bayesian model averaging into Bayesian model combination. In: The 2011 International Joint Conference on Neural Networks (IJCNN), pp. 2657–2663. IEEE (2011)

  25. Opitz, D., Maclin, R.: Popular ensemble methods: an empirical study. J. Artif. Intell. Res. 11, 169–198 (1999)

  26. Quinlan, J.R.: Bagging, boosting, and C4.5. In: AAAI/IAAI, vol. 1, pp. 725–730 (1996)

  27. Raykar, V.C., Yu, S., Zhao, L.H., Valadez, G.H., Florin, C., Bogoni, L., Moy, L.: Learning from crowds. J. Mach. Learn. Res. 11, 1297–1322 (2010)

  28. Rokach, L.: Ensemble-based classifiers. Artif. Intell. Rev. 33(1–2), 1–39 (2010)

  29. Schölkopf, B., Smola, A.: Support vector machines. In: Encyclopedia of Biostatistics (1998)

  30. Simpson, E., Roberts, S., Psorakis, I., Smith, A.: Dynamic Bayesian combination of multiple imperfect classifiers. In: Guy, T., Karny, M., Wolpert, D. (eds.) Decision Making and Imperfection, vol. 474, pp. 1–35. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36406-8_1

  31. Ting, K.M., Witten, I.H.: Stacking bagged and dagged models. In: ICML, pp. 367–375 (1997)

  32. Todorovski, L., Džeroski, S.: Combining classifiers with meta decision trees. Mach. Learn. 50(3), 223–249 (2003)

  33. Tulyakov, S., Jaeger, S., Govindaraju, V., Doermann, D.: Review of classifier combination methods. In: Marinai, S., Fujisawa, H. (eds.) Machine Learning in Document Analysis and Recognition, vol. 90, pp. 361–386. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-76280-5_14

  34. Wolpert, D.H.: Stacked generalization. Neural Netw. 5(2), 241–259 (1992)

  35. Xu, L., Krzyzak, A., Suen, C.Y.: Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans. Syst. Man Cybern. 22(3), 418–435 (1992)

  36. Yuksel, S.E., Wilson, J.N., Gader, P.D.: Twenty years of mixture of experts. IEEE Trans. Neural Netw. Learn. Syst. 23(8), 1177–1193 (2012)

Author information

Correspondence to Mohammad Ghasemi Hamed or Ahmad Akbari.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Ghasemi Hamed, M., Akbari, A. (2018). Hierarchical Bayesian Classifier Combination. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2018. Lecture Notes in Computer Science (LNAI), vol. 10934. Springer, Cham. https://doi.org/10.1007/978-3-319-96136-1_10

  • DOI: https://doi.org/10.1007/978-3-319-96136-1_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96135-4

  • Online ISBN: 978-3-319-96136-1

  • eBook Packages: Computer Science, Computer Science (R0)
