Hierarchical Bayesian Classifier Combination

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10934)

Abstract

This paper proposes a Bayesian method for combining the outputs of multiple base classifiers. The focus is on combination methods that merge the outputs of several, possibly heterogeneous, classifiers in order to improve the final accuracy. Our work builds on the model of Dawid and Skene [11] for modelling disagreement among human assessors. We also take advantage of the Bayesian Model Averaging (BMA) framework without requiring the ensemble of base classifiers to correspond, in a mutually exclusive and exhaustive way, to all possible data-generating models. This makes our method suitable for combining the outputs of multiple classifiers, each of which observes and predicts the behavior of an entity through a different aspect of the underlying environment. The proposed method, called Hierarchical Bayesian Classifier Combination (HBCC), is designed for discrete classifiers and assumes that the individual classifiers are conditionally independent given the true class label. A comparison of HBCC with majority voting on six benchmark classification data sets shows that HBCC generally outperforms majority voting in classification accuracy.
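
To make the combination rule concrete, the following sketch illustrates, in Python, the kind of conditionally independent combination described above: each base classifier is summarised by a confusion matrix, and the posterior over the true label is proportional to the class prior multiplied by the confusion-matrix entries for the observed outputs. The function and variable names are illustrative, and the confusion matrices are fixed by hand rather than inferred; this is a minimal sketch of the underlying Dawid and Skene style combination rule, not the paper's hierarchical HBCC model, which places priors on these quantities and learns them from data.

```python
import numpy as np

def combine_outputs(prior, confusions, outputs):
    """Posterior over the true label, assuming conditional independence.

    prior      : (J,) prior probability of each of the J class labels
    confusions : list of K (J, J) arrays; confusions[k][t, c] is the assumed
                 probability that classifier k outputs label c when the true
                 label is t (fixed here purely for illustration)
    outputs    : list of K observed discrete outputs, one per classifier
    """
    log_post = np.log(prior)
    for pi_k, c_k in zip(confusions, outputs):
        # Conditional independence given the true label: multiply likelihoods.
        log_post += np.log(pi_k[:, c_k])
    log_post -= log_post.max()            # avoid underflow before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Toy example: 3 classes, two disagreeing classifiers of different reliability.
prior = np.full(3, 1.0 / 3.0)
pi_reliable = 0.80 * np.eye(3) + 0.10 * (1 - np.eye(3))   # rows sum to 1
pi_noisy    = 0.50 * np.eye(3) + 0.25 * (1 - np.eye(3))
print(combine_outputs(prior, [pi_reliable, pi_noisy], outputs=[0, 1]))
```

In this toy case the combined posterior favours label 0 because the more reliable classifier voted for it, whereas a plain majority vote would treat the two disagreeing votes symmetrically.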


Notes

  1. The Data Generation Model (DGM) is the model that produced the observed data.

  2. For given values of \(t^{*}\) and \(x^{*}\), the distribution \(p(t^{*}|x^{*},h)\) depends only on \(h\) and remains constant for all values of \(\mathcal{D}\). This means that the random variable \(t^{*}|x^{*}\) is conditionally independent of \(\mathcal{D}\) given \(h\), which yields \(p(t^{*}|x^{*},h)=p(t^{*}|x^{*},h,\mathcal{D})\); this is the identity needed to write the BMA predictive distribution as \(p(t^{*}|x^{*},\mathcal{D})=\sum_{h} p(t^{*}|x^{*},h)\,p(h|\mathcal{D})\).

  3. Wilcoxon signed-rank test with continuity correction; the null hypothesis was that the mean accuracy of HBCC equals the mean accuracy of majority voting (the sketch after these notes shows how such a test can be run).

  4. Compare the bold values in Table 4 with the HBCC and majority voting results in Table 3.

  5. The computational complexity of majority voting is O(JK), where J and K are, respectively, the number of class labels and the number of individual classifiers used in the combination; a toy illustration is given in the sketch after these notes.
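
As a companion to notes 3 and 5, the sketch below (Python, using NumPy and SciPy) shows a majority-voting tally in which each vote is accumulated as a length-J score vector, which is one way of realising the O(JK) cost quoted in note 5, followed by a paired Wilcoxon signed-rank test with continuity correction of the kind mentioned in note 3. The per-dataset accuracy values are placeholders for illustration, not the results reported in the paper.

```python
import numpy as np
from scipy.stats import wilcoxon

def majority_vote(outputs, n_labels):
    """Plurality vote over K discrete classifier outputs.

    Each of the K votes is added as a length-J one-hot score vector,
    so the tally costs on the order of J * K additions.
    """
    one_hot = np.eye(n_labels)            # J x J identity, built once
    scores = np.zeros(n_labels)
    for c_k in outputs:
        scores += one_hot[c_k]            # J additions per classifier vote
    return int(scores.argmax())

print(majority_vote([0, 1, 1, 2, 1], n_labels=3))   # prints 1

# Placeholder per-dataset accuracies for HBCC and majority voting (not the
# paper's numbers), paired by data set.
hbcc_acc   = [0.91, 0.88, 0.84, 0.95, 0.79, 0.90]
voting_acc = [0.89, 0.86, 0.85, 0.93, 0.75, 0.88]

# Paired Wilcoxon signed-rank test with continuity correction (note 3).
stat, p_value = wilcoxon(hbcc_acc, voting_acc, correction=True)
print(stat, p_value)
```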

References

  1. Ahdesmäki, M., Strimmer, K.: Feature selection in omics prediction problems using cat scores and false nondiscovery rate control. Ann. Appl. Stat. 4(1), 503–519 (2010)

  2. Bauer, E., Kohavi, R.: An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Mach. Learn. 36(1–2), 105–139 (1999)

  3. Bernardo, J.M., Smith, A.F.M.: Bayesian Theory, vol. 405. Wiley, Hoboken (2009)

  4. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)

  5. Bolstad, W.M.: Introduction to Bayesian Statistics. Wiley, Hoboken (2013)

  6. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)

  7. Breiman, L.: Stacked regressions. Mach. Learn. 24(1), 49–64 (1996)

  8. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  9. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth, Belmont (1984)

  10. Clarke, B.: Comparing Bayes model averaging and stacking when model approximation error cannot be ignored. J. Mach. Learn. Res. 4, 683–712 (2003)

  11. Dawid, A.P., Skene, A.M.: Maximum likelihood estimation of observer error-rates using the EM algorithm. Appl. Stat. 28, 20–28 (1979)

  12. Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45014-9_1

  13. Domingos, P.: Bayesian averaging of classifiers and the overfitting problem. In: ICML, pp. 223–230 (2000)

  14. Džeroski, S., Ženko, B.: Is combining classifiers with stacking better than selecting the best one? Mach. Learn. 54(3), 255–273 (2004)

  15. Frank, A., Asuncion, A.: UCI machine learning repository (2010)

  16. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: ICML, pp. 148–156 (1996)

  17. Haitovsky, Y., Smith, A., Liu, Y.: Modelling disagreements among and within raters' assessments from the Bayesian point of view. Draft, presented at the Valencia Meeting (2002)

  18. Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E.: Adaptive mixtures of local experts. Neural Comput. 3(1), 79–87 (1991)

  19. Jordan, M.I., Jacobs, R.A.: Hierarchical mixtures of experts and the EM algorithm. Neural Comput. 6(2), 181–214 (1994)

  20. Kim, H.-C., Ghahramani, Z.: Bayesian classifier combination. In: International Conference on Artificial Intelligence and Statistics, pp. 619–627 (2012)

  21. Lacoste, A., Marchand, M., Laviolette, F., Larochelle, H.: Agnostic Bayesian learning of ensembles. In: Proceedings of the 31st International Conference on Machine Learning, pp. 611–619 (2014)

  22. Maclin, R., Opitz, D.: An empirical evaluation of bagging and boosting. In: AAAI/IAAI 1997, pp. 546–551 (1997)

  23. Minka, T.P.: Bayesian model averaging is not model combination, pp. 1–2 (2000). http://www.stat.cmu.edu/minka/papers/bma.html

  24. Monteith, K., Carroll, J.L., Seppi, K., Martinez, T.: Turning Bayesian model averaging into Bayesian model combination. In: The 2011 International Joint Conference on Neural Networks (IJCNN), pp. 2657–2663. IEEE (2011)

  25. Opitz, D., Maclin, R.: Popular ensemble methods: an empirical study. J. Artif. Intell. Res. 11, 169–198 (1999)

  26. Quinlan, J.R.: Bagging, boosting, and C4.5. In: AAAI/IAAI, vol. 1, pp. 725–730 (1996)

  27. Raykar, V.C., Yu, S., Zhao, L.H., Valadez, G.H., Florin, C., Bogoni, L., Moy, L.: Learning from crowds. J. Mach. Learn. Res. 11, 1297–1322 (2010)

  28. Rokach, L.: Ensemble-based classifiers. Artif. Intell. Rev. 33(1–2), 1–39 (2010)

  29. Schölkopf, B., Smola, A.: Support vector machines. In: Encyclopedia of Biostatistics (1998)

  30. Simpson, E., Roberts, S., Psorakis, I., Smith, A.: Dynamic Bayesian combination of multiple imperfect classifiers. In: Guy, T., Karny, M., Wolpert, D. (eds.) Decision Making and Imperfection, vol. 474, pp. 1–35. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36406-8_1

  31. Ting, K.M., Witten, I.H.: Stacking bagged and dagged models. In: ICML, pp. 367–375 (1997)

  32. Todorovski, L., Džeroski, S.: Combining classifiers with meta decision trees. Mach. Learn. 50(3), 223–249 (2003)

  33. Tulyakov, S., Jaeger, S., Govindaraju, V., Doermann, D.: Review of classifier combination methods. In: Marinai, S., Fujisawa, H. (eds.) Machine Learning in Document Analysis and Recognition, vol. 90, pp. 361–386. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-76280-5_14

  34. Wolpert, D.H.: Stacked generalization. Neural Netw. 5(2), 241–259 (1992)

  35. Xu, L., Krzyzak, A., Suen, C.Y.: Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans. Syst. Man Cybern. 22(3), 418–435 (1992)

  36. Yuksel, S.E., Wilson, J.N., Gader, P.D.: Twenty years of mixture of experts. IEEE Trans. Neural Netw. Learn. Syst. 23(8), 1177–1193 (2012)

Author information

Correspondence to Mohammad Ghasemi Hamed or Ahmad Akbari.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Ghasemi Hamed, M., Akbari, A. (2018). Hierarchical Bayesian Classifier Combination. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2018. Lecture Notes in Computer Science (LNAI), vol. 10934. Springer, Cham. https://doi.org/10.1007/978-3-319-96136-1_10

  • DOI: https://doi.org/10.1007/978-3-319-96136-1_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96135-4

  • Online ISBN: 978-3-319-96136-1

  • eBook Packages: Computer Science, Computer Science (R0)
