
Ensembles of Nested Dichotomies with Multiple Subset Evaluation

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11439)


Abstract

A system of nested dichotomies (NDs) is a method of decomposing a multiclass problem into a collection of binary problems. Such a system recursively applies binary splits to divide the set of classes into two subsets, and trains a binary classifier for each split. Many methods have been proposed to perform this split, each with various advantages and disadvantages. In this paper, we present a simple, general method for improving the predictive performance of NDs produced by any subset selection technique that employs randomness to construct the subsets. We provide a theoretical expectation for performance improvements, as well as empirical results showing that our method improves the root mean squared error of NDs, regardless of whether they are employed as an individual model or in an ensemble setting.
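To make the decomposition concrete, the following is a minimal sketch of training and querying a single ND, assuming scikit-learn-style estimators. The class name NestedDichotomy, the uniformly random choice of split, and LogisticRegression as the base learner are illustrative assumptions, not the authors' implementation (the paper's experiments build on WEKA [14]): a node splits its set of classes into two subsets, trains a binary classifier to separate them, and recurses; class probability estimates are obtained by multiplying the binary probabilities along the path from the root to each leaf.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


class NestedDichotomy:
    """Illustrative nested dichotomy: recursively split the set of classes into
    two subsets and train one binary classifier per split (a sketch only)."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else np.random.default_rng(0)

    def fit(self, X, y, classes=None):
        self.classes_ = np.unique(y) if classes is None else np.asarray(classes)
        if len(self.classes_) == 1:
            return self  # leaf: a single class remains
        # Randomly divide the current set of classes into two non-empty subsets.
        perm = self.rng.permutation(self.classes_)
        cut = self.rng.integers(1, len(perm))
        self.left_classes_, self.right_classes_ = perm[:cut], perm[cut:]
        # Binary problem: does an instance belong to a class in the left subset?
        is_left = np.isin(y, self.left_classes_)
        self.clf_ = LogisticRegression(max_iter=1000).fit(X, is_left.astype(int))
        # Recurse; each child sees only the examples of its own class subset.
        self.left_ = NestedDichotomy(self.rng).fit(X[is_left], y[is_left], self.left_classes_)
        self.right_ = NestedDichotomy(self.rng).fit(X[~is_left], y[~is_left], self.right_classes_)
        return self

    def predict_proba_one(self, x):
        """Class probabilities for one instance: multiply binary probabilities
        along the path to each leaf (class)."""
        if len(self.classes_) == 1:
            return {self.classes_[0]: 1.0}
        p_left = self.clf_.predict_proba(x.reshape(1, -1))[0, 1]
        probs = {c: p_left * p for c, p in self.left_.predict_proba_one(x).items()}
        probs.update({c: (1.0 - p_left) * p for c, p in self.right_.predict_proba_one(x).items()})
        return probs


# Hypothetical usage:
#   nd = NestedDichotomy().fit(X_train, y_train)
#   probs = nd.predict_proba_one(X_test[0])
```

An ensemble of NDs, as mentioned in the abstract, can then be formed by training several such trees with different random seeds and averaging their class probability estimates.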


Notes

  1. This is a variant of the approach from [11], where each member of the space of NDs has an equal probability of being sampled.

  2. Appropriate values for \(\alpha\) for a given \(\lambda\) can be found in Table 3 of [15].
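Note 2 refers to Harter's table of expected normal order statistics [15]. Assuming, as that citation suggests, that \(\alpha\) denotes the expected value of the largest of \(\lambda\) independent standard normal variates, such values can be approximated by simulation instead of table lookup; the function name and sample count below are illustrative choices, not from the paper.

```python
import numpy as np


def expected_max_of_standard_normals(lam, n_samples=1_000_000, seed=0):
    """Monte Carlo estimate of E[max(Z_1, ..., Z_lam)] for i.i.d. standard
    normal variates, i.e. the expected largest of lam normal order statistics."""
    rng = np.random.default_rng(seed)
    draws = rng.standard_normal((n_samples, lam))
    return draws.max(axis=1).mean()


# Approximate values: lam=2 -> 0.564, lam=3 -> 0.846, lam=5 -> 1.163,
# consistent with the expected values of the largest order statistic in [15].
for lam in (2, 3, 5):
    print(lam, round(expected_max_of_standard_normals(lam), 3))
```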

References

  1. Bengio, S., Weston, J., Grangier, D.: Label embedding trees for large multi-class tasks. In: NIPS, pp. 163–171 (2010)

  2. Beygelzimer, A., Langford, J., Lifshits, Y., Sorkin, G., Strehl, A.: Conditional probability tree estimation analysis and algorithms. In: UAI, pp. 51–58 (2009)

  3. Beygelzimer, A., Langford, J., Ravikumar, P.: Error-correcting tournaments. In: Gavaldà, R., Lugosi, G., Zeugmann, T., Zilles, S. (eds.) ALT 2009. LNCS (LNAI), vol. 5809, pp. 247–262. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04414-4_22

  4. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)

  5. Brier, G.: Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 78, 1–3 (1950)

  6. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. JMLR 7(Jan), 1–30 (2006)

  7. Dietterich, T.G., Bakiri, G.: Solving multiclass learning problems via error-correcting output codes. JAIR 2, 263–286 (1995)

  8. Dong, L., Frank, E., Kramer, S.: Ensembles of balanced nested dichotomies for multi-class problems. In: Jorge, A.M., Torgo, L., Brazdil, P., Camacho, R., Gama, J. (eds.) PKDD 2005. LNCS (LNAI), vol. 3721, pp. 84–95. Springer, Heidelberg (2005). https://doi.org/10.1007/11564126_13

  9. Duarte-Villaseñor, M.M., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F., Flores-Garrido, M.: Nested dichotomies based on clustering. In: Alvarez, L., Mejail, M., Gomez, L., Jacobo, J. (eds.) CIARP 2012. LNCS, vol. 7441, pp. 162–169. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33275-3_20

  10. Fox, J.: Applied Regression Analysis, Linear Models, and Related Methods. Sage, Thousand Oaks (1997)

  11. Frank, E., Kramer, S.: Ensembles of nested dichotomies for multi-class problems. In: ICML, p. 39. ACM (2004)

  12. Freund, Y., Schapire, R.E.: Game theory, on-line prediction and boosting. In: COLT, pp. 325–332 (1996)

  13. Fürnkranz, J.: Round robin classification. JMLR 2(Mar), 721–747 (2002)

  14. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. ACM SIGKDD Explor. Newsl. 11(1), 10–18 (2009)

  15. Harter, H.L.: Expected values of normal order statistics. Biometrika 48(1/2), 151–165 (1961)

  16. Hastie, T., Tibshirani, R., et al.: Classification by pairwise coupling. Ann. Stat. 26(2), 451–471 (1998)

  17. Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Mach. Learn. 51(2), 181–207 (2003)

  18. Leathart, T., Frank, E., Holmes, G., Pfahringer, B.: On calibration of nested dichotomies. In: Yang, Q., et al. (eds.) Advances in Knowledge Discovery and Data Mining. LNAI, vol. 11439, pp. 69–80. Springer, Heidelberg (2019)

  19. Leathart, T., Pfahringer, B., Frank, E.: Building ensembles of adaptive nested dichotomies with random-pair selection. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds.) ECML PKDD 2016. LNCS (LNAI), vol. 9852, pp. 179–194. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46227-1_12

  20. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

  21. Lichman, M.: UCI machine learning repository (2013)

  22. Meilă, M.: Comparing clusterings by the variation of information. In: Schölkopf, B., Warmuth, M.K. (eds.) COLT-Kernel 2003. LNCS (LNAI), vol. 2777, pp. 173–187. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45167-9_14

  23. Melnikov, V., Hüllermeier, E.: On the effectiveness of heuristics for learning nested dichotomies: an empirical analysis. Mach. Learn. 107(8–10), 1–24 (2018)

  24. Niculescu-Mizil, A., Caruana, R.: Predicting good probabilities with supervised learning. In: ICML, pp. 625–632. ACM (2005)

  25. Pimenta, E., Gama, J.: A study on error correcting output codes. In: Portuguese Conference on Artificial Intelligence, pp. 218–223. IEEE (2005)

  26. Rifkin, R., Klautau, A.: In defense of one-vs-all classification. JMLR 5, 101–141 (2004)

  27. Rodríguez, J.J., García-Osorio, C., Maudes, J.: Forests of nested dichotomies. Pattern Recognit. Lett. 31(2), 125–132 (2010)

  28. Royston, J.: Algorithm AS 177: expected normal order statistics (exact and approximate). J. R. Stat. Soc. Ser. C (Appl. Stat.) 31(2), 161–165 (1982)

  29. Wever, M., Mohr, F., Hüllermeier, E.: Ensembles of evolved nested dichotomies for classification. In: GECCO, pp. 561–568. ACM (2018)


Author information


Corresponding author

Correspondence to Tim Leathart.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 39 KB)


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Leathart, T., Frank, E., Pfahringer, B., Holmes, G. (2019). Ensembles of Nested Dichotomies with Multiple Subset Evaluation. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science (LNAI), vol 11439. Springer, Cham. https://doi.org/10.1007/978-3-030-16148-4_7

  • DOI: https://doi.org/10.1007/978-3-030-16148-4_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-16147-7

  • Online ISBN: 978-3-030-16148-4

  • eBook Packages: Computer Science, Computer Science (R0)
