To Ensemble or Not Ensemble: When Does End-to-End Training Fail?

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2020)

Abstract

End-to-End training (E2E) is becoming increasingly popular for training complex deep network architectures. An interesting question is whether this trend will continue—are there any clear failure cases for E2E training? We study this question in depth, for the specific case of E2E training an ensemble of networks. Our strategy is to blend the gradient smoothly between two extremes: from independent training of the networks, up to full E2E training. We find clear failure cases, where overparameterized models cannot be trained E2E. A surprising result is that the optimum can sometimes lie in between the two, neither an ensemble nor an E2E system. The work also uncovers links to Dropout, and raises questions around the nature of ensemble diversity and multi-branch networks.
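The blending idea in the abstract can be illustrated with a small sketch. The following is a hypothetical example, not the authors' code: it assumes the blend is a convex combination, controlled by a parameter lam in [0, 1], of (a) the average of the members' individual losses (independent training, lam = 0) and (b) the loss of the ensemble's averaged prediction (full E2E training, lam = 1). The names EnsembleBlend and blended_loss, and the use of logit averaging, are illustrative assumptions; the exact interpolation is defined in the paper.

```python
# Hypothetical sketch only (not the authors' implementation): blend between
# independent ensemble training (lam = 0) and full end-to-end training (lam = 1)
# via a convex combination of the two losses.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EnsembleBlend(nn.Module):
    """An ensemble of small MLPs whose members can be trained jointly or independently."""

    def __init__(self, n_members=4, in_dim=20, n_classes=3, hidden=32):
        super().__init__()
        self.members = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_classes))
            for _ in range(n_members)
        ])

    def forward(self, x):
        # Member logits stacked along a new leading axis: (M, batch, n_classes).
        return torch.stack([m(x) for m in self.members])


def blended_loss(member_logits, y, lam):
    # (1 - lam) weight: mean of the members' own losses -> gradients as in independent training.
    independent = torch.stack(
        [F.cross_entropy(logits, y) for logits in member_logits]
    ).mean()
    # lam weight: loss of the averaged prediction -> gradients as in end-to-end training.
    # (Averaging logits here is a simplification; combining probabilities is another option.)
    e2e = F.cross_entropy(member_logits.mean(dim=0), y)
    return (1.0 - lam) * independent + lam * e2e


if __name__ == "__main__":
    # Minimal usage on random data, with the blend set halfway between the two extremes.
    torch.manual_seed(0)
    model = EnsembleBlend()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(64, 20), torch.randint(0, 3, (64,))
    for _ in range(100):
        opt.zero_grad()
        loss = blended_loss(model(x), y, lam=0.5)
        loss.backward()
        opt.step()
```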

Acknowledgements

The authors gratefully acknowledge the support of the EPSRC for the LAMBDA project (EP/N035127/1).

Author information

Corresponding author: Gavin Brown.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 100 KB)

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Webb, A. et al. (2021). To Ensemble or Not Ensemble: When Does End-to-End Training Fail? In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12459. Springer, Cham. https://doi.org/10.1007/978-3-030-67664-3_7

  • DOI: https://doi.org/10.1007/978-3-030-67664-3_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67663-6

  • Online ISBN: 978-3-030-67664-3

  • eBook Packages: Computer Science, Computer Science (R0)
