Abstract
End-to-end (E2E) training is an increasingly popular way to train complex deep network architectures. An interesting question is whether this trend will continue: are there clear failure cases for E2E training? We study this question in depth for the specific case of E2E training an ensemble of networks. Our strategy is to blend the gradient smoothly between two extremes: from independent training of the networks up to full E2E training. We find clear failure cases, where overparameterized models cannot be trained E2E. A surprising result is that the optimum can sometimes lie between the two extremes, being neither an independent ensemble nor a fully E2E system. The work also uncovers links to Dropout, and raises questions about the nature of ensemble diversity and multi-branch networks.
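To make the blending strategy concrete, here is a minimal PyTorch sketch of one plausible interpolation between the two training regimes. The loss form, the logit-space averaging, and the coefficient `lambda_` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

def blended_loss(logits_list, target, lambda_):
    """Interpolate between independent ensemble training (lambda_ = 0)
    and full end-to-end training (lambda_ = 1).

    logits_list: per-member logits, each of shape (batch, classes).
    target: class indices of shape (batch,).
    """
    criterion = nn.CrossEntropyLoss()
    # Independent extreme: average the members' individual losses,
    # so each network receives only its own gradient.
    indep = torch.stack([criterion(z, target) for z in logits_list]).mean()
    # E2E extreme: a single loss on the averaged prediction, so the
    # gradient couples all members through the shared combination.
    e2e = criterion(torch.stack(logits_list).mean(dim=0), target)
    return (1.0 - lambda_) * indep + lambda_ * e2e
```

Sweeping `lambda_` over intermediate values is one way to probe the regime between the two extremes, where (as the abstract notes) the optimum can sometimes lie.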
Acknowledgements
The authors gratefully acknowledge the support of the EPSRC for the LAMBDA project (EP/N035127/1).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Webb, A. et al. (2021). To Ensemble or Not Ensemble: When Does End-to-End Training Fail?. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12459. Springer, Cham. https://doi.org/10.1007/978-3-030-67664-3_7
DOI: https://doi.org/10.1007/978-3-030-67664-3_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67663-6
Online ISBN: 978-3-030-67664-3
eBook Packages: Computer Science (R0)