Abstract
Deep learning has achieved remarkable success in many classification tasks owing to its powerful representation learning on complex data. However, extending it to classification tasks with millions of classes remains challenging. Previous studies have focused on solving this problem either in a distributed fashion or with sampling-based approaches that reduce the computational cost of the softmax layer. However, these approaches still require large amounts of GPU memory to accommodate large models, and it is non-trivial to extend them to parallel settings. To address these issues, we propose an efficient training framework, based on random projection, for extreme classification tasks. The key idea is to first train a slimmed model with a randomly projected softmax classifier and then recover the original classifier from it. We also provide a theoretical guarantee that the recovered classifier approximates the original classifier with small error. We further extend our framework to parallel settings by adopting a communication reduction technique. In our experiments, we demonstrate that the proposed framework can train deep learning models with millions of classes and achieves more than a \(10{\times}\) speedup over existing approaches.
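To make the training scheme concrete, the following is a minimal PyTorch sketch of the idea, under our own assumptions: the dimensions, the Gaussian projection, and the naive weight composition shown at the end are illustrative only. In particular, the paper's actual recovery step is based on dual random projection and carries the approximation guarantee mentioned above; the composition below is just the classifier implicitly used during slimmed training.

import torch
import torch.nn as nn

# Illustrative sizes; in the paper's setting K is in the millions.
d, m, K = 512, 64, 10_000  # feature dim, projected dim, number of classes

# Fixed Gaussian random projection R: R^d -> R^m (not trained).
R = torch.randn(d, m) / m ** 0.5

# Slimmed softmax classifier: only an m x K weight matrix instead of d x K.
slim_classifier = nn.Linear(m, K, bias=False)

def slim_logits(features: torch.Tensor) -> torch.Tensor:
    # During training, features are projected down to R^m before the softmax
    # layer, so only the small m x K classifier is stored and updated.
    return slim_classifier(features @ R)

# A naive way to obtain a classifier in the original d-dimensional space is
# to compose the projection with the learned weights. The paper's recovery
# via dual random projection is more involved and is not reproduced here.
with torch.no_grad():
    W_recovered = R @ slim_classifier.weight.t()  # shape (d, K)

x = torch.randn(4, d)         # a batch of d-dimensional features
logits = x @ W_recovered      # shape (4, K)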