ABSTRACT
Training neural networks efficiently is a thoroughly researched topic that plays an important role in their adoption. Major advances have been made, including the use of multiple nodes to further reduce training time. However, training at scale usually means adding layers of complex deployment logic and parallelization concerns, distracting researchers from the core of their algorithms. This paper presents BK.Synapse, a framework that facilitates distributed training while maintaining clarity, simplicity, and user-friendliness. Its modular design allows flexible and straightforward deployment on a variety of hardware configurations. The framework is benchmarked in a case study: training a neural network for an object detection problem. Our results show clear improvements over conventional single-node training, achieved with very few modifications to the existing codebase. The resulting model also performs relatively well upon further testing.
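The abstract does not show BK.Synapse's own API, so as context for the "very few modifications" claim, the sketch below illustrates the general pattern such frameworks follow for data-parallel training, using Horovod with PyTorch (both among the related systems the paper discusses): initialize the communication layer, wrap the optimizer, and broadcast the initial state. The model, data, and hyperparameters are placeholders, not the paper's actual case study.

```python
# Hedged sketch only: BK.Synapse's API is not shown in the abstract.
# This uses Horovod's real PyTorch bindings to show the minimal-change
# pattern; the model, dataset, and hyperparameters are placeholders.
import torch
import torch.nn as nn
import torch.optim as optim
import horovod.torch as hvd

hvd.init()                                   # start this worker's communication context
torch.cuda.set_device(hvd.local_rank())      # pin each worker to one local GPU

model = nn.Linear(512, 10).cuda()            # placeholder model
optimizer = optim.SGD(model.parameters(),
                      lr=0.01 * hvd.size())  # conventional lr scaling by worker count

# Wrap the optimizer so gradients are averaged across workers each step.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())

# Ensure all workers start from identical weights and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

loss_fn = nn.CrossEntropyLoss()
for _ in range(100):                         # placeholder training loop
    x = torch.randn(32, 512).cuda()          # synthetic batch
    y = torch.randint(0, 10, (32,)).cuda()
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()                         # gradients are all-reduced here
```

A script like this is typically launched on multiple processes, e.g. `horovodrun -np 4 python train.py`; the single-node version differs only in the handful of Horovod calls above, which is the kind of low-friction migration the abstract describes.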