DOI: 10.1145/3344341.3368814

Exploring the Cost-benefit of AWS EC2 GPU Instances for Deep Learning Applications

Published: 02 December 2019

ABSTRACT

Deep learning is a subfield of machine learning based on artificial neural networks. Thanks to increased data availability and computational power, such as Graphics Processing Units (GPUs), training deep networks, a time-consuming process, has become feasible. Cloud computing is an attractive way to acquire this computational power, since it offers elastic resources under a pay-per-use model. Amazon Web Services (AWS), for instance, offers GPU-based virtual machine instances that differ in GPU type, number of GPUs, and price per hour. The challenge is to determine which instance is best suited to a specific deep learning problem. This paper examines the implications, in terms of runtime and cost, of running two different deep learning problems on AWS GPU-based instances, and, based on these case studies, it proposes a methodology that evaluates instances for deep learning algorithms using information provided by the Keras framework. Our experimental results indicate that, despite their higher price per hour, the instances with NVIDIA V100 GPUs (p3) are faster and usually cheaper to use than the instances with NVIDIA K80 GPUs (p2) for the problems we analyzed. The results also indicate that the performance of both applications did not scale well with the number of GPUs, and that increasing the batch size to improve scalability may affect the final model accuracy. Finally, the proposed methodology provides accurate cost and runtime estimates for the tested applications on different AWS instances at a small cost.
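The methodology described in the abstract relies on timing information that Keras exposes during training to extrapolate total runtime, and hence cost, on a given instance. The sketch below illustrates the general idea with a standard Keras callback; it is not the authors' implementation, and the instance names and hourly prices are illustrative assumptions, not current AWS quotes.

```python
import time
import tensorflow as tf

# Illustrative on-demand prices in USD per hour; actual AWS prices vary by
# region and over time, so these values are assumptions, not quotes.
HOURLY_PRICE_USD = {"p2.xlarge": 0.90, "p3.2xlarge": 3.06}


class EpochTimer(tf.keras.callbacks.Callback):
    """Keras callback that records the wall-clock time of each epoch."""

    def on_train_begin(self, logs=None):
        self.epoch_times = []

    def on_epoch_begin(self, epoch, logs=None):
        self._epoch_start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        self.epoch_times.append(time.time() - self._epoch_start)


def estimate_cost(epoch_times, total_epochs, instance_type):
    """Extrapolate total training time from a few measured epochs and price it."""
    mean_epoch_seconds = sum(epoch_times) / len(epoch_times)
    runtime_hours = mean_epoch_seconds * total_epochs / 3600.0
    return runtime_hours * HOURLY_PRICE_USD[instance_type]


# Example usage: time a short profiling run, then extrapolate to full training.
# timer = EpochTimer()
# model.fit(x_train, y_train, epochs=3, callbacks=[timer])
# print(estimate_cost(timer.epoch_times, total_epochs=100,
#                     instance_type="p3.2xlarge"))
```

In this sketch, a short profiling run on each candidate instance yields per-epoch times, which are then extrapolated to the full training budget and multiplied by the instance's hourly price to compare cost-benefit across instances.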

Published in

          UCC'19: Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing
          December 2019
          307 pages
ISBN: 9781450368940
DOI: 10.1145/3344341

          Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article

          Acceptance Rates

Overall acceptance rate: 38 of 125 submissions, 30%
