research-article

Exploring the Cost-benefit of AWS EC2 GPU Instances for Deep Learning Applications

Authors:
Eva Maia Malta, University of Campinas, Campinas, Brazil
Sandra Avila, University of Campinas, Campinas, Brazil
Edson Borin, University of Campinas, Campinas, Brazil

UCC'19: Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing, December 2019, Pages 21–29. https://doi.org/10.1145/3344341.3368814

Published: 02 December 2019


ABSTRACT

Deep Learning is a subfield of machine learning based on artificial neural networks. Thanks to increased data availability and computational power, such as that offered by Graphics Processing Units (GPUs), training deep networks, a time-consuming process, became possible. Cloud computing is an attractive option for acquiring the computational power to train these models, since it provides elastic products with a pay-per-use model. Amazon Web Services (AWS), for instance, has GPU-based virtual machine instances in its catalog, which differ in GPU type, number of GPUs, and price per hour. The challenge is to determine which instance is best for a specific deep learning problem. This paper presents the implications, in terms of runtime and cost, of running two different deep learning problems on AWS GPU-based instances, and it proposes a methodology, based on these case studies, that analyzes instances for deep learning algorithms by using the information provided by the Keras framework. Our experimental results indicate that, despite having a higher price per hour, the instances that contain NVIDIA V100 GPUs (p3) are faster and usually less expensive to use than the instances that contain NVIDIA K80 GPUs (p2) for the problems we analyzed. The results also indicate that the performance of both applications did not scale well with the number of GPUs and that increasing the batch size to improve scalability may affect the final model accuracy. Finally, the proposed methodology provides accurate cost and runtime estimates for the tested applications on different AWS instances at a small cost.
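
The trade-off described in the abstract, where a faster instance can be cheaper overall despite a higher hourly price, comes down to simple arithmetic: estimated cost is estimated runtime multiplied by price per hour. The Python sketch below illustrates that arithmetic only. It is not the authors' methodology, whose details the abstract does not give. It assumes, hypothetically, that a per-epoch time has been measured on a short trial run (for example, the per-epoch time that Keras reports during training) and that the total runtime scales linearly with the number of epochs. The instance names, prices, and timings are placeholders, not real AWS figures or results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """A candidate instance; every field value used below is a placeholder."""
    name: str
    price_per_hour: float     # USD per hour (hypothetical, not a real AWS price)
    seconds_per_epoch: float  # measured on a short trial run (hypothetical)


def estimate_runtime_hours(seconds_per_epoch: float, epochs: int) -> float:
    """Extrapolate total training time, assuming every epoch takes the same time."""
    return seconds_per_epoch * epochs / 3600.0


def estimate_cost(instance: Instance, epochs: int) -> float:
    """Estimated cost = estimated runtime in hours x hourly price."""
    hours = estimate_runtime_hours(instance.seconds_per_epoch, epochs)
    return hours * instance.price_per_hour


def cheapest(instances, epochs: int) -> Instance:
    """Return the instance with the lowest estimated total cost."""
    return min(instances, key=lambda inst: estimate_cost(inst, epochs))


if __name__ == "__main__":
    # Placeholder candidates: one slow and cheap per hour, one fast and pricey per hour.
    candidates = [
        Instance("slow-cheap", price_per_hour=1.0, seconds_per_epoch=360.0),
        Instance("fast-pricey", price_per_hour=3.0, seconds_per_epoch=90.0),
    ]
    for inst in candidates:
        hours = estimate_runtime_hours(inst.seconds_per_epoch, epochs=100)
        cost = estimate_cost(inst, epochs=100)
        print(f"{inst.name}: {hours:.2f} h, ${cost:.2f}")
    print("cheapest:", cheapest(candidates, epochs=100).name)
```

With these placeholder numbers, the instance that costs three times more per hour finishes four times faster, so its estimated total cost is lower. A linear extrapolation like this ignores startup overhead and any change in per-epoch time during training, so it should be read as a rough estimate.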


Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Cloud computing
2. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
  2. Parallel computing methodologies
    1. Parallel algorithms

Published in
UCC'19: Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing
December 2019
307 pages
ISBN: 9781450368940
DOI: 10.1145/3344341
General Chairs: Kenneth Johnson (Auckland University of Technology, New Zealand), Josef Spillner (Zurich University of Applied Sciences, Switzerland)
Program Chairs: Dalibor Klusáček (CESNET), Ashiq Anjum (University of Derby)
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
aws
cloud computing
cost-benefit
ec2
machine learning
methodology
Acceptance Rates
Overall Acceptance Rate: 38 of 125 submissions, 30%