research-article

Cloud API issues: an empirical study and impact

Authors: Qinghua Lu, Liming Zhu, Len Bass, Xiwei Xu, Zhanwen Li, Hiroshi Wada
Affiliation (all authors): NICTA, Sydney, Australia

QoSA '13: Proceedings of the 9th international ACM Sigsoft conference on Quality of software architectures, June 2013, Pages 23–32
https://doi.org/10.1145/2465478.2465481

Published: 17 June 2013

ABSTRACT

Outages of cloud infrastructure have been widely publicized, and it would be easy to conclude that application developers need to be concerned only with large-scale outages of cloud provider infrastructure. Unfortunately, this is not the case. In-cloud applications rely heavily on cloud infrastructure APIs (directly, or indirectly through scripts and consoles) for many sporadic activities such as deployment changes, scaling out/in, backup, recovery, and migration. Failures and other issues around API calls are a large source of faults that can lead to application failures, especially during sporadic activities. API-related issues can also greatly exacerbate infrastructure outages.

In this paper we present an empirical study of issues in Amazon EC2 APIs. Some of the major findings around API issues include: 1) A majority (60%) of API failure cases involve "stuck" or unresponsive API calls. 2) A significant portion (12%) involve slow-responding API calls. 3) 19% relate to the output of API calls: failed calls with unclear error messages, as well as missing, wrong, or unexpected output. 4) In 9% of cases, a call that performs an action and expects a state change either stayed pending for some time and then reverted to the original state without properly informing the caller, or was first reported as successful and failed later. We also classify the causes of API issues and discuss their impact on application architectures.
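
To illustrate how an application might guard against two of the failure classes above (stuck calls, and calls whose expected state change never arrives or silently reverts), here is a minimal Python sketch. It is not taken from the paper. All names are hypothetical, and `fn` and `get_state` are placeholders for whatever cloud API client the application actually uses.

```python
import threading
import time


class ApiCallStuck(Exception):
    """Raised when a call does not return within the deadline."""


class StateNotReached(Exception):
    """Raised when the expected state is not observed before the timeout."""


def call_with_deadline(fn, deadline_s):
    """Run fn() in a daemon thread and stop waiting after deadline_s seconds.

    The underlying call is not cancelled; the caller merely stops waiting,
    so the action may still complete later on the provider side.
    """
    result = {}

    def worker():
        try:
            result["value"] = fn()
        except Exception as exc:  # hand the error back to the calling thread
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(deadline_s)
    if thread.is_alive():
        raise ApiCallStuck(f"no response within {deadline_s}s")
    if "error" in result:
        raise result["error"]
    return result["value"]


def wait_for_state(get_state, expected, timeout_s, poll_s=0.01):
    """Poll get_state() until it returns `expected` or timeout_s elapses.

    A call reported as successful is not trusted until the state change
    is actually observed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == expected:
            return state
        if time.monotonic() >= deadline:
            raise StateNotReached(
                f"last observed state {state!r}, expected {expected!r}"
            )
        time.sleep(poll_s)
```

A caller would typically wrap the action in `call_with_deadline` and then confirm the outcome with `wait_for_state`, treating either exception as a signal to retry, compensate, or alert. Because the stuck call keeps running in the background, a retry is only safe when the action is idempotent.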

Index Terms

1. General and reference
  1. Cross-computing tools and techniques
    1. Reliability
2. Software and its engineering
  1. Software organization and properties
    1. Extra-functional properties
      1. Software reliability

Published in
QoSA '13: Proceedings of the 9th international ACM Sigsoft conference on Quality of software architectures
June 2013, 180 pages
ISBN: 9781450321266
DOI: 10.1145/2465478
General Chair: Philippe Kruchten (University of British Columbia, Canada)
Program Chairs: Anne Koziolek (Karlsruhe Institute of Technology, Germany), Robert Nord (Software Engineering Institute, USA)
Copyright © 2013 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
api
architecture
cloud computing
empirical study
fault-tolerant design
reliability
Acceptance Rates
QoSA '13 paper acceptance rate: 17 of 42 submissions, 40%. Overall acceptance rate: 46 of 131 submissions, 35%.