DOI: 10.1145/3297663.3309675
short-paper

A Cloud Performance Analytics Framework to Support Online Performance Diagnosis and Monitoring Tools

Published: 04 April 2019

Abstract

Traditionally, performance analysis, debugging, triaging, troubleshooting, and optimization are left in the hands of performance experts. The main rationale behind this is that performance engineering is considered a specialized domain expertise, and therefore left to the trained hands of experts. However, this approach requires human effort to be put behind every performance escalation. This is no longer sustainable in enterprise environments for the following reasons: (i) Enterprise customers now expect much quicker performance troubleshooting, particularly in cloud platforms such as Software as a Service (SaaS) offerings where the billing is subscription based, (ii) As products grow more distributed and complex, the number of performance metrics required to troubleshoot a performance problem explodes, making human intervention and analysis very time consuming, and (iii) Our past experience shows that while many customers run into similar performance issues, the human effort to troubleshoot each of these issues in a different infrastructural environment is non-trivial. We believe that data analytics platforms that can quickly mine performance data and point out potential bottlenecks offer a good way for non-domain experts to debug and solve a performance issue. In this work, we showcase a cloud-based performance data analytics framework that can be leveraged to build tools which analyze and root-cause performance issues in enterprise systems. We describe the architecture of this framework, which consists of: (i) A cloud service (which we term a plugin), (ii) Supporting libraries that may be used to interact with this plugin from end-systems such as computer servers or appliance Virtual Machines (VMs), and (iii) A solution to monitor and analyze the results delivered by the plugin. We demonstrate how this platform can be used to develop different performance analysis and debugging tools.
We provide one example of a tool that we have built on top of this framework and released: VMware Virtual SAN (vSAN) performance diagnostics.
We specifically discuss how collecting performance data in the cloud from over a thousand deployments, and then analyzing it to detect performance issues, helped us write rules that can easily detect similar performance issues. Finally, we discuss a framework for monitoring the performance of the rules and improving them.
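
The abstract does not give the rule format, so the following is only a minimal sketch of what rule-based detection over collected metrics could look like, together with a simple way to track how well a rule performs. Everything in it is an assumption made for illustration: the `Rule` type, the metric names (`write_latency_ms`, `outstanding_io`), the thresholds, and the `precision` helper are hypothetical and are not taken from the paper or from vSAN performance diagnostics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical snapshot of one deployment: metric name -> observed value.
Metrics = Dict[str, float]


@dataclass
class Rule:
    """A named check over a metric snapshot (illustrative structure only)."""
    name: str
    predicate: Callable[[Metrics], bool]
    advice: str


# Assumed example rules; metric names and thresholds are invented.
RULES: List[Rule] = [
    Rule(
        name="high_write_latency",
        predicate=lambda m: m.get("write_latency_ms", 0.0) > 30.0,
        advice="Write latency is high; inspect the storage back end.",
    ),
    Rule(
        name="low_outstanding_io",
        predicate=lambda m: m.get("outstanding_io", float("inf")) < 2.0,
        advice="Too little outstanding I/O to saturate the system.",
    ),
]


def evaluate(rules: List[Rule], metrics: Metrics) -> List[str]:
    """Return the names of all rules whose predicate holds for this snapshot."""
    return [rule.name for rule in rules if rule.predicate(metrics)]


def precision(confirmed: int, fired: int) -> float:
    """Fraction of rule firings later confirmed as real issues.

    One possible way to monitor a rule; returns 0.0 if the rule never fired.
    """
    return confirmed / fired if fired else 0.0


if __name__ == "__main__":
    snapshot = {"write_latency_ms": 45.0, "outstanding_io": 8.0}
    print(evaluate(RULES, snapshot))
    print(precision(confirmed=8, fired=10))
```

Keeping each rule as data (a name, a predicate, and advice text) means new rules can be added without touching the evaluation loop, and per-rule firing statistics can be tracked separately to decide which rules need refinement.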


Cited By

  • (2022) Performance Analysis in HyperFlex and vSAN Hyper Convergence Platforms for Online Course Consideration. IEEE Access 10, pp. 124464-124474. DOI: 10.1109/ACCESS.2022.3224435. Online publication date: 2022

    Published In

    ICPE '19: Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering
    April 2019
    348 pages
    ISBN:9781450362399
    DOI:10.1145/3297663
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    ICPE '19 Paper Acceptance Rate 13 of 71 submissions, 18%;
    Overall Acceptance Rate 252 of 851 submissions, 30%
