DOI: 10.1145/3427921.3450248
research-article

SuanMing: Explainable Prediction of Performance Degradations in Microservice Applications

Published: 09 April 2021

Abstract

Application performance management (APM) tools are useful for observing the performance properties of an application in production. However, APM is normally purely reactive; that is, it can only report on current or past performance degradations. Although some approaches capable of predictive application monitoring have been proposed, they can only report a predicted degradation and cannot explain its root cause, which makes it hard to prevent the expected degradation.
In this paper, we present SuanMing, a framework for predicting performance degradations of microservice applications running in cloud environments. SuanMing predicts the root causes of anticipated performance degradations and thereby aims to prevent these degradations before they actually occur. We evaluate SuanMing on two realistic microservice applications, TeaStore and TrainTicket, and show that our approach is able to predict and pinpoint performance degradations with an accuracy of over 90%.

Cited By

  • (2025) OSCAR-P and aMLLibrary: Profiling and predicting the performance of FaaS-based applications in computing continua. Journal of Systems and Software 221 (112282). DOI: 10.1016/j.jss.2024.112282. Online publication date: Mar-2025
  • (2023) Towards Performance Management of Large-Scale Microservices Applications. 2023 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C) (24-26). DOI: 10.1109/ACSOS-C58168.2023.00028. Online publication date: 25-Sep-2023
  • (2022) Challenges in regression test selection for end-to-end testing of microservice-based software systems. Proceedings of the 3rd ACM/IEEE International Conference on Automation of Software Test (1-5). DOI: 10.1145/3524481.3527217. Online publication date: 17-May-2022

Published In

ICPE '21: Proceedings of the ACM/SPEC International Conference on Performance Engineering
April 2021
301 pages
ISBN:9781450381949
DOI:10.1145/3427921

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. explainability
  2. forecasting
  3. microservices
  4. performance prediction

Qualifiers

  • Research-article

Conference

ICPE '21

Acceptance Rates

ICPE '21 Paper Acceptance Rate: 16 of 61 submissions, 26%
Overall Acceptance Rate: 252 of 851 submissions, 30%
