Graph-Based Root Cause Localization in Microservice Systems with Protection Mechanisms
  • Haitao Zhang,
  • Wei Tian,
  • Neng Yang,
  • Yepeng Zhang
Haitao Zhang
Beijing University of Posts and Telecommunications School of Computer Science

Corresponding Author: [email protected]

Wei Tian
Beijing University of Posts and Telecommunications School of Computer Science
Neng Yang
Beijing University of Posts and Telecommunications School of Computer Science
Yepeng Zhang
Beijing University of Posts and Telecommunications School of Computer Science

Abstract

Protection mechanisms are now commonly introduced into microservice systems to keep services running stably. However, existing approaches ignore the impact of these mechanisms on root cause localization for abnormal services. Specifically, circuit breaking and rate limiting can refuse service requests and thereby change how anomalies propagate. Moreover, varying request frequencies and response times cause service dependencies to change dynamically, so the probability of anomaly propagation differs among services. In this paper, we propose a novel framework named MicroGBPM that locates the root cause of abnormal services while accounting for the impact of protection mechanisms. When a failure occurs, we model anomaly propagation among services as an attributed service graph constructed dynamically from metrics and traces. To eliminate the impact of the protection mechanisms, we design a two-stage dynamic calibration strategy that adjusts the probability of anomaly propagation among services. We then propose a random walk approach that computes the root cause results using the PageRank algorithm. Experimental results show that MicroGBPM improves the accuracy of root cause localization compared with other approaches in microservice systems with protection mechanisms.
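
As a rough illustration of the random walk step only, the following sketch runs a personalized PageRank over a small call graph whose edge weights stand in for anomaly propagation probabilities. It is not MicroGBPM's implementation: the service names, the weights, the damping factor, the restart at the abnormal entry service, and the function name `pagerank` are all assumptions made for this example, and the two-stage calibration strategy is not modeled.

```python
def pagerank(edges, personalization, damping=0.85, iterations=100, tol=1e-10):
    """Personalized PageRank by power iteration.

    edges: {(caller, callee): weight}, where weight is an assumed
           (hypothetical) anomaly propagation probability.
    personalization: {service: weight}, the restart distribution, e.g. the
           service where the anomaly was observed.
    """
    nodes = sorted({n for edge in edges for n in edge} | set(personalization))

    # Group outgoing edge weights per node.
    out = {n: {} for n in nodes}
    for (u, v), w in edges.items():
        out[u][v] = out[u].get(v, 0.0) + w

    # Normalize the restart distribution so it sums to 1.
    total_p = sum(personalization.values())
    restart = {n: personalization.get(n, 0.0) / total_p for n in nodes}

    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for u in nodes:
            total = sum(out[u].values())
            if total == 0:
                # Dangling node: send its mass back to the restart distribution.
                for n in nodes:
                    nxt[n] += damping * rank[u] * restart[n]
            else:
                # Walk along outgoing edges in proportion to their weights.
                for v, w in out[u].items():
                    nxt[v] += damping * rank[u] * w / total
        delta = sum(abs(nxt[n] - rank[n]) for n in nodes)
        rank = nxt
        if delta < tol:
            break
    return rank


# Hypothetical call graph: the walker moves from a caller toward its callees.
edges = {
    ("frontend", "cart"): 0.2,
    ("frontend", "orders"): 0.8,
    ("orders", "payment"): 0.9,
    ("orders", "inventory"): 0.1,
    ("cart", "inventory"): 1.0,
}

# Restart at the service where the anomaly was observed (assumed: frontend).
scores = pagerank(edges, {"frontend": 1.0})

# Rank candidate root causes, excluding the entry service itself.
candidates = sorted(
    (s for s in scores if s != "frontend"),
    key=lambda s: scores[s],
    reverse=True,
)
print(candidates)
```

In this toy setup, raising or lowering an edge weight changes how much of the walker's probability mass reaches each downstream service, which is the role an adjusted propagation probability would play in ranking candidates.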