DOI: 10.1145/3629527.3651426

Context-aware Root Cause Localization in Distributed Traces Using Social Network Analysis (Work In Progress paper)

Published: 07 May 2024

ABSTRACT

The complexity of microservices and their distributed nature necessitate constant monitoring and tracing of their execution to identify performance problems and their underlying root causes. However, the large volume of collected data and the complexity of distributed communications make it difficult to identify and locate abnormal services. In this paper, we propose a novel approach that accounts for the role of execution contexts in propagating and localizing performance root causes, by integrating social network analysis techniques with spectrum analysis. To evaluate the approach, we conducted an experiment on a real-world benchmark and observed promising preliminary results: a success rate of 91.3% in correctly identifying the primary root cause (top-1), and a 100% success rate in finding the root cause within the top three candidates (top-3).
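The abstract does not spell out the algorithm, so the following is only an illustrative sketch of how a spectrum-based suspiciousness score and a graph-centrality score might be combined in general; it should not be read as the paper's method. It assumes the Ochiai coefficient for the spectrum part, plain unweighted PageRank over a call graph for the network part, and a simple product to combine them. The service names, traces, and helper functions (`ochiai`, `pagerank`, `rank_services`, `top_k_hit`) are all hypothetical.

```python
import math

def ochiai(ef, nf, ep):
    """Ochiai suspiciousness: ef / sqrt((ef + nf) * (ef + ep)).
    ef/ep: abnormal/normal traces that cover the service,
    nf: abnormal traces that do not cover it."""
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

def pagerank(graph, damping=0.85, iters=100):
    """Plain power-iteration PageRank over {caller: [callees]}."""
    nodes = sorted(set(graph) | {c for cs in graph.values() for c in cs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by services with no outgoing calls is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            callees = graph.get(v, [])
            for c in callees:
                new[c] += damping * rank[v] / len(callees)
        rank = new
    return rank

def rank_services(traces, call_graph):
    """traces: list of (set of services on the trace, is_abnormal).
    Assumption: combined score = Ochiai score * PageRank centrality."""
    total_fail = sum(1 for _, bad in traces if bad)
    centrality = pagerank(call_graph)
    scores = {}
    for svc in centrality:
        ef = sum(1 for s, bad in traces if bad and svc in s)
        ep = sum(1 for s, bad in traces if not bad and svc in s)
        scores[svc] = ochiai(ef, total_fail - ef, ep) * centrality[svc]
    return sorted(scores, key=scores.get, reverse=True)

def top_k_hit(ranking, true_root_cause, k):
    """True if the known root cause appears among the first k candidates."""
    return true_root_cause in ranking[:k]

# Hypothetical toy system: frontend calls cart and catalog; cart calls db.
call_graph = {"frontend": ["cart", "catalog"], "cart": ["db"],
              "catalog": [], "db": []}
traces = (
    [({"frontend", "cart", "db"}, True)] * 2      # abnormal traces
    + [({"frontend", "catalog"}, False)] * 3      # normal traces
)

ranking = rank_services(traces, call_graph)
print(ranking)                      # ['db', 'cart', 'frontend', 'catalog']
print(top_k_hit(ranking, "db", 1))  # True
```

In this toy case `cart` and `db` have identical spectrum scores because they appear on exactly the same traces; the centrality term is what separates them. The top-1 and top-3 figures quoted in the abstract correspond to the fraction of fault cases for which a check like `top_k_hit` succeeds with k = 1 and k = 3.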

Published in

ICPE '24 Companion: Companion of the 15th ACM/SPEC International Conference on Performance Engineering
May 2024, 305 pages
ISBN: 9798400704451
DOI: 10.1145/3629527

Copyright © 2024 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 252 of 851 submissions (30%)