DOI: 10.1145/3240765.3243476
research-article

Failure Prediction Based on Anomaly Detection for Complex Core Routers

Published: 05 November 2018

Abstract

Data-driven prognostic health management is essential to ensure high reliability and rapid error recovery in commercial core router systems. The effectiveness of prognostic health management depends on whether failures can be accurately predicted with sufficient lead time. This paper describes how time-series analysis and machine-learning techniques can be used to detect anomalies and predict failures in complex core router systems. First, both a feature-categorization-based hybrid method and a changepoint-based method have been developed to detect anomalies in time-varying features with different statistical characteristics. Next, an SVM-based failure predictor is developed to predict both the categories and the lead time of system failures from the collected anomalies. A comprehensive set of experimental results is presented for data collected during 30 days of field operation from over 20 core routers deployed by customers of a major telecom company.
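
The abstract does not specify the changepoint algorithm, so the following is only a minimal sketch of the general idea, using a two-sided CUSUM statistic as a stand-in technique. It is not the authors' method. The function name `cusum_changepoints` and the parameters `baseline_len`, `drift`, and `threshold` are hypothetical, chosen purely for illustration.

```python
from statistics import mean, pstdev


def cusum_changepoints(series, baseline_len=20, drift=0.5, threshold=5.0):
    """Return indices where a two-sided CUSUM statistic exceeds `threshold`.

    Illustrative sketch only: the parameter names and defaults are
    assumptions, not values taken from the paper.
    """
    changepoints = []
    start = 0
    # Each pass estimates a baseline from `baseline_len` samples, then scans
    # the remaining samples for a sustained shift away from that baseline.
    while start + baseline_len < len(series):
        base = series[start:start + baseline_len]
        mu = mean(base)
        sigma = pstdev(base) or 1.0  # avoid dividing by zero on flat data
        pos = neg = 0.0
        found = None
        for i in range(start + baseline_len, len(series)):
            z = (series[i] - mu) / sigma
            pos = max(0.0, pos + z - drift)  # accumulates upward shifts
            neg = max(0.0, neg - z - drift)  # accumulates downward shifts
            if pos > threshold or neg > threshold:
                found = i
                break
        if found is None:
            break
        changepoints.append(found)
        start = found  # re-estimate the baseline after the detected change
    return changepoints


if __name__ == "__main__":
    # Synthetic feature: stable around 10, then a level shift to 15.
    feature = [10.0, 10.2, 9.8, 10.1, 9.9] * 8 + [15.0] * 30
    print(cusum_changepoints(feature))
```

In a pipeline like the one the abstract outlines, indices flagged this way would be collected per feature and passed to a downstream classifier, such as the SVM-based predictor mentioned above. How the paper encodes anomalies as classifier inputs is not stated here.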

Published In

2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
Nov 2018
939 pages

Publisher

IEEE Press


Cited By

• (2024) Per-Packet Traffic Measurement in Storage, Computation and Bandwidth Limited Data Plane. IEEE/ACM Transactions on Networking 32:5 (3730-3742). DOI: 10.1109/TNET.2024.3404011. Online publication date: Oct-2024
• (2023) An Intelligent Monitoring Algorithm to Detect Dependencies between Test Cases in the Manual Integration Process. 2023 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW) (353-360). DOI: 10.1109/ICSTW58534.2023.00066. Online publication date: Apr-2023
• (2022) A Data-Driven Fault Tree for a Time Causality Analysis in an Aging System. Algorithms 15:6 (178). DOI: 10.3390/a15060178. Online publication date: 24-May-2022
• (2022) Correlation-based Clustering of Telecommunication Equipment in Smart Grid. 2022 14th International Conference on Computational Intelligence and Communication Networks (CICN) (667-671). DOI: 10.1109/CICN56167.2022.10008306. Online publication date: 4-Dec-2022
• (2022) An Algorithm to Speed up Network Recovery Fault Point Estimation and Recovery Action Recommendation. Journal of Network and Systems Management 30:2. DOI: 10.1007/s10922-022-09643-x. Online publication date: 14-Feb-2022
• (2019) Feature-Based Time Series Classification for Service Request Opening Prediction in the Telecom Industry. Progress in Artificial Intelligence (120-132). DOI: 10.1007/978-3-030-30244-3_11. Online publication date: 30-Aug-2019
