ABSTRACT
Network alarm triage refers to grouping and prioritizing a stream of low-level device health information to help operators find and fix problems. Today, this process tends to be largely manual because existing tools cannot easily evolve with the network. We present CueT, a system that uses interactive machine learning to learn from the triaging decisions of operators. It then applies what it has learned in novel visualizations that help operators triage alarms quickly and accurately. Unlike prior interactive machine learning systems, CueT handles a highly dynamic environment where the groups of interest are not known a priori and evolve constantly. A user study with real operators and data from a large network shows that CueT significantly improves the speed and accuracy of alarm triage compared to the network's current practice.
Index Terms
- CueT: human-guided fast and accurate network alarm triage