Abstract
Anomaly detection in data streams requires a signal of an unusual event, but an actionable response requires diagnostics. Consequently, an important task is to isolate the few key attributes that contribute to the signal from among a large collection. We introduce this contributor problem to the machine learning community and present a solution for monitoring in modern systems (with nonlinear reference conditions, high dimensions, categorical attributes, missing data, and so forth). The objective is to identify the attributes that contribute to a signal for an individual anomaly, for multiple anomalies, or for several anomaly groups. Although the task is related to the feature selection problem, the extreme sparseness of anomalies leads to scores that are designed specifically for the contributor problem. Statistical criteria are provided to address decision rules and false alarms quantitatively, and the method can be computed quickly. Comparisons are made to traditional contribution plots.
This material is based upon work supported by the National Science Foundation under Grant No. 0743160.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Borisov, A., Runger, G., Tuv, E. (2009). Contributor Diagnostics for Anomaly Detection. In: Alippi, C., Polycarpou, M., Panayiotou, C., Ellinas, G. (eds) Artificial Neural Networks – ICANN 2009. ICANN 2009. Lecture Notes in Computer Science, vol 5769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04277-5_95
DOI: https://doi.org/10.1007/978-3-642-04277-5_95
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04276-8
Online ISBN: 978-3-642-04277-5
eBook Packages: Computer Science, Computer Science (R0)