Abstract
In this paper, we focus on how random-forest-based methods can improve the anomaly detection rate for streaming datasets.
The key concept in recent work [12] is to build a random forest in which, at every internal node of every tree, a feature is selected at random and the associated data space is partitioned in half. However, the model parameters were predefined, and the effectiveness of applying this model under various conditions was not discussed. In this paper, we first give a mathematical justification for the required tree height and number of trees by casting the problem as a classical coupon collector problem. We then design a majority-voting score combination strategy to combine the results from different anomaly detection trees. Finally, we apply feature clustering to group correlated features together in order to find anomalies that are jointly determined by subsets of features.
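The abstract does not say how tree height and tree count are mapped onto the coupon collector problem, so the following is only a minimal sketch of the classical result itself, under an assumed reading: each internal node draws one of d features uniformly at random, and we ask how many draws are needed before every feature has been selected at least once. The expected number of draws is d * H_d, where H_d is the d-th harmonic number. The function names and the parameter d are hypothetical and are not taken from the paper.

```python
import random
from fractions import Fraction


def expected_draws(d: int) -> float:
    """Expected number of uniform draws needed to see all d items: d * H_d."""
    harmonic = sum(Fraction(1, k) for k in range(1, d + 1))
    return float(d * harmonic)


def simulate_draws(d: int, rng: random.Random) -> int:
    """Draw feature indices uniformly at random until all d have appeared."""
    seen = set()
    draws = 0
    while len(seen) < d:
        seen.add(rng.randrange(d))  # assumed: uniform random feature choice
        draws += 1
    return draws


def mean_simulated_draws(d: int, trials: int = 2000, seed: int = 0) -> float:
    """Average the simulated draw count over many independent trials."""
    rng = random.Random(seed)
    return sum(simulate_draws(d, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    d = 10  # hypothetical number of features
    print("closed form:", expected_draws(d))
    print("simulated:  ", mean_simulated_draws(d))
```

Under this reading, the closed form gives a rough lower bound on the total number of random feature selections a forest needs before every feature is expected to have been used; how the paper divides that budget between tree height and number of trees is not stated in the abstract.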
References
Aggarwal, C.C.: On abnormality detection in spuriously populated data streams. In: Proceedings of the 2005 SIAM International Conference on Data Mining, SIAM 2005, pp. 80–91 (2005)
Beckman, R.J., Cook, R.D.: Outlier..........s. Technometrics 25(2), 119–149 (1983). https://doi.org/10.1080/00401706.1983.10487840
Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. (CSUR) 41(3), 15 (2009)
Chen, Q., Luley, R., Wu, Q., Bishop, M., Linderman, R.W., Qiu, Q.: AnRAD: a neuromorphic anomaly detection framework for massive concurrent data streams. IEEE Trans. Neural Netw. Learn. Syst. 29(5), 1622–1636 (2017)
Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006)
Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science 315(5814), 972–976 (2007)
Gupta, M., Gao, J., Aggarwal, C.C., Han, J.: Outlier detection for temporal data: a survey. IEEE Trans. Knowl. Data Eng. 26(9), 2250–2267 (2014)
Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, New York (2011)
Hanley, J.A., McNeil, B.J.: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1), 29–36 (1982)
Motwani, R., Raghavan, P.: Randomized Algorithms. Chapman & Hall/CRC, London (2010)
Pokrajac, D., Lazarevic, A., Latecki, L.J.: Incremental local outlier detection for data streams. In: IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2007, pp. 504–515. IEEE (2007)
Tan, S.C., Ting, K.M., Liu, T.F.: Fast anomaly detection for streaming data. In: IJCAI Proceedings-International Joint Conference on Artificial Intelligence, vol. 22, no. 1, p. 1511 (2011)
Yamanishi, K., Takeuchi, J.-I.: A unifying framework for detecting outliers and change points from non-stationary time series data. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 676–681. ACM (2002)
Zhao, Z., Mehrotra, K.G., Mohan, C.K.: Ensemble algorithms for unsupervised anomaly detection. In: Ali, M., Kwon, Y.S., Lee, C.-H., Kim, J., Kim, Y. (eds.) IEA/AIE 2015. LNCS (LNAI), vol. 9101, pp. 514–525. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19066-2_50
Zięba, M., Tomczak, S.K., Tomczak, J.M.: Ensemble boosted trees with synthetic features generation in application to bankruptcy prediction. Expert Syst. Appl. 58, 93–101 (2016)
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Zhao, Z., Mehrotra, K.G., Mohan, C.K. (2018). Online Anomaly Detection Using Random Forest. In: Mouhoub, M., Sadaoui, S., Ait Mohamed, O., Ali, M. (eds) Recent Trends and Future Technology in Applied Intelligence. IEA/AIE 2018. Lecture Notes in Computer Science, vol 10868. Springer, Cham. https://doi.org/10.1007/978-3-319-92058-0_13
DOI: https://doi.org/10.1007/978-3-319-92058-0_13
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92057-3
Online ISBN: 978-3-319-92058-0
eBook Packages: Computer Science, Computer Science (R0)