Online Anomaly Detection Using Random Forest

Zhao, Zhiruo; Mehrotra, Kishan G.; Mohan, Chilukuri K.

doi:10.1007/978-3-319-92058-0_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10868)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

In this paper, we focus on how to use random-forest-based methods to improve the anomaly detection rate for streaming datasets.

The key concept in recent work [12] is to build a random forest in which, in every tree, each internal node randomly selects a feature and partitions the associated data space in half. However, the model parameters were predefined, and the efficiency of applying this model under various conditions was not discussed. In this paper, we first give a mathematical justification of the required tree height and number of trees by casting the problem as a classical coupon collector problem. Then we design a majority-voting score combination strategy to combine the results from different anomaly detection trees. Finally, we apply feature clustering to group correlated features together in order to find anomalies that are jointly determined by subsets of features.
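As a rough illustration of two ideas named in the abstract, the sketch below computes the classical coupon collector expectation (the expected number of uniform random draws needed to see each of n items at least once, n * H_n) and a simple majority vote over per-tree decisions. This is not the chapter's derivation or algorithm: mapping "coupons" to features chosen at random at tree nodes is our assumption, and the function names `expected_draws`, `simulate_draws`, and `majority_vote` are hypothetical.

```python
import random


def expected_draws(n: int) -> float:
    """Classical coupon collector result: expected number of uniform
    draws needed to see all n distinct coupons, n * (1 + 1/2 + ... + 1/n)."""
    return n * sum(1.0 / k for k in range(1, n + 1))


def simulate_draws(n: int, rng: random.Random) -> int:
    """Draw uniformly from n coupons until every coupon has appeared once;
    return how many draws that took."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


def majority_vote(flags: list) -> bool:
    """Return True when strictly more than half of the trees flag the
    point as anomalous (assumed voting rule, for illustration only)."""
    return 2 * sum(bool(f) for f in flags) > len(flags)


if __name__ == "__main__":
    n_features = 10  # assumed: one "coupon" per feature
    rng = random.Random(0)
    trials = [simulate_draws(n_features, rng) for _ in range(2000)]
    print("closed form:", expected_draws(n_features))
    print("simulated mean:", sum(trials) / len(trials))
    print("vote:", majority_vote([True, True, False]))
```

Under this reading, the closed form gives a feel for how many random feature selections are needed before every feature is likely to have been used at least once, which is the kind of quantity a bound on tree height or forest size would depend on.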

References

1. Aggarwal, C.C.: On abnormality detection in spuriously populated data streams. In: Proceedings of the 2005 SIAM International Conference on Data Mining, SIAM 2005, pp. 80–91 (2005)
2. Beckman, R.J., Cook, R.D.: Outlier..........s. Technometrics 25(2), 119–149 (1983). https://doi.org/10.1080/00401706.1983.10487840
3. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. (CSUR) 41(3), 15 (2009)
4. Chen, Q., Luley, R., Wu, Q., Bishop, M., Linderman, R.W., Qiu, Q.: AnRAD: a neuromorphic anomaly detection framework for massive concurrent data streams. IEEE Trans. Neural Netw. Learn. Syst. 29(5), 1622–1636 (2017)
5. Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006)
6. Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science 315(5814), 972–976 (2007)
7. Gupta, M., Gao, J., Aggarwal, C.C., Han, J.: Outlier detection for temporal data: a survey. IEEE Trans. Knowl. Data Eng. 26(9), 2250–2267 (2014)
8. Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, New York (2011)
9. Hanley, J.A., McNeil, B.J.: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1), 29–36 (1982)
10. Motwani, R., Raghavan, P.: Randomized Algorithms. Chapman & Hall/CRC, London (2010)
11. Pokrajac, D., Lazarevic, A., Latecki, L.J.: Incremental local outlier detection for data streams. In: IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2007, pp. 504–515. IEEE (2007)
12. Tan, S.C., Ting, K.M., Liu, T.F.: Fast anomaly detection for streaming data. In: IJCAI Proceedings-International Joint Conference on Artificial Intelligence, vol. 22, no. 1, p. 1511 (2011)
13. Yamanishi, K., Takeuchi, J.-I.: A unifying framework for detecting outliers and change points from non-stationary time series data. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 676–681. ACM (2002)
14. Zhao, Z., Mehrotra, K.G., Mohan, C.K.: Ensemble algorithms for unsupervised anomaly detection. In: Ali, M., Kwon, Y.S., Lee, C.-H., Kim, J., Kim, Y. (eds.) IEA/AIE 2015. LNCS (LNAI), vol. 9101, pp. 514–525. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19066-2_50
15. Zięba, M., Tomczak, S.K., Tomczak, J.M.: Ensemble boosted trees with synthetic features generation in application to bankruptcy prediction. Expert Syst. Appl. 58, 93–101 (2016)

Author information

Authors and Affiliations

Syracuse University, Syracuse, NY, USA
Zhiruo Zhao, Kishan G. Mehrotra & Chilukuri K. Mohan

Corresponding author

Correspondence to Zhiruo Zhao.

Editor information

Editors and Affiliations

University of Regina, Regina, SK, Canada
Malek Mouhoub
University of Regina, Regina, SK, Canada
Samira Sadaoui
Concordia University, Montreal, QC, Canada
Otmane Ait Mohamed
Texas State University, San Marcos, TX, USA
Moonis Ali

About this paper

Cite this paper

Zhao, Z., Mehrotra, K.G., Mohan, C.K. (2018). Online Anomaly Detection Using Random Forest. In: Mouhoub, M., Sadaoui, S., Ait Mohamed, O., Ali, M. (eds) Recent Trends and Future Technology in Applied Intelligence. IEA/AIE 2018. Lecture Notes in Computer Science, vol 10868. Springer, Cham. https://doi.org/10.1007/978-3-319-92058-0_13

DOI: https://doi.org/10.1007/978-3-319-92058-0_13
Published: 30 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92057-3
Online ISBN: 978-3-319-92058-0
eBook Packages: Computer Science, Computer Science (R0)
