Distribution Forest: An Anomaly Detection Method Based on Isolation Forest

Yao, Chengfei; Ma, Xiaoqing; Chen, Biao; Zhao, Xiaosong; Bai, Gang

doi:10.1007/978-3-030-29611-7_11

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11719)

Included in the following conference series:

International Symposium on Advanced Parallel Processing Technologies

Abstract

Anomaly detection is the task of finding patterns in data that do not conform to expected behavior. It has many application domains, such as network intrusion detection, fraud detection and fault detection. This paper proposes a new anomaly detection method, Distribution Forest (dForest), inspired by Isolation Forest (iForest). dForest builds an ensemble of special binary trees called distribution trees (dTrees). The basic idea is to guide the construction of each dTree by the distribution of the data at each node, where each node is treated as a subspace of the input space. Once the dForest is built, anomalies have shorter path lengths than normal instances.
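The path-length idea can be illustrated with a minimal sketch of the original Isolation Forest, the method that inspired dForest. This is not the dForest algorithm: the abstract does not specify how the data distribution guides the splits, so the sketch uses iForest's uniformly random splits instead. All function names, the default parameters and the toy data are assumptions made for illustration.

```python
import math
import random

def c(n):
    """Normalising term used in the iForest papers: the average path
    length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value (iForest style)."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # nothing to split on along this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def build_forest(data, n_trees=100, sample_size=64, seed=0):
    """Grow each tree on a random subsample, with depth capped at log2(sample size)."""
    rng = random.Random(seed)
    m = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(m))
    return [build_tree(rng.sample(data, m), 0, max_depth, rng)
            for _ in range(n_trees)]

def path_length(x, node, depth=0):
    """Edges traversed to reach a leaf, plus c(size) for the unbuilt subtree."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)

def average_path_length(x, forest):
    return sum(path_length(x, tree) for tree in forest) / len(forest)

def anomaly_score(x, forest, sample_size):
    """iForest score in (0, 1]; values closer to 1 indicate anomalies."""
    return 2.0 ** (-average_path_length(x, forest) / c(sample_size))

# Toy data: one Gaussian cluster around the origin plus one distant point.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
outlier, inlier = (8.0, 8.0), (0.0, 0.0)
data.append(outlier)

forest = build_forest(data)
print("outlier path length:", average_path_length(outlier, forest))
print("inlier path length: ", average_path_length(inlier, forest))
```

On data like this, the distant point is typically separated after only a few splits, while a point inside the cluster travels to the depth limit, which is the property the abstract relies on.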

dForest offers a different interpretation from other methods. Compared with iForest, LOF and iNNE, it achieves competitive AUC on a range of benchmark datasets, and it performs well in both semi-supervised and unsupervised anomaly detection modes.
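For readers unfamiliar with the metric, AUC can be computed directly from anomaly scores and ground-truth labels as the probability that a randomly chosen anomaly scores higher than a randomly chosen normal instance. The sketch below is a generic pairwise implementation, not the authors' evaluation code; the function name and the convention that label 1 marks an anomaly are assumptions.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: higher means more anomalous.
    labels: 1 for anomaly, 0 for normal (assumed convention).
    Ties between an anomaly and a normal instance count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one anomaly and one normal instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Two anomalies and two normal instances; one anomaly is ranked below a normal point.
print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

The pairwise loop is quadratic in the number of instances, which is fine for small benchmarks; a rank-based formulation gives the same value in O(n log n) for larger ones.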

Acknowledgement

This work is partially supported by the Natural Science Foundation of Tianjin (No. 18ZXZNGX00200), the National Key Research and Development Program of China (2016YFC0400709), the Science and Technology Commission of Tianjin Binhai New Area (BHXQKJXM-PT-ZJSHJ-2017005), the Natural Science Foundation of Tianjin (18YFYZCG00060) and Nankai University (91922299).

Author information

Authors and Affiliations

College of Computer Science, Nankai University, Tianjin, China
Chengfei Yao, Xiaoqing Ma, Biao Chen & Gang Bai
Tianjin Public Security Profession College, Tianjin, China
Xiaosong Zhao

Corresponding author

Correspondence to Gang Bai.

Editor information

Editors and Affiliations

University of Minnesota, Minneapolis, MN, USA
Pen-Chung Yew
Chalmers University of Technology, Gothenburg, Sweden
Per Stenström
National University of Defense Technology, Changsha, China
Junjie Wu
Nankai University, Tianjin, China
Xiaoli Gong
Nankai University, Tianjin, China
Tao Li

About this paper

Cite this paper

Yao, C., Ma, X., Chen, B., Zhao, X., Bai, G. (2019). Distribution Forest: An Anomaly Detection Method Based on Isolation Forest. In: Yew, PC., Stenström, P., Wu, J., Gong, X., Li, T. (eds) Advanced Parallel Processing Technologies. APPT 2019. Lecture Notes in Computer Science, vol 11719. Springer, Cham. https://doi.org/10.1007/978-3-030-29611-7_11

DOI: https://doi.org/10.1007/978-3-030-29611-7_11
Published: 09 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29610-0
Online ISBN: 978-3-030-29611-7
eBook Packages: Computer Science; Computer Science (R0)
