Abstract
Machine learning and data mining technologies are often used in network intrusion detection systems. An intrusion detection system based on machine learning uses a classifier to infer the current state of the network from the observed traffic attributes. A problem with learning-based intrusion detection is that it generates false positives, which incur unnecessary additional operation costs. This paper investigates a method to decrease the false positives generated by an intrusion detection system that employs a decision tree as its classifier. The paper first points out that the information-gain criterion used in previous studies to select attributes in the tree-constructing algorithm is not effective in achieving low false positive rates. In place of the information-gain criterion, this paper proposes a new function that evaluates the goodness of an attribute by considering the significance of the error types. The proposed function successfully chooses, from the given attribute set, an attribute that suppresses false positives, and its effectiveness is confirmed experimentally. This paper also examines the simpler leaf rewriting approach as a benchmark for the proposed method. The comparison shows that the proposed attribute evaluation function yields better solutions than the leaf rewriting approach.
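The exact form of the proposed evaluation function is not reproduced in this excerpt. As an illustration of the general idea only, the following Python sketch scores a candidate split attribute by a misclassification cost that penalizes false positives more heavily than false negatives; the function name, cost weights, and label handling are hypothetical and are not taken from the paper.

from collections import Counter

def split_cost(partitions, normal_label="normal", fp_weight=5.0, fn_weight=1.0):
    # Score a candidate attribute by the misclassification cost of the
    # partitions it induces (one list of training labels per attribute value).
    # fp_weight and fn_weight are illustrative cost weights, not the paper's.
    total = sum(len(p) for p in partitions)
    if total == 0:
        return 0.0
    cost = 0.0
    for part in partitions:
        if not part:
            continue
        counts = Counter(part)
        majority = counts.most_common(1)[0][0]  # label the resulting leaf would take
        for label, n in counts.items():
            if label == majority:
                continue
            # Normal traffic routed to an attack-labeled partition -> false positive.
            if label == normal_label and majority != normal_label:
                cost += fp_weight * n
            else:
                cost += fn_weight * n
    return cost / total  # lower is better

A tree builder would evaluate this score for every candidate attribute at a node and split on the attribute with the lowest cost, where a conventional C4.5-style builder would instead maximize information gain.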










Appendix: Leaf Rewriting Heuristic
Section 4 uses the following heuristic to evaluate the leaf rewriting approach.
When a decision tree is built, each leaf is assigned a subset of the training data set. An element of this subset is an instance that reaches the leaf as a result of the decision process. For a leaf labeled with an attack state (referred to as an "attack leaf" hereafter), let u_m denote the number of instances in the subset whose state is s_m (s_m ∈ S). Let s_1 be the normal state and let the leaf be labeled s_2. The heuristic then computes the value e as follows.
e = u_1 / (u_2 + ε),

where ε is a positive constant. In Sect. 4, ε was set to 0.5.
For an attack leaf associated with one or more instances, e < 1 because u_1 ≤ u_2. A large value of e means that the data subset contains many normal-state instances relative to attack-state instances, so it is rational to change the leaf label to "normal." The heuristic therefore sorts the list of attack leaves in decreasing order of e. Let K be the number of attack leaves. The heuristic changes the labels of the first K − L attack leaves in the sorted list to "normal" and leaves the labels of the remaining L attack leaves unchanged. The decrease in false positives is thus controlled by the parameter L; if L ≥ K, the heuristic does not modify any leaves.
If no data instances are associated with an attack leaf, u_1 is replaced by the number of normal-state instances at its parent node. This makes e > 1, so e is larger for such a leaf than for any leaf associated with data instances. Because the decision process reaches such a leaf only infrequently for the test data as well, rewriting its label is unlikely to increase false negatives. Assigning such a leaf a larger value of e therefore ensures that its label is changed before those of other leaves.
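As a concrete reference, the following Python sketch implements the heuristic described above, assuming the form e = u_1 / (u_2 + ε) given earlier. The data structure and function names are illustrative rather than taken from the authors' implementation, and for simplicity each leaf tracks only the two counts that enter e.

from dataclasses import dataclass

EPS = 0.5  # value used for epsilon in Sect. 4

@dataclass
class AttackLeaf:
    u_normal: int       # u_1: normal-state instances reaching the leaf
    u_attack: int       # u_2: instances of the attack state the leaf is labeled with
    parent_normal: int  # normal-state instances at the parent node
    label: str = "attack"

def rewrite_value(leaf, eps=EPS):
    # An empty leaf borrows u_1 from its parent node, which gives e > 1.
    if leaf.u_normal + leaf.u_attack == 0:
        return leaf.parent_normal / eps
    # Otherwise e < 1, because u_1 <= u_2 holds for an attack leaf.
    return leaf.u_normal / (leaf.u_attack + eps)

def rewrite_leaves(attack_leaves, L):
    # Sort the K attack leaves in decreasing order of e and relabel the
    # first K - L as "normal"; if L >= K, no leaf is modified.
    ordered = sorted(attack_leaves, key=rewrite_value, reverse=True)
    for leaf in ordered[:max(len(ordered) - L, 0)]:
        leaf.label = "normal"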
Cite this article
Ohta, S., Kurebayashi, R. & Kobayashi, K. Minimizing False Positives of a Decision Tree Classifier for Intrusion Detection on the Internet. J Netw Syst Manage 16, 399–419 (2008). https://doi.org/10.1007/s10922-008-9102-4