A new intrusion detection system using support vector machines and hierarchical clustering

Khan, Latifur; Awad, Mamoun; Thuraisingham, Bhavani

doi:10.1007/s00778-006-0002-5

Regular Paper
Published: 31 August 2006

Volume 16, pages 507–521, (2007)
The VLDB Journal

Latifur Khan¹, Mamoun Awad¹ & Bhavani Thuraisingham¹

Abstract

Whenever an intrusion occurs, the security and value of a computer system are compromised. Network-based attacks make it difficult for legitimate users to access network services by purposely occupying or sabotaging network resources and services. This can be done by sending large amounts of network traffic, exploiting well-known faults in networking services, and overloading network hosts. Intrusion detection attempts to detect computer attacks by examining data records observed in processes on the network, and it is split into two groups: anomaly detection systems and misuse detection systems. Anomaly detection searches for malicious behavior that deviates from established normal patterns. Misuse detection identifies intrusions that match known attack scenarios. Our interest here is in anomaly detection, and our proposed method is a scalable solution for detecting network-based anomalies. We use Support Vector Machines (SVM) for classification. The SVM is one of the most successful classification algorithms in the data mining area, but its long training time limits its use. This paper presents a study on reducing the training time of SVM, specifically when dealing with large data sets, using hierarchical clustering analysis. We use the Dynamically Growing Self-Organizing Tree (DGSOT) algorithm for clustering because it has been shown to overcome the drawbacks of traditional hierarchical clustering algorithms (e.g., hierarchical agglomerative clustering). Clustering analysis helps find the boundary points between two classes, which are the most qualified data points to train SVM. We present a new approach that combines SVM and DGSOT, which starts with an initial training set and expands it gradually using the clustering structure produced by the DGSOT algorithm. We compare our approach with the Rocchio Bundling technique and random selection in terms of accuracy loss and training time gain using a single benchmark real data set. We show that our proposed variations contribute significantly to improving the training process of SVM with high generalization accuracy and outperform the Rocchio Bundling technique.
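
The idea of starting from a small training set and growing it along a cluster tree can be illustrated with a toy sketch. The code below is not the authors' implementation. It assumes one-dimensional, linearly separable data (where the maximum-margin threshold has a closed form), uses a simple recursive halving of sorted values as a hypothetical stand-in for DGSOT, and assumes that the clusters to expand are the ones whose centroids currently act as support vectors. All names (`Node`, `build_tree`, `train_max_margin_1d`, `train_with_expansion`) are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A cluster-tree node holding its member values and child clusters."""
    values: list
    label: int
    children: list = field(default_factory=list)

    @property
    def centroid(self):
        return sum(self.values) / len(self.values)


def build_tree(values, label, leaf_size=1):
    """Recursively halve the sorted values (hypothetical stand-in for DGSOT)."""
    values = sorted(values)
    node = Node(values, label)
    if len(values) > leaf_size:
        mid = len(values) // 2
        node.children = [build_tree(values[:mid], label, leaf_size),
                         build_tree(values[mid:], label, leaf_size)]
    return node


def train_max_margin_1d(nodes):
    """Closed-form 1-D max-margin threshold.

    Assumes class -1 lies entirely to the left of class +1.
    Returns the threshold and the two nodes that act as support vectors.
    """
    neg = max((n for n in nodes if n.label == -1), key=lambda n: n.centroid)
    pos = min((n for n in nodes if n.label == +1), key=lambda n: n.centroid)
    return (neg.centroid + pos.centroid) / 2, [neg, pos]


def train_with_expansion(neg_values, pos_values):
    """Train on cluster centroids, expanding only clusters near the boundary."""
    # Initial training set: one centroid per class (the tree roots).
    active = [build_tree(neg_values, -1), build_tree(pos_values, +1)]
    while True:
        threshold, support = train_max_margin_1d(active)
        expandable = [n for n in support if n.children]
        if not expandable:
            # Both support vectors are raw data points, so stop.
            return threshold, len(active)
        for n in expandable:
            # Replace a boundary cluster by its finer-grained children.
            active.remove(n)
            active.extend(n.children)


threshold, n_training = train_with_expansion([0, 1, 2, 5], [7, 8, 12, 13])
print(threshold, n_training)
```

On this toy input the loop ends with a threshold of 6.0, the same value obtained from all eight points, while the final training set holds 6 entries (raw points near the boundary plus centroids of clusters far from it). In a realistic setting the closed-form step would be replaced by an actual SVM solver and the halving tree by a real hierarchical clustering algorithm.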

Author information

Authors and Affiliations

University of Texas at Dallas, Dallas, TX, USA
Latifur Khan, Mamoun Awad & Bhavani Thuraisingham

Corresponding author

Correspondence to Latifur Khan.

About this article

Cite this article

Khan, L., Awad, M. & Thuraisingham, B. A new intrusion detection system using support vector machines and hierarchical clustering. The VLDB Journal 16, 507–521 (2007). https://doi.org/10.1007/s00778-006-0002-5

Received: 13 January 2005
Revised: 10 June 2005
Accepted: 21 July 2005
Published: 31 August 2006
Issue Date: October 2007
DOI: https://doi.org/10.1007/s00778-006-0002-5
