
A new intrusion detection system using support vector machines and hierarchical clustering

  • Regular Paper
The VLDB Journal

Abstract

Whenever an intrusion occurs, the security and value of a computer system are compromised. Network-based attacks make it difficult for legitimate users to access network services by deliberately occupying or sabotaging network resources and services. This can be done by sending large amounts of network traffic, exploiting well-known faults in networking services, or overloading network hosts. Intrusion detection attempts to detect computer attacks by examining data records observed in processes on the network, and it is split into two groups: anomaly detection systems and misuse detection systems. Anomaly detection searches for malicious behavior that deviates from established normal patterns. Misuse detection identifies intrusions that match known attack scenarios. Our interest here is in anomaly detection, and our proposed method is a scalable solution for detecting network-based anomalies. We use Support Vector Machines (SVM) for classification. The SVM is one of the most successful classification algorithms in the data mining area, but its long training time limits its use. This paper presents a study on reducing the training time of SVM, specifically when dealing with large data sets, using hierarchical clustering analysis. We use the Dynamically Growing Self-Organizing Tree (DGSOT) algorithm for clustering because it has been shown to overcome the drawbacks of traditional hierarchical clustering algorithms (e.g., hierarchical agglomerative clustering). Clustering analysis helps find the boundary points between two classes, which are the most qualified data points for training the SVM. We present a new approach that combines SVM and DGSOT, starting with an initial training set and expanding it gradually using the clustering structure produced by the DGSOT algorithm. We compare our approach with the Rocchio Bundling technique and random selection in terms of accuracy loss and training time gain using a single benchmark real data set. We show that our proposed variations contribute significantly to improving the training process of SVM with high generalization accuracy and outperform the Rocchio Bundling technique.
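The idea of using cluster structure to find boundary points can be illustrated with a small sketch. This is not the DGSOT algorithm or the authors' procedure: it assumes the clusters of each class have already been produced by some clustering step, and it simply keeps the clusters whose centroids lie closest to the opposite class. The function names `centroid` and `select_boundary_points` and the parameter `k` are hypothetical, chosen for illustration only.

```python
import math


def centroid(points):
    """Mean vector of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def select_boundary_points(pos_clusters, neg_clusters, k=1):
    """Return the member points of the k clusters per class that lie
    closest to the opposite class (a simplified stand-in for boundary
    selection; the clustering itself is assumed to be done elsewhere).

    pos_clusters / neg_clusters: lists of clusters, each a list of points.
    """
    pos_c = [centroid(c) for c in pos_clusters]
    neg_c = [centroid(c) for c in neg_clusters]

    def nearest_k(own_clusters, own_c, other_c):
        # Gap = distance from a centroid to the closest opposite-class centroid.
        gaps = [min(math.dist(c, o) for o in other_c) for c in own_c]
        order = sorted(range(len(own_clusters)), key=lambda i: gaps[i])
        return [p for i in order[:k] for p in own_clusters[i]]

    return (nearest_k(pos_clusters, pos_c, neg_c),
            nearest_k(neg_clusters, neg_c, pos_c))


# Toy data (made up for illustration): two clusters per class.
normal = [[(0.0, 0.0), (0.2, 0.1)], [(2.0, 2.0), (2.2, 1.9)]]
attack = [[(3.0, 3.0), (3.1, 3.2)], [(6.0, 6.0), (6.1, 5.8)]]
pos, neg = select_boundary_points(normal, attack, k=1)
```

In this toy case the reduced training set keeps only the two clusters that face each other across the class boundary. Under the same assumption, raising `k` step by step would mimic the gradual expansion of the training set described above, with the selected points passed to an SVM trainer of one's choice.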



Author information

Corresponding author

Correspondence to Latifur Khan.

About this article

Cite this article

Khan, L., Awad, M. & Thuraisingham, B. A new intrusion detection system using support vector machines and hierarchical clustering. The VLDB Journal 16, 507–521 (2007). https://doi.org/10.1007/s00778-006-0002-5

  • DOI: https://doi.org/10.1007/s00778-006-0002-5
