Abstract
The classification accuracy and efficiency of an intrusion detection system (IDS) are strongly affected by the discretization method applied to continuous attributes. Cut generation is one such method: applying a different number of cuts (in a partition) to the continuous attributes yields different classification accuracies. In this paper, to maximize the accuracy of classifying network traffic data as either ‘normal’ or ‘anomaly’, the proposed algorithm determines the set of cut points for each continuous attribute. Once the appropriate and necessary cut points have been generated, they are mapped into corresponding intervals using the centre-spread encoding technique. The learnt cut points are then applied to the test data set for discretization to achieve maximum classification accuracy.
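As a rough illustration of the last two steps (mapping cut points to intervals and reusing them on unseen data), the following minimal Python sketch may help. It is not the authors' algorithm: the abstract does not say how the cut points are generated, so they are hard-coded here as hypothetical values. The function names `to_intervals` and `discretize`, the attribute range, and the reading of "centre-spread" as (interval midpoint, interval half-width) are all assumptions made for illustration.

```python
from bisect import bisect_right

def to_intervals(cuts, lo, hi):
    """Turn cut points into (centre, spread) pairs covering [lo, hi].

    Assumption: centre is the interval midpoint and spread is its half-width.
    """
    edges = [lo] + sorted(cuts) + [hi]
    return [((a + b) / 2, (b - a) / 2) for a, b in zip(edges, edges[1:])]

def discretize(value, cuts):
    """Return the index of the interval that `value` falls into."""
    return bisect_right(sorted(cuts), value)

# Hypothetical cut points, as if learnt from training data for one attribute.
cuts = [10.0, 50.0]
intervals = to_intervals(cuts, lo=0.0, hi=100.0)

# The same cut points are reused unchanged on test values.
test_values = [7.0, 10.0, 60.0]
codes = [discretize(v, cuts) for v in test_values]
```

With these hypothetical cuts, the attribute range is split into three intervals, and each test value is replaced by the index of its interval. A value equal to a cut point is assigned to the interval on its right, which is a design choice of this sketch rather than something stated in the abstract.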
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mazumder, S., Sharma, T., Mitra, R., Sengupta, N., Sil, J. (2012). Generation of Sufficient Cut Points to Discretize Network Traffic Data Sets. In: Panigrahi, B.K., Das, S., Suganthan, P.N., Nanda, P.K. (eds) Swarm, Evolutionary, and Memetic Computing. SEMCCO 2012. Lecture Notes in Computer Science, vol 7677. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35380-2_62
DOI: https://doi.org/10.1007/978-3-642-35380-2_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35379-6
Online ISBN: 978-3-642-35380-2
eBook Packages: Computer Science, Computer Science (R0)