Abstract
The application of machine learning methods to malware detection has made it possible to generate a large number of classifiers that use different kinds of features and learning algorithms. A straightforward way to select the best classifier is to pick the one with the best holdout or cross-validation performance. However, holdout and cross-validation give only a point estimate of generalization performance, and that estimate varies with the training data and the learning algorithm's parameters. We propose a classifier selection criterion that combines confidence-interval bounds on the performance estimate with a performance target. Performance targets are commonly used for classifier selection in practice, particularly in security applications such as malware detection. The proposed criterion, called deployability, deems a classifier deployable if the cost target lies within or above the confidence interval of the classifier's expected cost. We evaluated the criterion in an experiment with machine-learning-based malware detectors. We found that, for a given confidence level and cost target, even the classifier with the least expected cost may not be deployable, while classifiers with higher expected cost may be deployable.
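The abstract does not say how the confidence interval is computed. The following is therefore only a minimal sketch of the deployability check, built on several assumptions. First, each candidate classifier has one expected-cost estimate per cross-validation fold. Second, the interval is a normal-approximation interval around the mean fold cost; with few folds, a Student t quantile would be the more cautious choice. Third, the names `cost_interval`, `is_deployable`, and `deployable_classifiers` are hypothetical. "Within or above the interval" reduces to a single comparison: the target must not fall below the interval's lower bound.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist, mean, stdev


@dataclass
class CostInterval:
    mean: float
    lower: float
    upper: float


def cost_interval(fold_costs, confidence=0.95):
    """Two-sided confidence interval for the mean expected cost.

    Assumption: a normal approximation over per-fold cost estimates;
    this is not necessarily the interval used in the paper.
    """
    if len(fold_costs) < 2:
        raise ValueError("need at least two fold estimates")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    m = mean(fold_costs)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * stdev(fold_costs) / sqrt(len(fold_costs))
    return CostInterval(m, m - half_width, m + half_width)


def is_deployable(interval, cost_target):
    # Target within [lower, upper] or above upper <=> target >= lower.
    return cost_target >= interval.lower


def deployable_classifiers(fold_costs_by_name, cost_target, confidence=0.95):
    """Names of the (hypothetical) candidates that meet the cost target."""
    return [
        name
        for name, costs in fold_costs_by_name.items()
        if is_deployable(cost_interval(costs, confidence), cost_target)
    ]


# Made-up fold costs for illustration only (not results from the paper).
candidates = {
    "A": [0.040, 0.042, 0.041, 0.043, 0.039],  # lower mean, narrow interval
    "B": [0.020, 0.070, 0.030, 0.080, 0.050],  # higher mean, wide interval
}
print(deployable_classifiers(candidates, cost_target=0.035))  # → ['B']
```

With these invented numbers, A has the lower mean cost (0.041 versus 0.050). However, A's lower bound (about 0.0396) sits above the 0.035 target, so A fails the check. B's wide interval reaches down to about 0.0277, so B passes. This is the kind of reversal the abstract reports.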
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Singh, A., Singh, S., Walenstein, A., Lakhotia, A. (2012). Deployable Classifiers for Malware Detection. In: Dua, S., Gangopadhyay, A., Thulasiraman, P., Straccia, U., Shepherd, M., Stein, B. (eds) Information Systems, Technology and Management. ICISTM 2012. Communications in Computer and Information Science, vol 285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29166-1_34
DOI: https://doi.org/10.1007/978-3-642-29166-1_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29165-4
Online ISBN: 978-3-642-29166-1
eBook Packages: Computer Science, Computer Science (R0)