Skip to main content

Evolutionary Cost-Sensitive Ensemble for Malware Detection

  • Conference paper
International Joint Conference SOCO’14-CISIS’14-ICEUTE’14

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 299))

Abstract

Malware detection is among the most extensively developed areas for computer security. Unauthorized, malicious software can cause expensive damage to both private users and companies. It can destroy the computer, breach the privacy of user and result in loss of valuable data. The amount of data uploaded and downloaded each day makes almost impossible for manual screening of each incoming software package. That is why there is a need for effective intelligent filters, that can automatically dichotomize between the safe and dangerous applications. The number of malware programs, that are faced by the detection system, is typically much smaller than the number of desired programs. Therefore, we have to deal with the imbalanced classification problem, in which standard classification algorithms tend to fail. In this paper, we present a novel ensemble, based on cost-sensitive decision trees. Individual classifiers are constructed according to an established cost matrix and trained on random feature subspaces to ensure, that they are mutually complementary. Instead of using a fixed cost matrix we derive its parameters via ROC analysis. An evolutionary algorithm is being applied for simultaneous classifier selection and assignment of committee member weights for the fusion process. Experimental analysis, carried out on a large malware dataset, prove that our method is capable of outperforming other state-of-the-art algorithms, and hence is an effective approach for the problem of imbalanced malware detection.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 169.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 219.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Abbasi, F.H., Harris, R., Marsland, S., Moretti, G.: An exemplar-based learning approach for detection and classification of malicious network streams in honeynets. Security and Communication Networks 7(2), 352–364 (2014)

    Article  Google Scholar 

  2. Alpaydin, E.: Combined 5 x 2 cv f test for comparing supervised classification learning algorithms. Neural Computation 11(8), 1885–1892 (1999)

    Article  Google Scholar 

  3. Błaszczyński, J., Deckert, M., Stefanowski, J., Wilk, S.: Integrating selective pre-processing of imbalanced data with ivotes ensemble. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds.) RSCTC 2010. LNCS, vol. 6086, pp. 148–157. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  4. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and regression trees. Chapman and Hall (1984)

    Google Scholar 

  5. Chawla, N.V., Lazarevic, A., Hall, L.O., Bowyer, K.W.: Smoteboost: Improving prediction of the minority class in boosting. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) PKDD 2003. LNCS (LNAI), vol. 2838, pp. 107–119. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  6. Fawcett, T.: An introduction to roc analysis. Pattern Recognition Letters 27(8), 861–874 (2006)

    Article  MathSciNet  Google Scholar 

  7. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20, 832–844 (1998)

    Article  Google Scholar 

  8. Krawczyk, B., Schaefer, G., Woźniak, M.: Breast thermogram analysis using a cost-sensitive multiple classifier system. In: Proceedings of the IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI 2012), pp. 507–510 (2012)

    Google Scholar 

  9. Krawczyk, B., Woźniak, M., Schaefer, G.: Cost-sensitive decision tree ensembles for effective imbalanced classification. Applied Soft Computing, Part C 14, 554–562 (2014)

    Article  Google Scholar 

  10. Ling, C.X., Yang, Q., Wang, J., Zhang, S.: Decision trees with minimal costs. In: Proceedings, Twenty-First International Conference on Machine Learning, ICML 2004, pp. 544–551 (2004)

    Google Scholar 

  11. Liu, X., Wu, J., Zhou, Z.: Exploratory undersampling for class-imbalance learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 39(2), 539–550 (2009)

    Article  Google Scholar 

  12. Napierala, K., Stefanowski, J.: Identification of different types of minority class examples in imbalanced data. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, S.-B. (eds.) HAIS 2012, Part II. LNCS, vol. 7209, pp. 139–150. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  13. Nuenz, M.: The use of background knowledge in decision tree induction. Machine Learning 6, 231–250 (1991)

    Google Scholar 

  14. Ouellette, J., Pfeffer, A., Lakhotia, A.: Countering malware evolution using cloud-based learning, pp. 85–94 (2013)

    Google Scholar 

  15. Rieck, K., Trinius, P., Willems, C., Holz, T.: Automatic analysis of malware behavior using machine learning. Journal of Computer Security 19(4), 639–668 (2011)

    Google Scholar 

  16. Schrittwieser, S., Katzenbeisser, S., Kieseberg, P., Huber, M., Leithner, M., Mulazzani, M., Weippl, E.: Covert computation - hiding code in code through compile-time obfuscation. Computers and Security 42, 13–26 (2014)

    Article  Google Scholar 

  17. Shan, Z., Wang, X.: Growing grapes in your computer to defend against malware. IEEE Transactions on Information Forensics and Security 9(2), 196–207 (2014)

    Article  Google Scholar 

  18. Sheen, S., Anitha, R., Sirisha, P.: Malware detection by pruning of parallel ensembles using harmony search. Pattern Recognition Letters 34(14), 1679–1686 (2013)

    Article  Google Scholar 

  19. Sun, Y., Kamel, M.S., Wong, A.K.C., Wang, Y.: Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition 40(12), 3358–3378 (2007)

    Article  MATH  Google Scholar 

  20. Sun, Y., Wong, A.K.C., Kamel, M.S.: Classification of imbalanced data: A review. International Journal of Pattern Recognition and Artificial Intelligence 23(4), 687–719 (2009)

    Article  Google Scholar 

  21. Wang, S., Yao, X.: Diversity analysis on imbalanced data sets by using ensemble models. In: 2009 IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2009 - Proceedings, pp. 324–331 (2009)

    Google Scholar 

  22. Ye, Y., Wang, D., Li, T., Ye, D.: Imds: Intelligent malware detection system, pp. 1043–1047 (2007)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Bartosz Krawczyk .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Krawczyk, B., Woźniak, M. (2014). Evolutionary Cost-Sensitive Ensemble for Malware Detection. In: de la Puerta, J., et al. International Joint Conference SOCO’14-CISIS’14-ICEUTE’14. Advances in Intelligent Systems and Computing, vol 299. Springer, Cham. https://doi.org/10.1007/978-3-319-07995-0_43

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-07995-0_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07994-3

  • Online ISBN: 978-3-319-07995-0

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics