
Malware detection using bilayer behavior abstraction and improved one-class support vector machines

Regular Contribution

International Journal of Information Security

Abstract

Malware detection is one of the most challenging problems in computer security. Machine-learning-based methods have recently become popular for detecting unknown malware and variants of known malware. Successful learning depends first and foremost on extracting discriminative and stable features. In this paper, we propose a bilayer behavior abstraction method based on semantic analysis of dynamic API call sequences. Operations on sensitive system resources and complex behaviors are abstracted in an interpretable way at two semantic layers. At the lower layer, raw API calls are combined into low-layer behaviors via data dependency analysis. At the higher layer, low-layer behaviors are further combined into more complex, interpretable high-layer behaviors. The extracted low-layer and high-layer behaviors are finally embedded into a high-dimensional vector space, so that they can be used directly by many popular machine learning algorithms. In addition, to handle cases where benign programs are inadequately sampled or where malware and benign programs are severely imbalanced, we propose an improved one-class support vector machine (OC-SVM), named OC-SVM-Neg, which makes use of the available negative samples. Experimental results show that the proposed feature extraction method combined with OC-SVM-Neg outperforms binary classifiers in terms of false alarm rate and generalization ability.
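The abstract describes the bilayer abstraction only at a high level. The following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation: a trace is assumed to be a list of `(api_name, input_handles, output_handles)` tuples, a low-layer behavior is assumed to be the ordered API-name sequence of calls linked by a shared handle value (a much cruder dependency analysis than the paper's, ignoring handle reuse and buffer or string data flows), and high-layer behaviors come from a hypothetical rule table. The trace format, the rule name `drop_and_persist`, and all function names are illustrative.

```python
from collections import defaultdict


def low_layer_behaviors(trace):
    """Group calls linked by data dependency (a handle produced by one call and
    consumed by a later one) using union-find; name each group by its API sequence.
    Assumption: trace items are (api_name, input_handles, output_handles)."""
    parent = list(range(len(trace)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    producer = {}  # handle value -> index of the most recent call that produced it
    for i, (_, inputs, outputs) in enumerate(trace):
        for h in inputs:
            if h in producer:
                parent[find(i)] = find(producer[h])  # merge consumer into producer's group
        for h in outputs:
            producer[h] = i

    groups = defaultdict(list)
    for i in range(len(trace)):  # ascending index keeps calls in trace order
        groups[find(i)].append(trace[i][0])
    return {"->".join(names) for names in groups.values()}


# Hypothetical high-layer rule table: a high-layer behavior fires when all of
# its constituent low-layer behaviors are present in the trace.
HIGH_LAYER_RULES = {
    "drop_and_persist": {
        "CreateFileW->WriteFile->CloseHandle",
        "RegOpenKeyExW->RegSetValueExW->RegCloseKey",
    },
}


def high_layer_behaviors(low, rules=HIGH_LAYER_RULES):
    return {name for name, parts in rules.items() if parts <= low}


def embed(behaviors, vocabulary):
    """Binary bag-of-behaviors vector over a fixed, ordered vocabulary."""
    return [1 if b in behaviors else 0 for b in vocabulary]


# Usage on a made-up trace: a file write, a registry write, and an unrelated call.
trace = [
    ("CreateFileW", (), ("h1",)),
    ("WriteFile", ("h1",), ()),
    ("CloseHandle", ("h1",), ()),
    ("RegOpenKeyExW", (), ("k1",)),
    ("RegSetValueExW", ("k1",), ()),
    ("RegCloseKey", ("k1",), ()),
    ("GetTickCount", (), ()),
]
low = low_layer_behaviors(trace)
high = high_layer_behaviors(low)
vocab = sorted(low) + sorted(HIGH_LAYER_RULES)
print(embed(low | high, vocab))
```

In practice the vocabulary would be fixed from a training corpus rather than from a single trace, so unseen behaviors map to zeros; the resulting vectors can then be passed to any vector-based learner, including a one-class classifier such as an OC-SVM.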




Notes

  1. https://github.com/lcy-hugepanda/BA_RULE/.

  2. VXHeaven: http://vxheavens.com/, last accessed July 16, 2014.

  3. Windows PC software downloads and reviews from CNET, http://download.com, accessed 2014.

  4. Huajun software downloads, http://onlinedown.net, accessed 2014.

  5. NewBasic disassembler, http://www.fysnet.net/newbasic.htm, accessed 2014.


Acknowledgments

The authors would also like to thank the reviewers for their valuable comments and important suggestions. Many thanks to Dr. Ben Stock at the University of Erlangen-Nuremberg for kindly sharing many useful malware samples with us. The work was jointly supported by the National Natural Science Foundation of China under Grant Nos. 61472302, 61272280, U1404620, 41271447 and 61272195; the Program for New Century Excellent Talents in University under Grant No. NCET-12-0919; the Fundamental Research Funds for the Central Universities under Grant Nos. K5051203020, K5051303016, K5051303018, BDY081422 and K50513100006; the Natural Science Foundation of Shaanxi Province under Grant No. 2014JM8310; the Creative Project of the Science and Technology State of Xi’an under Grant Nos. CXY1341(6) and CXY1440(1); and the State Key Laboratory of Geo-information Engineering under Grant No. SKLGIE2014-M-4-4.

Author information

Correspondence to Jiachen Liu.

About this article


Cite this article

Miao, Q., Liu, J., Cao, Y. et al. Malware detection using bilayer behavior abstraction and improved one-class support vector machines. Int. J. Inf. Secur. 15, 361–379 (2016). https://doi.org/10.1007/s10207-015-0297-6


  • DOI: https://doi.org/10.1007/s10207-015-0297-6
