Malware detection using bilayer behavior abstraction and improved one-class support vector machines

Miao, Qiguang; Liu, Jiachen; Cao, Ying; Song, Jianfeng

doi:10.1007/s10207-015-0297-6

Regular Contribution
Published: 09 August 2015

Volume 15, pages 361–379 (2016)

International Journal of Information Security

Abstract

Malware detection is one of the most challenging problems in computer security. Recently, machine-learning-based methods have become popular for detecting unknown and variant malware. Extracting discriminative and stable features is the most important prerequisite for successful learning. In this paper, we propose a bilayer behavior abstraction method based on semantic analysis of dynamic API sequences. Operations on sensitive system resources and complex behaviors are abstracted in an interpretable way at different semantic layers. At the lower layer, raw API calls are combined into low-layer behaviors via data dependency analysis. At the higher layer, low-layer behaviors are further combined to construct more complex high-layer behaviors with good interpretability. The extracted low-layer and high-layer behaviors are finally embedded into a high-dimensional vector space, so the abstracted behaviors can be used directly by many popular machine learning algorithms. In addition, to tackle the problem that benign programs are inadequately sampled or that malware and benign programs are severely imbalanced, we propose an improved one-class support vector machine (OC-SVM), named OC-SVM-Neg, which makes use of the available negative samples. Experimental results show that the proposed feature extraction method combined with OC-SVM-Neg outperforms binary classifiers in terms of false alarm rate and generalization ability.
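
The bilayer pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the trace format, the rule tables `LOW_RULES` and `HIGH_RULES`, and the behavior names are hypothetical, and data dependency is approximated here simply as "calls that share the same handle". The sketch groups API calls by handle, matches low-layer behaviors, combines them into high-layer behaviors, and embeds the result as a binary feature vector.

```python
from collections import defaultdict

# Hypothetical dynamic trace: (api_name, handle) pairs. A shared handle stands in
# for a data dependency between calls (a simplifying assumption for this sketch).
TRACE = [
    ("CreateFileW", "h1"),
    ("WriteFile", "h1"),
    ("CloseHandle", "h1"),
    ("RegOpenKeyExW", "h2"),
    ("RegSetValueExW", "h2"),
    ("RegCloseKey", "h2"),
]

# Hypothetical low-layer rules: behavior name -> API calls that must occur
# on the same handle.
LOW_RULES = {
    "file_read": {"CreateFileW", "ReadFile"},
    "file_write": {"CreateFileW", "WriteFile"},
    "registry_set": {"RegOpenKeyExW", "RegSetValueExW"},
}

# Hypothetical high-layer rules: behavior name -> low-layer behaviors that
# must all be present in the trace.
HIGH_RULES = {
    "copy_file": {"file_read", "file_write"},
    "drop_and_persist": {"file_write", "registry_set"},
}

# Fixed feature order: low-layer behaviors first, then high-layer behaviors.
FEATURES = sorted(LOW_RULES) + sorted(HIGH_RULES)


def low_layer_behaviors(trace):
    """Group calls by handle and return the low-layer behaviors they match."""
    apis_by_handle = defaultdict(set)
    for api, handle in trace:
        apis_by_handle[handle].add(api)
    found = set()
    for apis in apis_by_handle.values():
        for name, required in LOW_RULES.items():
            if required <= apis:  # every required call seen on this handle
                found.add(name)
    return found


def high_layer_behaviors(low):
    """Return high-layer behaviors whose low-layer parts are all present."""
    return {name for name, required in HIGH_RULES.items() if required <= low}


def embed(trace):
    """Embed a trace as a binary vector over FEATURES."""
    low = low_layer_behaviors(trace)
    present = low | high_layer_behaviors(low)
    return [1 if feature in present else 0 for feature in FEATURES]


vector = embed(TRACE)
```

A vector of this kind could then be passed to any standard classifier; the abstract's OC-SVM-Neg is one such consumer, but its details are not given in this section and are not modeled here.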

Notes

https://github.com/lcy-hugepanda/BA_RULE/.
VXHeaven: http://vxheavens.com/, last access: July 16, 2014
Windows PC software downloads and reviews from CNET, http://download.com, accessed 2014.
Huajun software downloads, http://onlinedown.net, accessed 2014.
NewBasic disassembler, http://www.fysnet.net/newbasic.htm, accessed 2014.

Acknowledgments

The authors would like to thank the reviewers for their valuable comments and important suggestions. Many thanks to Dr. Ben Stock at the University of Erlangen-Nuremberg for kindly sharing many useful malware samples with us. This work was jointly supported by the National Natural Science Foundation of China under Grant Nos. 61472302, 61272280, U1404620, 41271447 and 61272195; the Program for New Century Excellent Talents in University under Grant No. NCET-12-0919; the Fundamental Research Funds for the Central Universities under Grant Nos. K5051203020, K5051303016, K5051303018, BDY081422 and K50513100006; the Natural Science Foundation of Shaanxi Province under Grant No. 2014JM8310; the Creative Project of the Science and Technology State of Xi’an under Grant Nos. CXY1341(6) and CXY1440(1); and the State Key Laboratory of Geo-information Engineering under Grant No. SKLGIE2014-M-4-4.

Author information

Authors and Affiliations

School of Computer Science and Technology, Xidian University, 2nd Taibai Road, Xi’an, 710071, China
Qiguang Miao, Jiachen Liu, Ying Cao & Jianfeng Song

Corresponding author

Correspondence to Jiachen Liu.

About this article

Cite this article

Miao, Q., Liu, J., Cao, Y. et al. Malware detection using bilayer behavior abstraction and improved one-class support vector machines. Int. J. Inf. Secur. 15, 361–379 (2016). https://doi.org/10.1007/s10207-015-0297-6

Published: 09 August 2015
Issue Date: August 2016
DOI: https://doi.org/10.1007/s10207-015-0297-6
