Effective Malware Detection Based on Behaviour and Data Features

Xu, Zhiwu; Wen, Cheng; Qin, Shengchao; Ming, Zhong

doi:10.1007/978-3-319-73830-7_6

Conference paper
First Online: 18 January 2018

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10699)

Abstract

Malware is one of the most serious security threats on the Internet today. Traditional detection methods become ineffective as malware continues to evolve. Recently, various machine learning approaches have been proposed for detecting malware. However, they either focus on behaviour information and leave data information out of consideration, or pay little attention to new malware with different behaviours and to new malware versions produced by obfuscation techniques. In this paper, we propose an effective approach to malware detection using machine learning. Unlike most existing work, we take into account not only behaviour information but also data information, namely the opcodes, data types and system libraries used in executables. We employ various machine learning methods in our implementation and conduct several experiments to evaluate our approach. The results show that (1) the classifier trained by Random Forest performs best, with an accuracy of 0.9788 and an AUC of 0.9959; (2) all the features (including data types) are effective for malware detection; (3) our classifier is capable of detecting some fresh malware; and (4) our classifier is resistant to some obfuscation techniques.
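
To illustrate the idea of combining behaviour and data features, the following is a minimal sketch of how opcodes, data types and system libraries might be merged into one feature vector per executable. It is not the authors' implementation: the vocabularies `OPCODES`, `DATA_TYPES` and `LIBRARIES`, the function `build_feature_vector`, and the choice of relative frequencies and presence flags are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical vocabularies, chosen only for this example; the abstract does
# not list the actual opcodes, data types or libraries used as features.
OPCODES = ["mov", "push", "call", "xor"]
DATA_TYPES = ["int", "char*", "struct"]
LIBRARIES = ["kernel32.dll", "advapi32.dll", "ws2_32.dll"]


def build_feature_vector(opcodes, data_types, libraries):
    """Concatenate opcode frequencies, data type frequencies and library flags."""
    op_counts = Counter(opcodes)
    type_counts = Counter(data_types)
    # Guard against division by zero for executables with no extracted items.
    op_total = max(len(opcodes), 1)
    type_total = max(len(data_types), 1)
    # Library names are compared case-insensitively.
    libs = {name.lower() for name in libraries}

    vector = [op_counts[op] / op_total for op in OPCODES]
    vector += [type_counts[t] / type_total for t in DATA_TYPES]
    vector += [1.0 if lib in libs else 0.0 for lib in LIBRARIES]
    return vector


sample = build_feature_vector(
    opcodes=["mov", "mov", "call", "xor"],
    data_types=["int", "int"],
    libraries=["KERNEL32.dll", "WS2_32.dll"],
)
print(sample)
```

Vectors of this shape, one per labelled executable, could then be passed to an off-the-shelf classifier such as a Random Forest; which encoding works best for each feature group would have to be settled experimentally.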

Notes

1. We are able to obfuscate only the obj files compiled from C code with VS 2010.

Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful comments. This work was partially supported by the National Natural Science Foundation of China under Grants No. 61502308, 61373033 and 61672358, and by the Science and Technology Foundation of Shenzhen City under Grant No. JCYJ20170302153712968.

Author information

Authors and Affiliations

College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Zhiwu Xu, Cheng Wen, Shengchao Qin & Zhong Ming

Corresponding author

Correspondence to Zhiwu Xu.

Editor information

Editors and Affiliations

Columbia University, New York, New York, USA
Meikang Qiu

About this paper

Cite this paper

Xu, Z., Wen, C., Qin, S., Ming, Z. (2018). Effective Malware Detection Based on Behaviour and Data Features. In: Qiu, M. (ed.) Smart Computing and Communication. SmartCom 2017. Lecture Notes in Computer Science, vol 10699. Springer, Cham. https://doi.org/10.1007/978-3-319-73830-7_6

DOI: https://doi.org/10.1007/978-3-319-73830-7_6
Published: 18 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73829-1
Online ISBN: 978-3-319-73830-7
eBook Packages: Computer Science, Computer Science (R0)
