Effective Malware Detection Based on Behaviour and Data Features

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10699)

Abstract

Malware is one of the most serious security threats on the Internet today, and traditional detection methods become ineffective as malware continues to evolve. Various machine learning approaches have recently been proposed for detecting malware. However, they either focus on behaviour information and leave data information out of consideration, or pay little attention to new malware with different behaviours and to new malware versions produced by obfuscation techniques. In this paper, we propose an effective machine learning approach to malware detection. Unlike most existing work, we take into account not only behaviour information but also data information, namely the opcodes, data types and system libraries used in executables. We employ various machine learning methods in our implementation and conduct several experiments to evaluate our approach. The results show that (1) the classifier trained by Random Forest performs best, with an accuracy of 0.9788 and an AUC of 0.9959; (2) all the features (including data types) are effective for malware detection; (3) our classifier is capable of detecting some fresh malware; and (4) our classifier is resistant to some obfuscation techniques.
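The abstract names three kinds of features extracted from executables: opcodes, data types and system libraries. A classifier such as Random Forest needs them combined into one fixed-length vector per executable. The sketch below shows one plausible encoding, assuming hypothetical vocabularies (`OPCODES`, `DATA_TYPES`, `LIBRARIES`), relative frequencies for opcodes and data types, and binary presence for libraries; these choices are illustrative assumptions, not the paper's documented feature encoding.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical vocabularies for illustration only; the paper's actual
# feature sets are not listed in the abstract.
OPCODES = ["mov", "push", "call", "xor", "jmp"]
DATA_TYPES = ["int", "char*", "float", "struct"]
LIBRARIES = ["kernel32.dll", "ws2_32.dll", "advapi32.dll"]


def frequency_vector(tokens: Iterable[str], vocab: List[str]) -> List[float]:
    """Relative frequency of each vocabulary item among all observed tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        # An executable with no observed tokens maps to an all-zero block.
        return [0.0] * len(vocab)
    return [counts[v] / total for v in vocab]


def presence_vector(items: Iterable[str], vocab: List[str]) -> List[float]:
    """1.0 if a library is imported, else 0.0 (Windows DLL names are case-insensitive)."""
    present = {item.lower() for item in items}
    return [1.0 if v in present else 0.0 for v in vocab]


def combined_features(opcodes: Iterable[str],
                      data_types: Iterable[str],
                      libraries: Iterable[str]) -> List[float]:
    """Concatenate opcode, data-type and library blocks into one vector."""
    return (frequency_vector(opcodes, OPCODES)
            + frequency_vector(data_types, DATA_TYPES)
            + presence_vector(libraries, LIBRARIES))


if __name__ == "__main__":
    vec = combined_features(["mov", "mov", "call", "xor"],
                            ["int", "char*"],
                            ["KERNEL32.dll", "ws2_32.dll"])
    print(vec)
```

Vectors built this way could be passed to any off-the-shelf classifier; the abstract reports that Random Forest performed best, but this sketch does not reproduce the paper's training or evaluation setup.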

Notes

  1. We can obfuscate only the .obj files compiled from C code with Visual Studio 2010 (VS 2010).

Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful comments. This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 61502308, 61373033 and 61672358, and by the Science and Technology Foundation of Shenzhen City under Grant No. JCYJ20170302153712968.

Author information

Correspondence to Zhiwu Xu.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Xu, Z., Wen, C., Qin, S., Ming, Z. (2018). Effective Malware Detection Based on Behaviour and Data Features. In: Qiu, M. (ed.) Smart Computing and Communication. SmartCom 2017. Lecture Notes in Computer Science, vol. 10699. Springer, Cham. https://doi.org/10.1007/978-3-319-73830-7_6

  • DOI: https://doi.org/10.1007/978-3-319-73830-7_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73829-1

  • Online ISBN: 978-3-319-73830-7

  • eBook Packages: Computer Science, Computer Science (R0)