Machine Learning for Analyzing Malware

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 10394)

Abstract

The Internet has become an indispensable part of people’s work and life, and it also provides favorable conditions for malware to spread. As a result, new malware appears continually, spreads quickly, and has become one of the main threats to network security today. Following the malware analysis process, from raw feature extraction and feature selection through to malware detection, this paper introduces machine learning techniques such as clustering, classification, and association analysis, and describes how to apply them to analyze malware and its variants effectively.
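As a rough illustration of the pipeline the abstract outlines (feature extraction, then feature selection, then detection), the following is a minimal sketch and not the paper's own method. It assumes samples are already disassembled into opcode sequences, uses opcode 2-grams as features, ranks them by a simple difference in per-class document frequency, and uses a nearest-centroid rule as a stand-in classifier. All function names and the toy data are hypothetical.

```python
from collections import Counter

def ngrams(opcodes, n=2):
    """Feature extraction: contiguous opcode n-grams of one sample."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]

def select_features(samples, labels, k=4):
    """Feature selection (assumed criterion): keep the k n-grams whose
    document frequency differs most between malicious (1) and benign (0)."""
    mal, ben = Counter(), Counter()
    for ops, label in zip(samples, labels):
        (mal if label == 1 else ben).update(set(ngrams(ops)))
    n_mal = max(1, sum(labels))
    n_ben = max(1, len(labels) - sum(labels))
    score = {g: abs(mal[g] / n_mal - ben[g] / n_ben)
             for g in set(mal) | set(ben)}
    # Sort by descending score; ties are broken by the n-gram itself
    return sorted(score, key=lambda g: (-score[g], g))[:k]

def vectorize(opcodes, features):
    """Count how often each selected n-gram occurs in a sample."""
    counts = Counter(ngrams(opcodes))
    return [counts[f] for f in features]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify(vec, centroids):
    """Detection: label of the nearest class centroid (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Toy training data (made up for illustration): 1 = malicious, 0 = benign
samples = [
    ["push", "mov", "xor", "xor", "jmp"],
    ["mov", "xor", "xor", "call"],
    ["push", "mov", "add", "ret"],
    ["mov", "add", "add", "ret"],
]
labels = [1, 1, 0, 0]

features = select_features(samples, labels)
centroids = {
    y: centroid([vectorize(s, features)
                 for s, l in zip(samples, labels) if l == y])
    for y in set(labels)
}

print(classify(vectorize(["push", "mov", "xor", "xor", "ret"], features), centroids))
```

In practice the selection score and classifier would be replaced by whichever criterion and model are being evaluated; the sketch only shows how the three stages connect.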



Acknowledgements

This work was financially supported by the National Key R&D Program of China (2016YFB0801304).

Author information

Corresponding author

Correspondence to Yajie Dong.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Dong, Y., Liu, Z., Yan, Y., Wang, Y., Peng, T., Zhang, J. (2017). Machine Learning for Analyzing Malware. In: Yan, Z., Molva, R., Mazurczyk, W., Kantola, R. (eds) Network and System Security. NSS 2017. Lecture Notes in Computer Science, vol 10394. Springer, Cham. https://doi.org/10.1007/978-3-319-64701-2_28

  • DOI: https://doi.org/10.1007/978-3-319-64701-2_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64700-5

  • Online ISBN: 978-3-319-64701-2

  • eBook Packages: Computer Science, Computer Science (R0)
