Detecting Malware Based on Opcode N-Gram and Machine Learning

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 13)

Abstract

Because of the serious damage it causes to computers and networks, malware (short for malicious software) has held the attention of both anti-malware companies and researchers for decades. Although signature-based detection is the most prominent method used in commercial anti-malware products, it fails to recognize new and unseen malware. To address this problem, n-grams of opcodes, obtained by disassembling the executables, are used as features for classification. However, many previous studies set n to a small value such as 1 or 2. In this paper, we first use n-gram sizes ranging from 1 to 15. We then compare different feature selection methods. Lastly, we perform experiments with different malicious file percentages (MFP) to determine which setting performs better.
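To make the feature representation concrete, the following is a minimal Python sketch of opcode n-gram counting. It is an illustration only, not the authors' implementation: the function names `opcode_ngrams` and `relative_frequencies` and the sample opcode sequence are hypothetical, and the sketch assumes the opcodes have already been extracted by a disassembler into a list of mnemonic strings.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Count contiguous opcode n-grams in a sequence of mnemonics.

    Hypothetical helper: `opcodes` is assumed to be a list of strings
    produced by some disassembler; each n-gram is returned as a tuple.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n over the sequence; a sequence shorter
    # than n yields an empty range and therefore an empty Counter.
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )


def relative_frequencies(counts):
    """Normalize raw n-gram counts so that the values sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Made-up opcode sequence, used only for illustration.
sample = ["push", "mov", "push", "mov", "call", "ret"]

bigrams = opcode_ngrams(sample, 2)
bigram_freqs = relative_frequencies(bigrams)

# Feature extraction for every n-gram size considered in the abstract.
features_by_n = {n: opcode_ngrams(sample, n) for n in range(1, 16)}
```

The number of distinct n-grams can grow quickly as n increases, which is one reason a feature selection step is usually applied before training a classifier on such counts.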

Acknowledgments

This work was supported by National Science Foundation of China (No. U1536122).

Author information

Corresponding author

Correspondence to Pengfei Li.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Li, P., Chen, Z., Cui, B. (2018). Detecting Malware Based on Opcode N-Gram and Machine Learning. In: Xhafa, F., Caballé, S., Barolli, L. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2017. Lecture Notes on Data Engineering and Communications Technologies, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-319-69835-9_9

  • DOI: https://doi.org/10.1007/978-3-319-69835-9_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69834-2

  • Online ISBN: 978-3-319-69835-9

  • eBook Packages: Engineering, Engineering (R0)
