
Malware visualization methods based on deep convolution neural networks

Multimedia Tools and Applications

Abstract

In this paper, we propose two visualization methods for malware analysis based on n-gram features of byte sequences. The space filling curve mapping (SFCM) method uses fractal curves to visualize the one-gram features of byte sequences, i.e. the malware files themselves, and distinguishes printable characters from non-printable ones by color. This method addresses the inability of existing methods to interactively locate characters, and it avoids the risk of a decompression bomb attack caused by large malware files. The Markov dot plot (MDP) method visualizes the bi-gram features of byte sequences and their statistical information as the coordinates and brightness of pixels. It solves the problem that relocating code sections or adding redundant information helps malware escape global image detection. Both methods are applied to the Microsoft malware samples (BIG 2015, Kaggle), and the resulting images are fed to deep convolutional networks to extract image features, which are then classified by an SVM (support vector machine). For malware classification, the two methods obtained 98.36% and 99.08% accuracy, respectively. We also visualized benign PE (portable executable) files from the Windows OS and tested them together with the above malware set. For malware detection, the two methods obtained 99.21% and 98.74% accuracy, respectively. These results are better than those of the existing grayscale method.
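The two mappings described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions: the abstract does not name a specific fractal curve, so a Hilbert curve is used here; the colors are arbitrary placeholders; and the MDP brightness is taken to be the row-normalized transition probability P(b | a) scaled to 0..255, which is only one plausible reading of "statistical information". The function names `hilbert_d2xy`, `sfcm_image`, and `markov_dot_plot` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Color = Tuple[int, int, int]

# Placeholder colors (assumption: the abstract only says the two classes differ in color).
PRINTABLE: Color = (0, 128, 255)
NON_PRINTABLE: Color = (128, 128, 128)
BACKGROUND: Color = (0, 0, 0)


def hilbert_d2xy(order: int, d: int) -> Tuple[int, int]:
    """Map index d along a Hilbert curve to (x, y) on a 2^order x 2^order grid."""
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the current sub-square before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def sfcm_image(data: bytes, order: int) -> List[List[Color]]:
    """Place each byte (one-gram) on the curve; color it by printability."""
    side = 1 << order
    if len(data) > side * side:
        raise ValueError("data does not fit on a curve of this order")
    grid = [[BACKGROUND] * side for _ in range(side)]
    for d, byte in enumerate(data):
        x, y = hilbert_d2xy(order, d)
        # Printable ASCII is taken to be 0x20..0x7E (an assumption).
        grid[y][x] = PRINTABLE if 0x20 <= byte <= 0x7E else NON_PRINTABLE
    return grid


def markov_dot_plot(data: bytes) -> List[List[int]]:
    """Return a 256x256 grayscale image indexed as image[a][b].

    The bi-gram (a, b) selects the pixel; its brightness is the estimated
    transition probability P(b | a) scaled to 0..255 (assumed normalization).
    """
    counts = Counter(zip(data, data[1:]))
    row_totals = [0] * 256
    for (a, _b), n in counts.items():
        row_totals[a] += n
    image = [[0] * 256 for _ in range(256)]
    for (a, b), n in counts.items():
        image[a][b] = round(255 * n / row_totals[a])
    return image
```

Because the MDP image depends only on which byte pairs occur and how often, moving a section elsewhere in the file changes only the few bi-grams at the section boundaries. This matches the robustness to relocation that the abstract claims. The Hilbert curve keeps bytes that are close in the file close in the image, and because each pixel corresponds to exactly one file offset, a viewer can map a pixel back to its byte.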



Acknowledgements

This work was sponsored by the National Natural Science Foundation of China under Grant 61671006 and Chinese Universities Scientific Fund under Grant 14D310407. The authors would like to thank Jie Mao and Tao Gong for constructive suggestions.

Author information


Corresponding author

Correspondence to Zhuojun Ren.



About this article


Cite this article

Ren, Z., Chen, G. & Lu, W. Malware visualization methods based on deep convolution neural networks. Multimed Tools Appl 79, 10975–10993 (2020). https://doi.org/10.1007/s11042-019-08310-9


  • DOI: https://doi.org/10.1007/s11042-019-08310-9
