Malware visualization methods based on deep convolution neural networks

Ren, Zhuojun; Chen, Guang; Lu, Wenke

doi:10.1007/s11042-019-08310-9

Published: 16 December 2019

Volume 79, pages 10975–10993, (2020)

Multimedia Tools and Applications

Abstract

In this paper, we propose two visualization methods for malware analysis based on n-gram features of byte sequences. The space filling curve mapping (SFCM) method uses fractal curves to visualize the one-gram features of byte sequences, i.e., the malware files themselves, and uses different colors to distinguish printable characters from non-printable ones. This method addresses two shortcomings of existing methods: they cannot interactively locate characters, and they cannot avoid the risk of a decompression bomb attack caused by large malware. The Markov dot plot (MDP) method visualizes the bi-gram features of byte sequences and their statistical information as the coordinates and brightness of pixels. It solves the problem that relocating code sections or adding redundant information helps malware escape global image detection. The two methods are applied to the Microsoft malware samples (BIG 2015, Kaggle), and the visualized results are fed to deep convolutional networks to extract image features, which are then classified by an SVM (support vector machine). In malware classification, the two methods obtained 98.36% and 99.08% accuracy, respectively. We also visualized benign PE (portable executable) files from the Windows OS and tested them together with the above malware set. In malware detection, the two methods obtained 99.21% and 98.74% accuracy, respectively. These results are better than those of the existing grayscale method.
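The two mappings described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which fractal curve, color scheme, or brightness scaling the paper uses, so the Hilbert curve, the RGB values, the axis orientation, and the max-count normalization below are all assumptions, and the function names `hilbert_d2xy`, `sfcm_image`, and `markov_dot_plot` are hypothetical.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order square grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so consecutive indices stay adjacent.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def sfcm_image(data):
    """One-gram view: place each byte along a space filling curve.

    Assumed color scheme: printable ASCII bytes (0x20-0x7E) get a blue
    tint, all other bytes are gray. Unused cells stay black.
    """
    order = 0
    while 4 ** order < len(data):
        order += 1  # smallest grid that holds every byte
    n = 1 << order
    img = [[(0, 0, 0)] * n for _ in range(n)]
    for i, b in enumerate(data):
        x, y = hilbert_d2xy(order, i)
        img[y][x] = (0, b, 255) if 0x20 <= b <= 0x7E else (b, b, b)
    return img


def markov_dot_plot(data):
    """Bi-gram view: a 256x256 grid indexed by (first byte, second byte).

    Assumed brightness: bi-gram count scaled so the most frequent
    bi-gram maps to 255.
    """
    counts = [[0] * 256 for _ in range(256)]
    for a, b in zip(data, data[1:]):
        counts[a][b] += 1
    peak = max(max(row) for row in counts)
    if peak == 0:
        return counts  # fewer than two bytes: all-black image
    return [[round(255 * c / peak) for c in row] for row in counts]


sample = b"MZ\x90\x00"              # first bytes of a typical PE header
sfcm = sfcm_image(sample)           # 2x2 RGB grid
mdp = markov_dot_plot(b"ABAB\x00")  # 256x256 brightness grid
```

Because the curve index of a pixel equals the byte's file offset, a pixel in the SFCM image can be traced back to a position in the file. The bi-gram grid depends only on which byte pairs occur and how often, so moving a block of bytes changes it only at the block boundaries, which is consistent with the robustness to relocated code sections that the abstract claims for MDP.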

Acknowledgements

This work was sponsored by the National Natural Science Foundation of China under Grant 61671006 and Chinese Universities Scientific Fund under Grant 14D310407. The authors would like to thank Jie Mao and Tao Gong for constructive suggestions.

Author information

Authors and Affiliations

College of Information Science and Technology, Donghua University, Shanghai, China
Zhuojun Ren, Guang Chen & Wenke Lu

Corresponding author

Correspondence to Zhuojun Ren.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ren, Z., Chen, G. & Lu, W. Malware visualization methods based on deep convolution neural networks. Multimed Tools Appl 79, 10975–10993 (2020). https://doi.org/10.1007/s11042-019-08310-9

Received: 01 May 2018
Revised: 23 April 2019
Accepted: 02 October 2019
Published: 16 December 2019
Issue Date: April 2020
DOI: https://doi.org/10.1007/s11042-019-08310-9
