Clustering of Malicious Executable Files Based on the Sequence Analysis of System Calls

Automatic Control and Computer Sciences

Abstract

This paper investigates the use of clustering algorithms to determine the types of malicious software files from an analysis of their WinAPI function call sequences. Four algorithms are considered: k-means, the EM algorithm, hierarchical clustering, and affinity propagation. Clustering quality is evaluated using the silhouette coefficient, the Calinski–Harabasz index, and the Davies–Bouldin index.
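As an illustration of the general idea only, the following minimal sketch turns API call traces into bigram frequency vectors, groups them with a plain k-means loop, and scores the result with the mean silhouette coefficient. It is not the authors' pipeline: the four traces are invented toy data, the bigram features are an assumed representation, and the helper names (bigram_vector, kmeans, silhouette) are hypothetical. Only the silhouette coefficient is shown; the other two indices named in the abstract are omitted.

```python
from collections import Counter
from math import sqrt
import random

# Invented toy traces (assumption): two file-oriented and two network-oriented samples.
traces = [
    ["CreateFileW", "ReadFile", "WriteFile", "CloseHandle"],
    ["CreateFileW", "ReadFile", "WriteFile", "WriteFile", "CloseHandle"],
    ["socket", "connect", "send", "recv", "closesocket"],
    ["socket", "connect", "send", "send", "recv", "closesocket"],
]

def bigram_vector(calls, vocab):
    """Relative frequency of each consecutive call pair, ordered by vocab."""
    counts = Counter(zip(calls, calls[1:]))
    total = sum(counts.values()) or 1
    return [counts[pair] / total for pair in vocab]

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign each point to the nearest center, then re-average."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Shared bigram vocabulary across all traces, in a fixed order.
vocab = sorted({pair for t in traces for pair in zip(t, t[1:])})
vectors = [bigram_vector(t, vocab) for t in traces]
labels = kmeans(vectors, k=2)
score = silhouette(vectors, labels)
print(labels, round(score, 3))
```

On this toy input the two file-oriented traces end up in one cluster and the two network-oriented traces in the other. A silhouette value close to 1 indicates compact, well-separated clusters, while values near 0 indicate overlapping ones.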

Author information

Corresponding authors

Correspondence to E. V. Zhukovskii or D. P. Zegzhda.

Ethics declarations

The authors declare that they have no conflicts of interest.

Additional information

Translated by I. P. Obrezanova

About this article

Cite this article

Ognev, R.A., Zhukovskii, E.V. & Zegzhda, D.P. Clustering of Malicious Executable Files Based on the Sequence Analysis of System Calls. Aut. Control Comp. Sci. 53, 1045–1055 (2019). https://doi.org/10.3103/S0146411619080212

  • DOI: https://doi.org/10.3103/S0146411619080212
