Comparing files using structural entropy

  • Original paper
  • Journal in Computer Virology

Abstract

One of the main trends in the modern anti-virus industry is the development of algorithms that estimate the similarity of files. Because malware writers use increasingly complex techniques such as obfuscation and polymorphism to protect their code, anti-virus vendors face three growing problems: file scanning becomes harder, anti-virus databases grow considerably, and file storage becomes overloaded. Static analysis of files is of interest for solving these problems, because it determines the file characteristics needed for comparison without executing malware samples in a protected environment. The solution presented in this article is based on the assumption that different samples of the same malicious program have a similar order of code and data areas. Each such file area may be characterized not only by its length but also by its homogeneity. In other words, a file may be characterized by the complexity of its data order. Our approach uses wavelet analysis to split each file into segments of different entropy levels, and then uses the edit distance between the resulting sequences of segments to determine the similarity of the files. The proposed solution has a number of advantages that help detect malicious programs efficiently on personal computers. First, the comparison does not take into account the functionality of the analysed files and is based solely on the similarity of code and data area positions, which makes the algorithm effective against many ways of protecting executable code. On the other hand, such a comparison may produce false alarms, so our solution is useful as a preliminary test that triggers additional checks. Second, the method is relatively easy to implement and does not require code disassembly or emulation. Third, the method keeps the record for each malicious file compact, which is significant when compiling anti-virus databases.
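The pipeline described in the abstract (entropy profile, segmentation, edit distance over segment sequences) can be illustrated with a minimal Python sketch. It is not the article's implementation. The block size, the threshold, the substitution cost, and all function names are assumptions made for illustration. In particular, a single-scale Haar-style difference between adjacent blocks stands in for the wavelet analysis, whose details the abstract does not give.

```python
import math
from collections import Counter


def block_entropies(data, block=256):
    """Shannon entropy (bits per byte, 0..8) of each fixed-size block.
    The block size of 256 bytes is an assumed value."""
    out = []
    for i in range(0, len(data), block):
        chunk = data[i:i + block]
        n = len(chunk)
        out.append(-sum(c / n * math.log2(c / n)
                        for c in Counter(chunk).values()))
    return out


def segment(entropies, threshold=1.0):
    """Split the entropy profile into (length_in_blocks, mean_entropy) pairs.
    A boundary is placed wherever adjacent blocks differ by more than
    `threshold` bits: a crude, single-scale stand-in for wavelet analysis."""
    segments, start = [], 0
    for i in range(1, len(entropies) + 1):
        at_end = i == len(entropies)
        if at_end or abs(entropies[i] - entropies[i - 1]) > threshold:
            part = entropies[start:i]
            segments.append((len(part), sum(part) / len(part)))
            start = i
    return segments


def seg_cost(a, b):
    """Assumed substitution cost in [0, 1]: the average of the relative
    length difference and the entropy difference scaled by 8 bits."""
    len_diff = abs(a[0] - b[0]) / max(a[0], b[0])
    ent_diff = abs(a[1] - b[1]) / 8.0
    return (len_diff + ent_diff) / 2


def edit_distance(s, t):
    """Wagner-Fischer dynamic programme over two segment sequences,
    with unit cost for insertion and deletion."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [float(i)]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # delete a
                           cur[j - 1] + 1,              # insert b
                           prev[j - 1] + seg_cost(a, b)))  # substitute
        prev = cur
    return prev[-1]


def similarity(x, y):
    """Similarity in [0, 1] between two byte strings; 1.0 means the
    segment sequences are identical under this sketch."""
    s, t = segment(block_entropies(x)), segment(block_entropies(y))
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - edit_distance(s, t) / longest


if __name__ == "__main__":
    # Two synthetic "files": a uniform region followed by a varied region.
    a = b"\x00" * 1024 + bytes(range(256)) * 4
    b = b"\x00" * 1280 + bytes(range(256)) * 4
    print(segment(block_entropies(a)))
    print(similarity(a, b))
```

Under these assumptions, each file is reduced to a short list of (length, entropy) pairs, which matches the abstract's point that the stored record is compact. Two files with the same layout but slightly different region sizes score close to 1, so a score like this would serve only as a preliminary filter before further checks.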

Corresponding author

Correspondence to Ivan Sorokin.

Cite this article

Sorokin, I. Comparing files using structural entropy. J Comput Virol 7, 259–265 (2011). https://doi.org/10.1007/s11416-011-0153-9

  • DOI: https://doi.org/10.1007/s11416-011-0153-9