Evaluation of malware phylogeny modelling systems using automated variant generation

  • Eicar 2008 extended version
Journal in Computer Virology

Abstract

A malware phylogeny model is an estimate of the derivation relationships among a set of malware samples. Systems that construct phylogeny models are expected to be useful for malware analysts. While several such systems have been proposed, little is known about the consistency of their results on different data sets or about their generalizability across different types of malware evolution. This paper explores these issues using two artificial malware history generators: systems that simulate malware evolution according to different evolution models. A quantitative study was conducted using two phylogeny model construction systems and multiple samples of artificial evolution. High variability was found in the quality of their results on different data sets, and the systems were shown to be sensitive to the characteristics of evolution in the data sets. The results call into question the adequacy of evaluations typical in the field, raise pragmatic concerns about tool choice for malware analysts, and underscore the important role that model-based simulation is expected to play in evaluating and selecting suitable malware phylogeny construction systems.

Author information

Corresponding author

Correspondence to Matthew Hayes.

Additional information

M. Hayes is presently at Case Western Reserve University.

About this article

Cite this article

Hayes, M., Walenstein, A. & Lakhotia, A. Evaluation of malware phylogeny modelling systems using automated variant generation. J Comput Virol 5, 335–343 (2009). https://doi.org/10.1007/s11416-008-0100-6

  • DOI: https://doi.org/10.1007/s11416-008-0100-6
