Malware phylogeny generation using permutations of code

Czech title: Phylogeny generation of malware using code permutations

French title: Phylogeny generation of malicious code by code permutation

Finnish title: Creating evolutionary models of malicious programs using permutations of program code

German title: Phylogeny generation of malicious code based on pattern-based methods

Italian title: Generation of malware phylogenies using permutations of code

Journal in Computer Virology

Abstract

Malicious programs, such as viruses and worms, are frequently related to previous programs through evolutionary relationships. Discovering those relationships and constructing a phylogeny model is expected to be helpful for analyzing new malware and for establishing a principled naming scheme. Matching permutations of code may help build better models in cases where malware evolution does not preserve the order of the code. We describe methods for constructing phylogeny models that use features called n-perms to match possibly permuted code. An experiment was performed to compare the relative effectiveness of vector similarity measures using n-perms and n-grams when comparing permuted variants of programs. The similarity measures using n-perms maintained a greater separation between the similarity scores of permuted families of specimens and those of unrelated specimens. A subsequent study using a tree generated through n-perms suggests that phylogeny models based on n-perms may help forensic analysts investigate new specimens and assist in reconciling malware naming inconsistencies.
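The contrast between n-grams and n-perms can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not spell out: an n-perm is taken here to be a window of n consecutive tokens whose internal order is ignored, programs are represented as lists of opcode mnemonics, and cosine similarity stands in for the vector similarity measures. The function names and the toy opcode sequences are hypothetical and are not drawn from the paper's data.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Count each window of n consecutive tokens, keeping token order."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def nperms(tokens, n):
    """Count each window of n consecutive tokens, ignoring order inside the
    window (assumed reading of "n-perm": sort the window before counting)."""
    return Counter(tuple(sorted(tokens[i:i + n])) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical opcode sequences: `permuted` swaps adjacent instructions of
# `original`; `unrelated` shares no instructions with either.
original = ["push", "mov", "add", "call", "pop", "ret"]
permuted = ["mov", "push", "add", "pop", "call", "ret"]
unrelated = ["xor", "jmp", "cmp", "jne", "inc", "nop"]

gram_sim = cosine(ngrams(original, 3), ngrams(permuted, 3))
perm_sim = cosine(nperms(original, 3), nperms(permuted, 3))
gram_unrelated = cosine(ngrams(original, 3), ngrams(unrelated, 3))
perm_unrelated = cosine(nperms(original, 3), nperms(unrelated, 3))

print(f"3-grams: variant={gram_sim:.2f} unrelated={gram_unrelated:.2f}")
print(f"3-perms: variant={perm_sim:.2f} unrelated={perm_unrelated:.2f}")
```

On this toy input the reordered variant shares no 3-grams with the original, so the n-gram score cannot distinguish it from the unrelated program, whereas three of the four 3-perms still match. That gap is the kind of separation the abstract refers to, although real specimens would of course behave less cleanly.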

Abstract (translated from Czech)

Malicious programs, such as viruses and worms (malware), are rarely written from scratch. They are usually the result of evolutionary relationships. Discovering these relationships and constructing an accurate phylogeny is expected to help in analyzing new malware and in establishing a principled naming scheme. Matching permutations of code within malware may offer advantages for phylogeny generation, because the evolutionary steps implemented by malware authors may not preserve the sequence of shared code. We describe a family of phylogeny generators that perform clustering using features extracted from PQ trees. An experiment was performed in which the tree output by these generators was evaluated against phylogenies generated using weighted n-grams. The results show the advantages of the permutation-based approach in malware phylogeny generation.

Abstract (translated from French)

Malicious code, such as viruses and worms, is rarely written from scratch; consequently, evolutionary relationships exist among these programs. Establishing these relationships and constructing an accurate phylogeny should improve the ability to analyze new malicious code and provide a de facto method for naming it. Matching permutations of code against parts of malicious code is likely to be very useful in establishing a phylogeny, since the evolutionary steps carried out by malware authors generally do not preserve the order of the instructions in the shared code. We describe a family of phylogeny generators that perform clustering using features extracted from PQ trees. An experiment was performed in which the tree produced by these generators was evaluated both against the reference classifications used by anti-virus scanners and against phylogenies produced using weighted n-grams. The results demonstrate the value of the permutation-based approach in malware phylogeny generation.

Abstract (translated from Finnish)

Malicious programs, such as computer viruses and worms, are rarely written from scratch. As a result, evolution-like similarities can be found among them. Finding these similarities and building an accurate evolution-based model can ease the analysis of new malicious programs and support naming conventions. Searching for permutations in code may offer advantages for building an evolutionary model, because the evolutionary steps taken by malware authors do not necessarily preserve sequence in the program code. We describe a set of evolutionary model generators that perform clustering using PQ-tree-based features. We also conducted an experiment in which the tree output was compared with a reference set produced by an anti-virus program and with evolutionary models built using weighted n-grams. The results suggest that the permutation-based approach can be used successfully to build evolutionary models.

Abstract (translated from German)

Malicious programs, such as viruses and worms, are only very rarely written entirely from scratch; as a result, dependencies can be found among different malicious codes.

For the classification and scientific analysis of new malicious code, it can prove very helpful to demonstrate dependencies on existing malicious code and thereby construct a family tree.

Among other things, the article discusses modern approaches to family tree generation, using selected Win32 viruses.

Abstract (translated from Italian)

Malicious programs, such as viruses and worms, are rarely written from scratch; this means that evolutionary relationships exist among them. Discovering these relationships and constructing an accurate phylogeny can help both in analyzing new programs of this kind and in establishing a naming scheme with a solid basis. Searching for code permutations across programs can be advantageous for phylogeny generation, since the evolutionary steps implemented by the authors may not have preserved the sequential order of the original code. In this article we describe a family of phylogeny generators that perform clustering using features based on PQ trees. In an experiment, the output tree of the generators is compared with a reference classification obtained from an anti-virus program and with phylogenies generated using weighted n-grams. The results indicate the benefits of the permutation-based approach in malware phylogeny generation.

Author information

Corresponding author

Correspondence to Md. Enamul Karim.

Additional information

A version of this paper was published in the EICAR 2005 Conference: Best Paper Proceedings.

About this article

Cite this article

Karim, M.E., Walenstein, A., Lakhotia, A. et al. Malware phylogeny generation using permutations of code. J Comput Virol 1, 13–23 (2005). https://doi.org/10.1007/s11416-005-0002-9

  • DOI: https://doi.org/10.1007/s11416-005-0002-9
