Malware phylogeny generation using permutations of code

Karim, Md. Enamul.; Walenstein, Andrew; Lakhotia, Arun; Parida, Laxmi

doi:10.1007/s11416-005-0002-9


Czech title: Phylogenetic generation of malware using code permutations

French title: Phylogenic generation of malicious code through code permutation

Finnish title: Creating evolutionary models of malicious programs using permutation of program code

German title: Family-tree generation for malicious code based on pattern-based methods

Italian title: Generation of malware phylogenies using code permutations

Original Paper
Published: 20 September 2005

Journal in Computer Virology, Volume 1, pages 13–23 (2005)

Md. Enamul. Karim¹, Andrew Walenstein¹, Arun Lakhotia¹ & Laxmi Parida²


Abstract

Malicious programs, such as viruses and worms, are frequently related to previous programs through evolutionary relationships. Discovering those relationships and constructing a phylogeny model is expected to help in analyzing new malware and in establishing a principled naming scheme. Matching permutations of code may help build better models in cases where malware evolution does not preserve the order of shared code. We describe methods for constructing phylogeny models that use features called n-perms to match possibly permuted code. An experiment was performed to compare the relative effectiveness of vector similarity measures using n-perms and n-grams when comparing permuted variants of programs. The similarity measures using n-perms maintained a greater separation between the similarity scores of permuted families of specimens and those of unrelated specimens. A subsequent study using a tree generated through n-perms suggests that phylogeny models based on n-perms may help forensic analysts investigate new specimens and assist in reconciling malware naming inconsistencies.
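The abstract does not define n-perms precisely. As a minimal sketch, assume an n-gram is an ordered window of n consecutive opcodes and an n-perm is the same window with its order ignored (modeled here by sorting the window, so every permutation of the same n opcodes becomes one feature). Cosine similarity is used as one possible vector similarity measure; the function names and opcode sequences below are hypothetical, not taken from the paper.

```python
from collections import Counter
from math import sqrt

def ngrams(tokens, n):
    """Count ordered windows of n consecutive tokens (classic n-grams)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def nperms(tokens, n):
    """Count order-insensitive windows: sorting each window maps every
    permutation of the same n tokens to one feature (assumed n-perm model)."""
    return Counter(tuple(sorted(tokens[i:i + n])) for i in range(len(tokens) - n + 1))

def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm2 = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    return dot / sqrt(norm2) if norm2 else 0.0

# Hypothetical opcode sequences: the variant swaps two pairs of adjacent
# instructions but is otherwise identical to the original.
original = ["push", "mov", "xor", "call", "pop", "ret"]
variant = ["mov", "push", "xor", "call", "ret", "pop"]

ngram_sim = cosine(ngrams(original, 2), ngrams(variant, 2))
nperm_sim = cosine(nperms(original, 2), nperms(variant, 2))
print(f"2-gram similarity: {ngram_sim:.2f}")  # 2-gram similarity: 0.20
print(f"2-perm similarity: {nperm_sim:.2f}")  # 2-perm similarity: 0.60
```

The swaps destroy most 2-grams but leave most 2-perms intact, so the permuted variant stays closer to its parent. Sorting only hides reordering inside a window; instructions moved farther than n positions apart still change the feature set.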

Abstract (translated from Czech)

Malicious programs such as viruses and worms (malware) are rarely written from scratch; they are usually products of evolutionary relationships. Discovering these relationships and constructing an accurate phylogeny is expected to help in analyzing new malware and in establishing a principled naming scheme. Matching permutations of code within malware may offer advantages for phylogeny generation, because the evolutionary steps implemented by malware authors may not preserve the sequence of shared code. We describe a family of phylogeny generators that perform clustering using features extracted with PQ trees. An experiment was performed in which the tree output by these generators was evaluated against phylogenies generated using weighted n-grams. The results show the advantages of the permutation-based approach in malware phylogeny generation.
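The abstract above describes generators that cluster specimens into a tree but does not spell out the clustering procedure, and the PQ-tree feature extraction is not reproduced here. As a hypothetical illustration of the tree-building step alone, the sketch below applies average-linkage (UPGMA-style) agglomerative clustering to a precomputed table of pairwise similarities, such as n-perm cosine scores. The function name, specimen names, and scores are assumptions, not the paper's data or method.

```python
from itertools import combinations

def average_linkage_tree(names, sim):
    """Agglomerative clustering with average linkage (UPGMA-style).

    names: specimen identifiers.
    sim:   dict mapping frozenset({a, b}) to the similarity of a and b.
    Returns a nested tuple describing the binary tree of merges.
    """
    # Each cluster is (subtree, members); every specimen starts as a leaf.
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        best = None
        for (i, (_, mi)), (j, (_, mj)) in combinations(enumerate(clusters), 2):
            # Average similarity over all cross-cluster specimen pairs.
            score = sum(sim[frozenset((a, b))] for a in mi for b in mj) / (len(mi) * len(mj))
            if best is None or score > best[0]:
                best = (score, i, j)
        _, i, j = best
        # Merge the most similar pair of clusters into one internal node.
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Hypothetical specimens and similarity scores (e.g. n-perm cosine values).
sim = {
    frozenset(("A", "B")): 0.9, frozenset(("C", "D")): 0.8,
    frozenset(("A", "C")): 0.1, frozenset(("A", "D")): 0.2,
    frozenset(("B", "C")): 0.1, frozenset(("B", "D")): 0.2,
}
print(average_linkage_tree(["A", "B", "C", "D"], sim))
# (('A', 'B'), ('C', 'D'))
```

This quadratic-per-step loop is fine for a handful of specimens; a production toolkit would use a dedicated hierarchical clustering library instead.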

Abstract (translated from French)

Malicious codes, such as viruses and worms, are rarely written from scratch; consequently, evolutionary relationships exist between them. Establishing these relationships and constructing an accurate phylogeny promises a better capability to analyze new malicious code and a de facto method for naming it. Matching code permutations against parts of malicious code is likely to be very useful in building a phylogeny, since the evolutionary steps carried out by malware authors generally do not preserve the order of the instructions in the shared code. We describe a family of phylogeny generators that perform clustering using features extracted from PQ trees. An experiment was carried out in which the tree produced by these generators is evaluated in two ways: by comparing it with the reference classifications used by antivirus scanners, and by comparing it with phylogenies produced using weighted n-grams. The results demonstrate the value of the permutation-based approach to generating phylogenies of malicious code.

Abstract (translated from Finnish)

Malicious programs, such as computer viruses and worms, are rarely written from scratch. As a result, evolution-like similarity can be found among them. Finding these similarities and building an accurate evolution-based model can ease the analysis of new malicious programs and help implement naming conventions. Searching for permutations in code may offer benefits for building an evolutionary model, because the evolutionary steps taken by malware authors do not necessarily preserve the sequential order of the program code. We describe a set of evolutionary-model generators that perform clustering using PQ-tree-based features. We also conducted an experiment in which the output tree was compared with a reference set produced by an antivirus program and with evolutionary models built using weighted n-grams. The results suggest that the permutation-based approach can be used successfully to build evolutionary models.

Abstract (translated from German)

Malicious programs, such as viruses and worms, are only very rarely written completely from scratch; as a result, dependencies can be found between different pieces of malicious code.

For the classification and scientific analysis of new malicious code, it can be very helpful to identify its dependencies on existing malicious code and thereby build a family tree.

Among other things, the article discusses modern approaches to family-tree generation, using selected Win32 viruses as examples.

Abstract (translated from Italian)

Malicious programs, such as viruses and worms, are rarely written from scratch; this means that evolutionary relationships exist among them. Discovering these relationships and building an accurate phylogeny can help both in analyzing new programs of this kind and in establishing a well-founded nomenclature. Searching for code permutations across programs can be an advantage for phylogeny generation, since the evolutionary steps implemented by the authors may not have preserved the sequential order of the original code. In this article we describe a family of phylogeny generators that perform clustering using features based on PQ trees. In an experiment, the output tree of the generators is compared with a reference classification obtained from an anti-virus program and with phylogenies generated using weighted n-grams. The results indicate the benefits of the permutation-based approach to malware phylogeny generation.



Author information

Authors and Affiliations

Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, USA
Md. Enamul. Karim, Andrew Walenstein & Arun Lakhotia
IBM T. J. Watson Research Center, Yorktown Heights, USA
Laxmi Parida


Corresponding author

Correspondence to Md. Enamul. Karim.

Additional information

A version of this paper was published in the EICAR 2005 Conference: Best Paper Proceedings.


About this article

Cite this article

Karim, M.E., Walenstein, A., Lakhotia, A. et al. Malware phylogeny generation using permutations of code. J Comput Virol 1, 13–23 (2005). https://doi.org/10.1007/s11416-005-0002-9


Received: 12 December 2004
Accepted: 27 February 2005
Published: 20 September 2005
Issue Date: November 2005
DOI: https://doi.org/10.1007/s11416-005-0002-9


Similar content being viewed by others

A Fast Algorithm for Constructing Phylogenetic Trees with Application to IoT Malware Clustering

Identifying Shared Software Components to Support Malware Forensics

A Static, Packer-Agnostic Filter to Detect Similar Malware Samples
