Research article. DOI: 10.1145/3576915.3616625

PackGenome: Automatically Generating Robust YARA Rules for Accurate Malware Packer Detection

Published: 21 November 2023

Abstract

Binary packing, a widely used program obfuscation technique, compresses or encrypts the original program and then recovers it at runtime. Packed malware samples are pervasive: they conceal conspicuous code features as unintelligible data to evade detection. To respond rapidly to large-scale packed malware, security analysts search for specific binary patterns to identify the corresponding packers. The quality of such packer patterns, or signatures, is vital to malware dissection. However, existing packer signature rules rely heavily on human analysts' experience. In addition to requiring expensive manual effort, these human-written rules (e.g., YARA rules) also suffer from high false positive rates: because they are designed to match patterns of bytes rather than instructions, they are very likely to match unintended instructions.
In this paper, we examine the weaknesses of existing packer detection signatures and propose a novel automatic YARA rule generation technique called PackGenome. Inspired by the biological concept of species-specific genes, we observe that packer-specific genes can help determine whether a program is packed. Our framework generates new YARA rules from packer-specific genes, which are extracted from the unpacking routines reused across programs protected by the same packer. To reduce false positives, we propose a byte selection strategy that systematically evaluates the mismatch possibility of bytes. We compare PackGenome with publicly available packer signature collections and a state-of-the-art automatic rule generation tool. Our large-scale experiments with more than 640K samples demonstrate that PackGenome can deliver robust YARA rules to detect Windows and Linux packers, including emerging low-entropy packers. PackGenome outperforms existing work in all cases, with zero false negatives, low false positives, and a negligible increase in scanning overhead.
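
The byte-versus-instruction mismatch described above can be illustrated with a minimal sketch. The opcodes used are standard 32-bit x86 encodings, but the code snippet, the two-byte signature, and the hand-annotated instruction boundaries are hypothetical. They are assumptions made for illustration and are not taken from the paper or from any real rule set, and the sketch does not reproduce PackGenome's byte selection strategy.

```python
# Hypothetical 32-bit x86 byte stream. Instruction boundaries are annotated
# by hand for illustration; no disassembler is involved.
#   offset 0: b8 60 be 00 00   mov eax, 0x0000be60
#   offset 5: 60               pushad
#   offset 6: be 00 10 40 00   mov esi, 0x00401000
CODE = bytes.fromhex("b860be0000" "60" "be00104000")
INSTRUCTION_STARTS = {0, 5, 6}

# Hypothetical byte signature meant to capture "pushad; mov esi, imm32".
SIGNATURE = bytes.fromhex("60be")


def find_all(haystack: bytes, needle: bytes) -> list[int]:
    """Return every offset at which needle occurs, overlaps included."""
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


def classify_matches(code: bytes, signature: bytes, starts: set[int]) -> list[tuple[int, bool]]:
    """Pair each match offset with whether it begins on an instruction boundary."""
    return [(offset, offset in starts) for offset in find_all(code, signature)]


matches = classify_matches(CODE, SIGNATURE, INSTRUCTION_STARTS)
for offset, aligned in matches:
    kind = "instruction-aligned" if aligned else "inside an operand (spurious)"
    print(f"match at offset {offset}: {kind}")
```

In this sketch the signature matches twice: once at offset 5, where the intended instructions begin, and once at offset 1, inside the immediate operand of an unrelated `mov eax` instruction. A purely byte-oriented scan cannot tell the two apart, which is the kind of mismatch the abstract attributes to byte-pattern rules.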

    Published In

    CCS '23: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security
    November 2023
    3722 pages
    ISBN: 9798400700507
    DOI: 10.1145/3576915
    Publisher

    Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. binary packing
    2. binary similarity
    3. malware analysis
    4. unpacking routines
    5. yara rules

    Acceptance Rates

    Overall Acceptance Rate 1,261 of 6,999 submissions, 18%

    Cited By

    • (2025) ASDroid: Resisting Evolving Android Malware With API Clusters Derived From Source Code. IEEE Transactions on Information Forensics and Security, Vol. 20, 1822-1835. DOI: 10.1109/TIFS.2025.3536280. Online publication date: 2025.
    • (2025) Practical clean-label backdoor attack against static malware detection. Computers & Security, Vol. 150, Article 104280. DOI: 10.1016/j.cose.2024.104280. Online publication date: Mar-2025.
    • (2024) Bifocal Agent: identificando automaticamente funções maliciosas para aumentar o foco do analista de malware. Anais do XXIV Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais (SBSeg 2024), 60-75. DOI: 10.5753/sbseg.2024.241689. Online publication date: 16-Sep-2024.
    • (2024) Enhancing Malware Classification via Self-Similarity Techniques. IEEE Transactions on Information Forensics and Security, Vol. 19, 7232-7244. DOI: 10.1109/TIFS.2024.3433372. Online publication date: 25-Jul-2024.
    • (2024) Automated Anti-malware Detection Rules Converter Based on SIMIOC. 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), 1770-1775. DOI: 10.1109/CSCWD61410.2024.10580029. Online publication date: 8-May-2024.
    • (2024) Living off the Analyst: Harvesting Features from Yara Rules for Malware Detection. 2024 IEEE International Conference on Big Data (BigData), 2624-2634. DOI: 10.1109/BigData62323.2024.10825735. Online publication date: 15-Dec-2024.
    • (2024) Assessing LLMs in malicious code deobfuscation of real-world malware campaigns. Expert Systems with Applications: An International Journal, Vol. 256, Part C. DOI: 10.1016/j.eswa.2024.124912. Online publication date: 18-Nov-2024.