Automatically Generate Malware Detection Rules By Extracting Risk Information

Authors:

Jie Li

CNIOT '24: Proceedings of the 2024 5th International Conference on Computing, Networks and Internet of Things

Pages 595 - 599

https://doi.org/10.1145/3670105.3670209

Published: 29 July 2024

Abstract

As the number of packages in the open-source software ecosystem keeps growing, the amount of malware has increased explosively. Identifying malware quickly and accurately is therefore a pressing concern in the security field, yet relatively little work has focused on generating detection rules directly from malware source code. In this paper, we propose RICYara, a method that automatically generates YARA rules from risk information, aiming to cover as many malicious behaviors as possible while keeping the generated rules effective. Named entity recognition is used to mine risk information, which is then grouped according to the content of the source code. Three rules are designed for selecting risk groups. The feature extraction algorithm then uses heuristic rules and Isolation Forests to build a blacklist and a whitelist, from which the detection rules are generated. Experimental results show that rules generated by this method score significantly higher than those of other methods, that rule efficiency is exponentially higher than that of other methods, and that the rules are also more accurate in practical detection.
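
As a rough illustration of the last step in the abstract, the following Python sketch filters candidate strings against a whitelist and emits a YARA rule from what remains. It is not the RICYara implementation. The function names (`escape_yara_string`, `build_yara_rule`), the `whitelist` parameter, the sample strings, and the `N of them` condition are all assumptions made for this example, and the named entity recognition, risk grouping, and Isolation Forest steps are omitted.

```python
from typing import Iterable, List, Set


def escape_yara_string(text: str) -> str:
    """Escape backslashes and double quotes for a YARA text string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_yara_rule(
    name: str,
    candidates: Iterable[str],
    whitelist: Set[str],
    min_matches: int = 2,
) -> str:
    """Build a YARA rule from candidate strings that are not whitelisted.

    Hypothetical helper: duplicates are dropped, input order is preserved,
    and the condition requires `min_matches` of the kept strings
    (capped at the number of strings actually kept).
    """
    seen: Set[str] = set()
    kept: List[str] = []
    for candidate in candidates:
        if candidate in whitelist or candidate in seen:
            continue
        seen.add(candidate)
        kept.append(candidate)
    if not kept:
        raise ValueError("no candidate strings left after filtering")

    threshold = min(min_matches, len(kept))
    lines = [f"rule {name}", "{", "    strings:"]
    for index, text in enumerate(kept):
        lines.append(f'        $s{index} = "{escape_yara_string(text)}"')
    lines += ["    condition:", f"        {threshold} of them", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample inputs are invented for illustration only.
    sample_candidates = ["import os", "base64.b64decode", "os.system"]
    sample_whitelist = {"import os"}
    print(build_yara_rule("suspicious_package", sample_candidates, sample_whitelist))
```

The whitelist here is a plain set lookup, which keeps the sketch short. A real pipeline would presumably derive both lists from data rather than hard-code them.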

Cited By

S. Gupta, F. Lu, A. Barlow, E. Raff, F. Ferraro, C. Matuszek, C. Nicholas, and J. Holt. 2024. Living off the Analyst: Harvesting Features from Yara Rules for Malware Detection. In 2024 IEEE International Conference on Big Data (BigData). 2624–2634. Online publication date: 15-Dec-2024. https://doi.org/10.1109/BigData62323.2024.10825735

Index Terms

Automatically Generate Malware Detection Rules By Extracting Risk Information
1. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Malware and its mitigation
  2. Software and application security
    1. Software security engineering

Information

Published In

CNIOT '24: Proceedings of the 2024 5th International Conference on Computing, Networks and Internet of Things

May 2024

668 pages

ISBN: 9798400716751

DOI: 10.1145/3670105

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

CNIOT 2024: 2024 5th International Conference on Computing, Networks and Internet of Things

May 24 - 26, 2024

Tokyo, Japan

Acceptance Rates

Overall Acceptance Rate 39 of 82 submissions, 48%
