DOI: 10.1145/3670105.3670209
research-article

Automatically Generate Malware Detection Rules By Extracting Risk Information

Published: 29 July 2024

Abstract

With the continuous growth of packages in the open-source software ecosystem, the amount of malware has increased explosively. Identifying malware quickly and accurately is therefore a central concern in the security field, yet relatively little work has focused on generating detection rules directly from malware source code. In this paper, we propose RICYara, a method for automatically generating YARA rules based on risk information, which aims to cover as many malicious behaviors as possible while ensuring that the generated rules are effective. Named entity recognition is used to mine risk information, and the information is grouped according to the content of the source code. Three rules are designed for selecting risk groups. The feature extraction algorithm then uses heuristic rules and Isolation Forests to build a blacklist and a whitelist, from which the rules are generated. Experimental results show that rules generated by this method score significantly higher than those of other methods, that effective rules are produced exponentially more efficiently than with other methods, and that the rules are more accurate in practical detection.
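To make the final step of the pipeline concrete, the following is a minimal sketch of how a list of already extracted risk strings might be turned into a YARA rule. It is not the RICYara implementation: the function names `escape_yara_string` and `build_yara_rule`, the `min_matches` parameter, and the sample strings are all hypothetical, and the whitelist is supplied by hand here, whereas the paper derives its blacklist and whitelist with heuristic rules and Isolation Forests.

```python
def escape_yara_string(text: str) -> str:
    """Escape backslashes and double quotes for a YARA text string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_yara_rule(name, risk_strings, whitelist=frozenset(), min_matches=2):
    """Render a YARA rule that fires when enough risk strings are present.

    Assumptions for this sketch: `name` is already a valid YARA identifier,
    and `risk_strings` holds plain single-line text.
    """
    # Drop duplicates (keeping first-seen order) and whitelisted strings.
    kept = [s for s in dict.fromkeys(risk_strings) if s not in whitelist]
    if not kept:
        raise ValueError("no risk strings left after whitelist filtering")

    # Never require more matches than there are strings in the rule.
    threshold = min(min_matches, len(kept))

    lines = [f"rule {name}", "{", "    strings:"]
    for i, s in enumerate(kept):
        lines.append(f'        $s{i} = "{escape_yara_string(s)}"')
    lines += ["    condition:", f"        {threshold} of them", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical risk strings, as might be mined from a malicious setup.py.
    print(build_yara_rule(
        "suspicious_setup_py",
        ["os.system(", "base64.b64decode(", "import os"],
        whitelist={"import os"},
    ))
```

The condition `N of them` is standard YARA syntax for requiring at least N of the declared strings. A threshold above one is used here so that a single common string does not trigger the rule on its own; that choice is an illustration, not a setting taken from the paper.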

Cited By

  • (2024) Living off the Analyst: Harvesting Features from Yara Rules for Malware Detection. 2024 IEEE International Conference on Big Data (BigData), 2624–2634. https://doi.org/10.1109/BigData62323.2024.10825735. Online publication date: 15-Dec-2024

Published In

CNIOT '24: Proceedings of the 2024 5th International Conference on Computing, Networks and Internet of Things
May 2024
668 pages
ISBN: 9798400716751
DOI: 10.1145/3670105
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Automatic rule generation
  2. Malware detection
  3. Risk information extraction
  4. Yara rule

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CNIOT 2024

Acceptance Rates

Overall Acceptance Rate 39 of 82 submissions, 48%


Article Metrics

  • Downloads (last 12 months): 29
  • Downloads (last 6 weeks): 10
Reflects downloads up to 20 Jan 2025.
