Ontology-based knowledge representation for malware individuals and families

doi:10.1016/j.cose.2019.101574

Computers & Security

Volume 87, November 2019, 101574

https://doi.org/10.1016/j.cose.2019.101574

Abstract

Malware comprises a large number of families and individual samples, and each individual exhibits complex behaviors. A knowledge base is therefore urgently needed to process and store this huge amount of information. At present, traditional signature-based databases cannot represent the behavioral semantics of malicious code, so analysts cannot tell what a piece of malware will do on a computer system. To solve this issue, we apply ontology techniques to the malware domain and propose a method for constructing a malware knowledge base. We design the concept classes and object properties of malware and propose a method for representing the semantics of malware behavior. A data mining method, the Apriori algorithm, is applied to extract the common behaviors of individuals belonging to the same family, and these common behaviors are used to represent the knowledge of a malware family. The experimental results show that the data mining method can discover the common behaviors of a malware family, and that the mined common behaviors can effectively classify malware families.
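The abstract names the Apriori algorithm (Agrawal et al.) as the tool for mining a family's common behaviors. The paper's own implementation is not reproduced in this excerpt, so the following is only a minimal textbook Apriori sketch in Python. It assumes that each sample has already been reduced to a set of behavior labels; the behavior names, the four-sample family and the 0.75 minimum support are hypothetical, and treating the maximal frequent itemsets as the family's "common behaviors" is an assumption made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Textbook Apriori: return {itemset: support} for every itemset whose
    support (fraction of transactions containing it) is >= min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        return sum(1 for t in sets if itemset <= t) / n

    # Level 1: frequent single behaviors.
    current = {}
    for item in {i for t in sets for i in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Keep only the candidates that meet the support threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def maximal_itemsets(frequent):
    """Frequent itemsets that are not a proper subset of another frequent one."""
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical behavior sets observed for four samples of one family.
family_samples = [
    {"add_run_key", "drop_exe", "connect_c2", "inject_process"},
    {"add_run_key", "drop_exe", "connect_c2"},
    {"add_run_key", "drop_exe", "connect_c2", "delete_self"},
    {"add_run_key", "connect_c2", "keylog"},
]

frequent = apriori(family_samples, min_support=0.75)
for itemset in maximal_itemsets(frequent):
    print(sorted(itemset), frequent[itemset])
```

With a 0.75 threshold the single maximal itemset is {add_run_key, connect_c2, drop_exe}, supported by three of the four samples; lowering the threshold admits more behaviors into the family profile at the cost of specificity.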

Introduction

At present, more and more devices, such as smartphones and household appliances, are connected to the Internet. As the number of Internet-connected devices grows, users are more likely to become targets of network attacks. One of the major threats facing computer systems and their users today is malicious code (malware). Malware exhibits complex behaviors and can use different technologies to attack computer systems. It can often bypass security mechanisms, install itself on the target host, and establish remote access.

People mainly use anti-virus software to detect malware, and most anti-virus tools use signature-based methods (Filiol, 2006). A signature is a short string of bytes that is unique to each known malware sample. Because signatures are simple to represent, it is easy to build a large signature database for detecting malware. However, signatures carry no semantics: a signature match does not tell us what the malware has done on a computer system. In addition, malware signatures can easily be changed. Malware writers can use obfuscation techniques to alter the signature, producing a large number of variants of a malware family. As a result, signature-based methods fail to detect variants of known malware as well as previously unknown malware.
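To make the weakness described above concrete, the following Python sketch implements the simplest form of signature scanning, an exact byte-substring match, and shows that changing a single byte inside the signed region is enough to evade it. The signature name, the byte pattern and the toy sample are made up for illustration; production anti-virus engines typically use richer patterns than this.

```python
# Hypothetical signature database: family name -> byte pattern (made-up bytes).
SIGNATURES = {
    "Demo.FamilyA": bytes.fromhex("deadbeefcafebabe"),
}


def scan(sample: bytes) -> list:
    """Return the names of all signatures that occur verbatim in the sample."""
    return [name for name, sig in SIGNATURES.items() if sig in sample]


# A toy "sample": 16 header/padding bytes, the signed region, then more padding.
original = b"MZ" + bytes(14) + bytes.fromhex("deadbeefcafebabe") + bytes(8)

# A variant that differs by one byte inside the signed region, standing in
# for the kind of change an obfuscator or repacker might introduce.
variant = bytearray(original)
variant[19] ^= 0xFF
variant = bytes(variant)

print(scan(original))  # ['Demo.FamilyA']
print(scan(variant))   # []
```

The match says nothing about what the sample does, and the one-byte variant is no longer recognized, which is the gap that behavior-based representations aim to close.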

To address this issue, behavior-based detection methods have been proposed. Malware behavior can be represented in different ways; for example, both API call sequences and opcode sequences can be used to describe it. However, API and opcode sequences cannot accurately represent malware behavior, and they do not tell analysts what the malware has done on the target machine. To obtain the exact behaviors of malware, it is necessary to run the malware in a virtual environment and monitor its behavior continuously. Malware is an extremely complex population: it contains many families, each family is composed of a large number of individuals, and each individual exhibits complex behaviors. One problem we need to solve is how to represent and store such a huge amount of information so that machines can understand and process it automatically.

In this paper we study the representation of malware behavior and the construction of a malware knowledge base. The purpose of this study is to develop an integrated knowledge base that represents and stores behavioral knowledge about malware individuals and families and helps people analyze and detect malware. We introduce ontology technology into the malware detection domain: we use an ontology to describe malware behavior and construct a knowledge framework for malware, and we use ontology reasoning to identify the families of unknown malware.
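The paper uses ontology reasoning to assign unknown samples to families, but this excerpt does not show how its family classes are defined. As a rough, hypothetical sketch, the Python below treats each family as a class whose members must exhibit all of that family's common behaviors (loosely analogous to an OWL class defined as an intersection of `hasBehavior some B` restrictions) and checks an unknown sample against every definition. The family names, behavior labels and the most-specific-first ordering are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FamilyDefinition:
    """Hypothetical family class: a sample is inferred to be a member if it
    exhibits every behavior in `required`."""
    name: str
    required: frozenset


def infer_families(sample_behaviors, families):
    """Return the names of all families whose definition the sample satisfies,
    most specific (largest required set) first."""
    observed = set(sample_behaviors)
    matches = [f for f in families if f.required <= observed]
    matches.sort(key=lambda f: len(f.required), reverse=True)
    return [f.name for f in matches]


families = [
    FamilyDefinition("FamilyA", frozenset({"add_run_key", "drop_exe", "connect_c2"})),
    FamilyDefinition("FamilyB", frozenset({"encrypt_user_files", "delete_shadow_copies"})),
    FamilyDefinition("GenericPersistence", frozenset({"add_run_key"})),
]

unknown = {"add_run_key", "drop_exe", "connect_c2", "keylog"}
print(infer_families(unknown, families))  # ['FamilyA', 'GenericPersistence']
print(infer_families({"encrypt_user_files"}, families))  # []
```

This check is closed-world: a behavior that was not observed counts as absent. An OWL reasoner works under the open-world assumption and would not conclude non-membership from missing facts, so a knowledge base built with OWL would typically leave this inference to a description-logic reasoner.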


Related works

In this paper, we use ontology techniques to build a malware knowledge base, so we review only research work related to ontologies. An ontology can describe real-world entities objectively, so ontologies are often used to describe the knowledge of a domain. Ontologies have been applied in many fields, such as knowledge engineering, artificial intelligence and the Semantic Web. In the field of information security, ontologies have been used to describe knowledge about malware.

Tafazzoli and Sadjadi (2008)

Ontology concepts for malware

Ontology is the science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality (Barry and Christopher, 2001). An ontology framework provides a consistent conceptual description of domain knowledge, which enables knowledge sharing, facilitates knowledge reuse and reduces repetitive descriptions.
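Much of the reuse benefit mentioned above comes from class hierarchies: a fact stated once about a superclass applies to every subclass and individual beneath it. The minimal Python sketch below, a stand-in for an RDF/OWL toolkit, stores a hypothetical malware ontology as triples, infers an individual's types through the transitive `subClassOf` relation, and checks a `hasBehavior` object property against its intended domain and range. All class, property and individual names are illustrative, not the paper's schema.

```python
SUBCLASS_OF = "subClassOf"
TYPE = "type"

# Hypothetical malware ontology as (subject, predicate, object) triples.
triples = {
    ("Trojan", SUBCLASS_OF, "Malware"),
    ("Backdoor", SUBCLASS_OF, "Trojan"),
    ("FileBehavior", SUBCLASS_OF, "Behavior"),
    ("sample_002", TYPE, "Backdoor"),
    ("drop_exe", TYPE, "FileBehavior"),
    ("sample_002", "hasBehavior", "drop_exe"),
}

# Hypothetical object property: hasBehavior links Malware to Behavior.
PROPERTY_SIGNATURES = {"hasBehavior": ("Malware", "Behavior")}


def superclasses(cls, triples):
    """Every class reachable from cls by following subClassOf transitively."""
    result, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in result:
                result.add(o)
                stack.append(o)
    return result


def types_of(individual, triples):
    """Asserted types of an individual plus all types inferred via subClassOf."""
    asserted = {o for s, p, o in triples if s == individual and p == TYPE}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(cls, triples)
    return inferred


def violations(triples):
    """Property assertions whose subject or object lacks the expected type."""
    bad = []
    for s, p, o in triples:
        if p in PROPERTY_SIGNATURES:
            domain, rng = PROPERTY_SIGNATURES[p]
            if domain not in types_of(s, triples) or rng not in types_of(o, triples):
                bad.append((s, p, o))
    return bad


print(sorted(types_of("sample_002", triples)))  # ['Backdoor', 'Malware', 'Trojan']
print(violations(triples))  # []
```

In OWL itself, domain and range axioms drive inference rather than validation; the closed-world check in `violations` is closer to what a shape-validation language such as SHACL does.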

Typically, an ontology consists of classes, properties, relationships between classes, and individuals. They describe important concepts (classes of

Conclusions

In this paper we introduce ontology techniques into the malware domain and propose a method for constructing a malware knowledge base. Our contributions are as follows:

We design the concept classes and object properties of malware and propose a method for representing malware behaviors.

We propose methods for constructing the knowledge of malware individuals and malware families. The knowledge of malware individuals is represented by their behaviors, and the knowledge of malware family

Declaration of Competing Interest

No conflict of interest.

Acknowledgements

This work was partially supported by Scientific Research Foundation in Shenzhen (Grant No. JCYJ20180306172156841, JCYJ20180507183608379), Guangdong Natural Science Foundation (Grant No. 2016A030313664), the National Natural Science Foundation of China (Grant No. 61872107) and the National Key R&D Program of China (Grant No. 2018YFB1003800, 2018YFB1003805).

Yuxin Ding received the Ph.D. degree in computer science from the Institute of Software, Chinese Academy of Sciences, in 1999. He is currently an Associate Professor in the Department of Computer Science at the Harbin Institute of Technology Shenzhen Graduate School. His current research interests are primarily in computer security and machine learning.

References (25)

Y. Ding et al. A malware detection method based on family behavior graph. Comput. Secur. (March 2018).
S. Hansman et al. A taxonomy of network and computer attacks. Comput. Secur. (2005).
L.C. Navarro et al. Leveraging ontologies and machine-learning techniques for malware analysis into android permissions ecosystems. Comput. Secur. (2018).
W. Wang et al. Detecting android malicious apps and categorizing benign apps with ensemble of classifiers. Future Generation Computer Systems (2018).
F. Abdoli et al. An attacks ontology for computer and networks attack. Innovations and Advances in Computer Sciences and Engineering (2010).
R. Agrawal et al. Fast algorithms for mining association rules.
S. Barry et al. Ontology-towards a new synthesis.
H.S. Chiang et al. Mobile Malware Behavioral Analysis and Preventive Strategy Using Ontology.
E. Filiol. Malware pattern scanning schemes secure against blackbox analysis. J. Comput. Virol. (2006).
H.D. Huang et al. IT2FS-based ontology with soft-computing mechanism for malware behavior analysis. Soft Comput. (2014).
B. Jasiul et al. Identification of malware activities with rules.
Malware L. (2019). Malicious code samples available from: http://malware.lu (Accessed 20 May...

Cited by (23)

An intelligent recommendation method based on multi-interest network and adversarial deep learning
2023, Computers and Security
Recommender systems have been shown to be popular in many Internet communities, as they can help users discover interesting items based on their historical behaviors. However, with the explosive growth of data-intensive tasks and online information, cybersecurity risks have grown, and conventional collaborative recommendation algorithms may not meet users’ security requirements. Besides, the sparsity issue and the cold-start issue also hinder the performance of conventional recommendation methods. Recently, deep learning has been shown to outperform traditional modeling techniques, and it can be employed in recommender systems (RSs) to improve user behavior prediction. In light of these challenges and observations, an intelligent recommendation method based on multi-interest network and adversarial deep learning is proposed, where multi-source behavior information is applied for multi-view embedding extraction for better prediction performance. Specifically, multi-view preference embeddings, including self-embedding, interaction-aware embedding, and neighbor-based embedding, are combined to model users’ interests at a finer granularity. Besides, in neighbor-based embedding learning, an adversarial search scheme is adopted for fast similarity searching and privacy preservation. Finally, a DNN-based prediction mechanism is adopted for embedding aggregation and final prediction. Extensive experiments on real-world datasets show that our proposal achieves decent prediction performance with security concerns compared with state-of-the-art baselines.
RecMaL: Rectify the malware family label via hybrid analysis
2023, Computers and Security
Intelligent applications can be significantly impacted by incorrectly categorized data. Recently, artificial intelligence technology has been deployed in an increasing number of security-related scenarios, but the issue of data mislabeling has received little attention. We concentrate on the problem of malware mislabeling in this paper. Unfortunately, in the security field, the mislabeling issue of malware is not taken seriously. Existing work attempts to aggregate the AV labels to alleviate malware mislabeling. This will mislead the security analyst and pass the error to subsequent data-driven applications. Therefore, we conduct an in-depth analysis to explore the severity of the malware mislabel issue, and try to rectify the description of malware generated from anti-virus engines. We first propose a malware label correction tool called RecMaL. It employs hybrid analyses for malware label rectifying.
According to the thorough exploratory analysis, we figure out the core reasons for mislabeling issues and summarize them into 3 types. To verify the effectiveness and how RecMaL benefits the downstream applications (e.g., malware classification), we evaluate RecMaL through a series of experiments and show that the main components of RecMaL improve the performance, which proves our method effectively alleviates the mislabeling issue.
An ontology-driven framework for knowledge representation of digital extortion attacks
2023, Computers in Human Behavior
Citation Excerpt :
The researchers applied the Stanford named entity recognizer (NER) to extract cybersecurity-related entities. Ding et al. (Ding et al., 2019) also conducted a study on the use of ontologies for knowledge representation of malware and their families. However, the prototype model developed by them included only a limited number of malware-related classes.
With the COVID-19 pandemic and the growing influence of the Internet in critical sectors of industry and society, cyberattacks have not only not declined, but have risen sharply. In the meantime, ransomware is at the forefront of the most devastating threats that have launched the lucrative illegal business. Due to the proliferation and variety of ransomware forays, there is a need for a new theory of categories. The intricacy and multiplicity of components involved in digital extortions entails the construction of a knowledge representation system that is able to organize large volumes of information from heterogeneous sources in a formal structured format and infer new knowledge from it. This paper suggests and develops a dedicated ontology of digital blackmails, called Rantology, with a particular focus on ransomware assaults. The logic coded in this ontology allows to assess the maliciousness of programs based on various factors, including called API functions and their behaviors. The proposed framework can be used to facilitate interoperability between cybersecurity experts and knowledge-based systems, and identify sensitive points for surveillance. The evaluation results based on several criteria confirm the adequacy of the suggested ontology in terms of clarity, modularity, consistency, coverage and inheritance richness.
MaliCage: A packed malware family classification framework based on DNN and GAN
2022, Journal of Information Security and Applications
Citation Excerpt :
Tang et al. [41] used dynamic analysis to extract an API call and generated feature images representing malware behavior according to color mapping rules. An Apriori algorithm was applied to extract the common behaviors of individuals belonging to the same family, and to represent the knowledge of a malware family [16]. Mirza et al. [30] proposed a combination of machine learning techniques applied to a rich set of features extracted from a large dataset of benign and malicious files.
To evade security detection, hackers always add a deceptive packer outside of the original malicious codes. The coexistence of original unpacked samples and packed samples of the same family needs special attention in malware detection. The features of packed malware are changed by the packer, which would disturb the prediction results of malware classifier. The state-of-the-art studies of malware detection mainly focus on whether the malware is packed, or which type of packer is used. However, the ability of detecting the family of packed malware is still insufficient. Motivated by the above challenges, a novel packed malware family classification framework called MaliCage is proposed. The goal of the framework is to classify packed malware accurately. MaliCage consists of three core modules: packer detector, malware classifier, and a packer generative adversarial network (GAN). The packer detector is used as the pre-step of the framework to identify whether malware is packed. After distinguishing the packed samples, the dynamic features extracted from the sandbox are fitted to the malware classifier based on deep neural networks (DNN). The malware classifier can classify unpacked and packed malware simultaneously. Furthermore, the packer GAN generates fake packed samples to alleviate the underfitting of the malware classifiers. We built a single-packer dataset and a multi-packer dataset to evaluate the framework. In the single-packer experiment, 10 classes of malware samples packed by UPX were examined objectively. The accuracy of the malware classifier when using only real packed samples was 91.66%. After introducing fake packed samples generated by packer GAN, the accuracy of the packed malware classifier could reach 97.8%. In the multi-packer scenario, our method can also accurately classify benign programs, unpacked malware and malware packed by several common packers.
The validation results show that MaliCage can not only effectively solve the impacts of packed malware on machine learning model, but also improve the detection accuracy.
APTMalInsight: Identify and cognize APT malware based on system call information and ontology knowledge framework
2021, Information Sciences
Citation Excerpt :
Generally, the ontology model is composed of classes, attributes, and relationships between classes and individuals [13]. The malware ontology model is a knowledge framework in the domain of malware, including concepts related to malware behaviors, malware categories and individuals, and computer system components, which could be leveraged to realize malware knowledge reasoning [9]. Based on the principle of ontology model, the design of the APT malware ontology model is shown in Fig. 4.
APT attacks have posed serious threats to the security of cyberspace nowadays which are usually tailored for specific targets. Identification and understanding of APT attacks remains a key issue for society. Attackers often utilize malware as the weapons to launch cyber-attacks. For this reason, detecting APT malware and gaining insight into its malicious behaviors can strengthen the power to understand and counteract APT attacks. Based on the above motivation, this paper proposes a novel APT malware detection and cognition framework named APTMalInsight aiming at identifying and cognizing APT malware by leveraging system call information and ontology knowledge. We systematically study APT malware and extract dynamic system call information to describe its behavioral characteristics. With respect to the established feature vectors, the APT malware can be detected and clustered into their belonging families accurately. Furthermore, a horizontal comparison between APT malware and the traditional malware is conducted from the perspective of behavior types, to understand the behavioral characteristics of APT malware in depth. On the above basis, the ontology model is introduced to construct the APT malware knowledge framework to represent its typical malicious behaviors, thereby implementing the systematic cognition of APT malware and providing contextual understanding of APT attacks. The evaluation results based on real APT malware samples demonstrate that the detection and clustering accuracy can reach up to 99.28% and 98.85% respectively. In addition, APTMalInsight supplies an effective cognition framework for APT malware and enhances the capability to understand APT attacks.
Attacks on the Industrial Internet of Things – Development of a multi-layer Taxonomy
2020, Computers and Security
Citation Excerpt :
By contrast, well-elaborated classification schemes enable users to consider various characteristics of attacks. Existing ontologies elaborate on attack families such as malware (Ding et al., 2019) or attack detection in web applications (Razzaq et al., 2014). In the context of IT security in conventional IT systems, taxonomies address a variety of topics such as vulnerabilities and security gaps (Landwehr et al., 1994), attacks on IT systems (Howard and Longstaff, 1998), and network- and cyber-attacks (Hansmann and Hunt, 2005; Simmons et al., 2014).
The Industrial Internet of Things (IIoT) provides new opportunities to improve process and production efficiency, which enable new business models. At the same time, the high degree of cross-linking and decentralization increases the complexity of IIoT systems and creates new vulnerabilities. Hence, organizations are not only vulnerable to conventional IT threats, but also to a multitude of new, IIoT-specific attacks. Yet, a literature-based and empirically evaluated understanding of attacks on the IIoT is still lacking. Against this backdrop, we develop a multi-layer taxonomy that helps researchers and practitioners to identify similarities and differences between attacks on the IIoT. Based on the latest literature and a sample of about 50 attacks, we deductively and inductively determine attack characteristics and dimensions. We demonstrate the usefulness and practical relevance of our taxonomy by applying it to a real-world incident affecting a German steel facility. By combining IT security, IIoT, and risk management to form an interdisciplinary approach, we contribute to the descriptive knowledge in these fields. Industry experts confirm that our taxonomy enables a detailed classification of attacks, which supports the identification, documentation, and communication of incidents within organizations and their value-creation networks. With this, our taxonomy provides a profound basis for the further development of IT security management and the derivation of mitigation measures.


Rui Wu received her B.S. degree in Computer Science from Jiangxi University in 2017. She is currently a master's student in the Department of Computer Science at the Harbin Institute of Technology Shenzhen Graduate School. Her current research interests are in machine learning and computer security.

Xiao Zhang received his B.S. degree in Computer Science from Qingdao University in 2018. He is currently a master's student in the Department of Computer Science at the Harbin Institute of Technology Shenzhen Graduate School. His current research interests are in machine learning and computer security.
