Elsevier

Computers & Security

Volume 87, November 2019, 101574

Ontology-based knowledge representation for malware individuals and families

https://doi.org/10.1016/j.cose.2019.101574

Abstract

Malware comprises a large number of families and individuals, and each individual exhibits complex behaviors. A knowledge base is therefore urgently needed to process and store this large amount of information. At present, traditional signature-based databases cannot represent the behavioral semantics of malicious code, so people cannot know what malware will do on a computer system. To solve this issue, we apply ontology techniques to the malware domain and propose a method for constructing a malware knowledge base. We design the concept classes and object properties of malware and propose a method for representing the semantics of malware behavior. A data mining method, the Apriori algorithm, is applied to extract the common behaviors of individuals belonging to the same family, and these common behaviors are used to represent the knowledge of a malware family. The experimental results show that the data mining method can discover the common behaviors of a malware family, and that the mined common behaviors can be used to classify malware families effectively.

Introduction

At present, more and more devices, such as smartphones and household appliances, are connected to the Internet. As the number of Internet-connected devices increases, users are more likely to become targets of network attacks. One of the major threats facing computer systems and their users today is malicious code (malware). Malware has complex behaviors and can use different techniques to attack computer systems. Typically, it bypasses security mechanisms, installs itself on the target host, and establishes remote access.

People mainly use anti-virus software to detect malware. Most anti-virus tools use signature-based methods (Filiol, 2006). A signature is a short string of bytes that is unique to each known malware sample. Because signatures are simple to represent, it is easy to build a large signature database for detecting malware. However, signatures carry no semantics, so we cannot know what malware has done on a computer system. In addition, a malware signature can be easily modified: malware writers can use obfuscation techniques to change the signature, producing a large number of variants of a malware family. As a result, signature-based methods fail to detect variants of known malware or previously unknown malware.
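The weakness described above can be illustrated with a minimal sketch. It assumes a signature is matched as an exact byte substring; the signature bytes, the malware name `Example.Trojan.A`, and the `scan` helper are all hypothetical and are not taken from the paper or from any real anti-virus product.

```python
from typing import Dict, List

# Hypothetical signature database: byte pattern -> malware name (invented for illustration).
SIGNATURES: Dict[bytes, str] = {
    bytes.fromhex("deadbeef4d5a9000"): "Example.Trojan.A",
}


def scan(payload: bytes) -> List[str]:
    """Return the names of all signatures that occur as a substring of the payload."""
    return [name for sig, name in SIGNATURES.items() if sig in payload]


# A sample that contains the known signature unchanged.
original = b"\x00\x01" + bytes.fromhex("deadbeef4d5a9000") + b"\x02"

# A variant with one junk byte inserted in the middle of the signature region.
variant = (
    b"\x00\x01"
    + bytes.fromhex("deadbeef")
    + b"\x90"
    + bytes.fromhex("4d5a9000")
    + b"\x02"
)

print(scan(original))  # the known sample is matched
print(scan(variant))   # the one-byte change defeats the exact match
```

The match result also says nothing about what the sample does, which is the lack of semantics noted in the text.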

To solve this issue, behavior-based detection methods have been proposed. Malware behavior has different representations; for example, API sequences and opcode sequences can both be used to describe it. However, API sequences and opcode sequences cannot accurately represent malware behavior, and people still cannot know what malware has done on the target machine. To obtain the exact behaviors of malware, it is necessary to run it in a virtual environment and monitor its behavior continuously. Malware is an extremely complex population: it contains many families, each family is composed of a large number of individuals, and each individual shows complex behaviors. One problem we need to solve is how to represent and store such a huge amount of information so that machines can understand and process it automatically.

In this paper, we study the representation of malware behavior and a method for constructing a malware knowledge base. The purpose of this study is to develop an integrated knowledge base that represents and stores behavioral knowledge about malware individuals and families and helps people analyze and detect malware. We introduce ontology technology into the malware detection domain, use ontology to describe malware behavior and construct the knowledge framework of malware, and use ontology reasoning to identify the families of unknown malware.
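As a rough illustration of how common behaviors might be mined from the individuals of one family, the following sketch implements a plain textbook Apriori frequent-itemset search over sets of behavior labels. It is not the authors' implementation: the behavior names, the sample data, the support threshold of 1.0, and the `matches_family` helper are assumptions made for this example, and a simple subset test stands in for the ontology reasoning the paper describes.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support (fraction of transactions containing it)
    is at least min_support, mapped to that support."""
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}

    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = support(c)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def matches_family(behaviors: Set[str], common: FrozenSet[str]) -> bool:
    """Hypothetical check: a sample matches if it shows all common behaviors."""
    return common <= behaviors


# Hypothetical behavior sets observed for three individuals of one family.
family_samples = [
    {"create_autorun_key", "connect_remote_host", "copy_self_to_system_dir"},
    {"create_autorun_key", "connect_remote_host", "delete_shadow_copies"},
    {"create_autorun_key", "connect_remote_host", "copy_self_to_system_dir", "inject_process"},
]

frequent = apriori(family_samples, min_support=1.0)
common = max(frequent, key=len)  # largest behavior set shared by all samples
print(sorted(common))

unknown = {"create_autorun_key", "connect_remote_host", "log_keystrokes"}
print(matches_family(unknown, common))
```

Lowering `min_support` below 1.0 would also keep behaviors shared by most, but not all, individuals of the family.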

Section snippets

Related works

In this paper, we use ontology techniques to build a malware knowledge base, so we only review research related to ontology. An ontology can objectively describe things in the real world, so it is often used to describe the knowledge of a domain. Ontology has been applied in many fields, such as knowledge engineering, artificial intelligence, and the semantic web. In the field of information security, ontology has been used to describe the knowledge of malware.

Tafazzoli and Sadjadi (2008)

Ontology concepts for malware

Ontology is the science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality (Barry and Christopher, 2001). An ontology framework provides a consistent conceptual description of domain knowledge, which supports knowledge sharing, facilitates knowledge reuse, and reduces repetitive descriptions.
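To make the notions of objects, properties, and relations more concrete, the sketch below stores a toy ontology as subject-predicate-object triples and infers the classes of an individual by following subclass links. Every class, property, and individual name here is hypothetical and is not drawn from the paper's ontology; a real system would more likely use an OWL toolchain than plain Python sets.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Toy ontology as (subject, predicate, object) triples. All names are invented.
TRIPLES: Set[Triple] = {
    ("Worm", "subClassOf", "Malware"),
    ("EmailWorm", "subClassOf", "Worm"),
    ("sample_001", "instanceOf", "EmailWorm"),
    ("sample_001", "hasBehavior", "send_mass_email"),
}


def superclasses(cls: str) -> Set[str]:
    """Return all direct and indirect superclasses of cls."""
    result: Set[str] = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in TRIPLES:
            if s == current and p == "subClassOf" and o not in result:
                result.add(o)
                frontier.append(o)
    return result


def classes_of(individual: str) -> Set[str]:
    """Return the asserted classes of an individual plus all inferred superclasses."""
    direct = {o for s, p, o in TRIPLES if s == individual and p == "instanceOf"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls)
    return inferred


print(sorted(classes_of("sample_001")))
```

Only the `instanceOf` link to `EmailWorm` is stated explicitly; membership in `Worm` and `Malware` is derived, which is a very small instance of the kind of reasoning an ontology enables.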

Typically, an ontology consists of classes, properties, relationships between classes and individuals. They describe important concepts (classes of

Conclusions

In this paper, we introduce ontology techniques into the malware domain and propose a method for constructing a malware knowledge base. Our contributions are as follows:

We design the concept classes and object properties of malware, and propose a method for representing malware behaviors.

We propose methods for constructing the knowledge of malware individuals and malware families. The knowledge of malware individuals is represented as their behaviors, and the knowledge of malware family

Declaration of Competing Interest

No conflict of interest.

Acknowledgements

This work was partially supported by the Scientific Research Foundation in Shenzhen (Grant Nos. JCYJ20180306172156841, JCYJ20180507183608379), the Guangdong Natural Science Foundation (Grant No. 2016A030313664), the National Natural Science Foundation of China (Grant No. 61872107), and the National Key R&D Program of China (Grant Nos. 2018YFB1003800, 2018YFB1003805).

Yuxin Ding received the Ph.D. degree in computer science from the Institute of Software, Chinese Academy of Sciences, in 1999. He is currently an Associate Professor in the Department of Computer Science at the Harbin Institute of Technology Shenzhen Graduate School. His current research interests are primarily in computer security and machine learning.


Rui Wu received her B.S. degree in computer science from Jiangxi University in 2017. She is currently a master's student in the Department of Computer Science at the Harbin Institute of Technology Shenzhen Graduate School. Her current research interests are in machine learning and computer security.

Xiao Zhang received his B.S. degree in computer science from Qingdao University in 2018. He is currently a master's student in the Department of Computer Science at the Harbin Institute of Technology Shenzhen Graduate School. His current research interests are in machine learning and computer security.
