An trustworthy intrusion detection framework enabled by ex-post-interpretation-enabled approach

https://doi.org/10.1016/j.jisa.2022.103364

Abstract

A large number of machine learning models have recently been proposed for intrusion detection. Among them, complex models stand out as a prominent approach in network security. In contrast with simple models, complex models are powerful because they learn complex abstractions between input and output, but at the cost of transparency. This lack of interpretability hinders the deployment of complex models in intrusion detection. To balance model interpretability and performance, this paper proposes a novel trustworthy intrusion detection framework (TIDF) that combines machine learning with ex-post-interpretation methods. The proposed TIDF achieves 82% prediction accuracy. In the contrast experiment, TIDF outperforms junior network security management engineers (NSMEs). With the proposed framework, we achieve good prediction performance and improve model interpretability in intrusion detection. Thus, the proposed framework may serve as a potentially useful tool in intrusion detection systems.

Introduction

With the development of cloud computing, 5G communications and e-commerce, the importance of network security has increased significantly, and intrusion detection acts as one of its safeguards. Network intrusion is defined as a potential, premeditated and unauthorized attempt to attack a target network [1]. Intrusion detection is applied to identify network intrusions so that network access remains reliable and stable. Over the past few years, network intrusion has become increasingly rampant. On 14 June 2016, the Democratic National Committee’s (DNC) computer network was hacked and important internal documents were leaked online [2]. According to the information source, intrusion detection systems (IDS) can be classified into host-based and network-based systems [3]. A network-based intrusion detection system uses raw data packets from the network as its data source and applies an adapter to monitor all network traffic in a timely manner [4]. Anomaly detection and misuse detection are usually employed to detect network intrusion. Anomaly detection uses the data generated by the normal behavior of the monitored system to distinguish intrusive behavior from normal behavior [5], whereas misuse detection (signature-based detection) is a network intrusion detection technology based on pattern matching [6]. Misuse detection is strong at identifying known attacks but weak at identifying unknown ones. Anomaly-based detection can detect zero-day attacks, but it has a high false positive rate [7].
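
The contrast between misuse detection and anomaly detection can be illustrated with a minimal sketch. It is not taken from the paper: the signature list, the single "bytes per connection" feature, the z-score rule and the threshold of 3 standard deviations are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev

# Hypothetical signature list: patterns of already-known attacks.
KNOWN_SIGNATURES = ["/etc/passwd", "' OR 1=1", "cmd.exe"]


def misuse_detect(payload: str) -> bool:
    """Flag a payload only if it contains a known attack signature."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


def fit_normal_profile(normal_values):
    """Summarize normal traffic (here: bytes per connection) as mean and std."""
    return mean(normal_values), pstdev(normal_values)


def anomaly_detect(value, profile, threshold=3.0):
    """Flag a value that deviates from the normal profile by more than
    `threshold` standard deviations (an assumed, simplistic rule)."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical observations of normal behavior.
normal_bytes = [480, 510, 495, 505, 500, 490, 520]
profile = fit_normal_profile(normal_bytes)

known_attack = misuse_detect("GET /../../etc/passwd")    # matches a signature
novel_attack_misuse = misuse_detect("GET /new-exploit")  # no signature exists yet
novel_attack_anomaly = anomaly_detect(9000, profile)     # far from the normal profile
benign_burst = anomaly_detect(560, profile)              # benign, but unusual
```

In this toy setting the signature matcher misses the unseen request, while the anomaly rule catches the unusual traffic volume but also flags a harmless burst, which mirrors the trade-off described above.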

In recent decades, machine learning methods, especially complex ones, have been widely used in intrusion detection to improve information security. Farnaaz and Jabbar implemented a complex model (random forest) for an intrusion detection system and achieved 99.67% accuracy [8]. Kim et al. employed a deep neural network (DNN) to predict network intrusion behavior with 99% accuracy [9]. Zhang, Zulkernine and Haque proposed a systematic framework based on random forests for misuse, anomaly, and hybrid network-based intrusion detection; their experimental results show that the overall performance of intrusion detection systems is improved [10]. To improve on the generalization of a single complex model, combined models have been implemented. Muhammad et al. presented a detection system built from a stacked autoencoder (AE) and a DNN [11]. Their evaluation showed that the system achieved the best performance on the Aegean Wi-Fi Intrusion Dataset. Ferrag et al. proposed a rules and decision tree-based intrusion detection system (RDTIDS), which combines three classifier approaches: Reduced Error Pruning (REP) Tree, the JRip algorithm and Forest PA. RDTIDS provided state-of-the-art performance on the experimental data set [12]. Complex methods have thus been proven to work well for detecting network intrusion with a considerable detection rate. The general idea is that complex models can learn deep abstractions of intrusion detection data. However, this deep abstraction makes the model opaque, which hampers the deployment of intrusion detection systems in the real world.

Building model interpretability, which can improve trust in the model, is one of the most exciting tasks in intrusion detection. Among the model interpretability approaches that have been studied, Shapley Additive exPlanation (SHAP) stands out in intrusion detection as a prominent approach with great potential [13], [14], [15]. Wang et al. applied SHAP to an intrusion detection system to interpret the prediction results of a DNN [16]. Alenezi and Ludwig used SHAP to explain a Random Forest classifier, an eXtreme Gradient Boosting (XGBoost) classifier, and Keras Sequential models developed on cyber security threat data [17]. Oseni et al. proposed an intrusion detection framework that uses SHAP to explain DL-based IDS in IoT networks to improve resiliency [18]. SHAP provides global and local interpretation based on Shapley value aggregation. We note that most work simply applies a single interpretability approach such as SHAP to interpret the complex model developed for intrusion detection. When it comes to interpreting complex models for intrusion detection, however, SHAP can be time-consuming, oversimplified and even confusing [14]. Given the complexity of these models, adopting a single interpretability approach such as SHAP is inadequate for intrusion detection tasks. Nevertheless, not much work has been done to achieve a comprehensive understanding of intrusion detection systems based on complex models.
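
To make the notion of a Shapley value concrete, the following sketch computes exact Shapley values for a tiny hand-written scoring function. It is an illustration under stated assumptions, not the SHAP library and not the authors' code: `toy_score`, the feature names and all coefficients are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values: the weighted average marginal contribution of
    each feature over all subsets of the remaining features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Classic Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_score(present):
    """Hypothetical intrusion score given the set of features that are known."""
    score = 0.1                                   # baseline score
    if "failed_logins" in present:
        score += 0.4
    if "src_bytes" in present:
        score += 0.2
    if "failed_logins" in present and "root_shell" in present:
        score += 0.2                              # interaction term
    return score


features = ["failed_logins", "src_bytes", "root_shell"]
phi = shapley_values(toy_score, features)
```

The attributions sum to the difference between the full score and the baseline, and the interaction term is split evenly between the two features involved. The exact computation enumerates every subset of features, so its cost grows exponentially with the number of features, which gives some intuition for why Shapley-based explanations can be time-consuming on complex models.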

To remedy this, and to investigate whether intrusion detection systems can benefit from artificial intelligence (AI), we propose a novel trustworthy intrusion detection framework (TIDF) that combines a variety of complex models with a variety of interpretability methods to balance model accuracy and transparency. Fig. 1 depicts the structure of TIDF for cyber security. First, the network access data is preprocessed, balanced and split. Next, the model is constructed to predict network intrusion. Then, both objective and subjective tests are used to evaluate the proposed TIDF framework. Global and local model interpretations are exploited to improve the understanding of the prediction results on network access data. Finally, the network security management engineer (NSME) obtains the prediction results and the model interpretation. The results show that the proposed framework improves the transparency of the intrusion detection system.

The major contributions of this paper are as follows: (1) To achieve higher accuracy in intrusion detection, multiple machine learning (ML) models are explored, and their performance is evaluated on the public NSL-KDD benchmark. (2) To build trust between NSMEs and complex models, an explainable AI (XAI) intrusion detection framework (TIDF) is proposed to provide ex-post-interpretation that improves the transparency of complex models. To further raise the reliability of this framework, a human–machine contrast experiment is introduced to evaluate it.

The paper is organized as follows: Section 2 describes the research methodology. Section 3 evaluates the proposed framework and presents the details of the ex-post-interpretation for the complex models. In the last two sections, we discuss and conclude this work.

Section snippets

Data set

To evaluate the effectiveness of the proposed interpretable intrusion detection framework, the NSL-KDD data set is used. NSL-KDD is a classic data set in the intrusion detection field and a refined version of the KDD CUP’99 data set [19], [20]. A total of 148517 network access records are collected, covering 39 specific network access behaviors. Each record in NSL-KDD is labeled as normal or abnormal. The data set is unbalanced. Thus, synthetic
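
The snippet above breaks off, so the balancing method the authors actually use is not shown here. Purely as an illustration of what a synthetic oversampling step for an unbalanced data set can look like, the sketch below interpolates between minority-class records in the style of SMOTE. The function name, the parameters `k` and `seed`, and the toy records are assumptions and do not come from the paper.

```python
import random


def synthetic_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic records by interpolating between a minority
    record and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical numeric records: 6 "normal" versus 3 "abnormal".
normal = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.3, 0.2), (0.25, 0.1), (0.2, 0.3)]
abnormal = [(0.8, 0.9), (0.9, 0.8), (0.85, 0.95)]

n_missing = len(normal) - len(abnormal)
balanced_abnormal = abnormal + synthetic_oversample(abnormal, n_missing)
```

Every synthetic record lies on a line segment between two real minority records, so the new points stay inside the region already occupied by the minority class rather than being exact duplicates.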

Results

We apply both objective and subjective tests to evaluate the effectiveness of the proposed TIDF framework in explaining the prediction behavior of complex models for intrusion detection. In what follows, we present the objective evaluation, the subjective evaluation and the model ex-post-interpretation. We implement the above models in a Python 3.8 environment.

Discussion

In this study, to achieve interpretability of network intrusion detection systems based on complex black-box models such as XGBoost, a novel interpretable intrusion detection framework (TIDF) using XAI technology is proposed. To build the TIDF framework, multiple machine learning models are employed. In addition, five classical ex-post-interpretation methods (feature importance, PDP, SHAP, LIME and ELI5) are used to provide interpretability of complex models in the TIDF
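
Of the ex-post-interpretation methods listed above, the partial dependence plot (PDP) is perhaps the simplest to sketch by hand. The example below is a minimal illustration, assuming a hypothetical black-box function `toy_predict` and made-up records; it is not the authors' implementation and does not use XGBoost.

```python
def partial_dependence(predict, rows, feature_idx, grid):
    """For each grid value, overwrite one feature in every row and average
    the model's predictions (the standard PDP estimate)."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)
            modified[feature_idx] = value
            preds.append(predict(modified))
        curve.append(sum(preds) / len(preds))
    return curve


def toy_predict(row):
    """Hypothetical black-box model: intrusion score from two features."""
    failed_logins, duration = row
    return min(1.0, 0.2 * failed_logins + (0.1 if duration > 10 else 0.0))


# Hypothetical records: (failed_logins, duration).
rows = [(0, 5), (1, 20), (3, 2), (0, 30)]
grid = [0, 1, 2, 3]
pdp = partial_dependence(toy_predict, rows, feature_idx=0, grid=grid)
```

Plotting `pdp` against `grid` would show how the average predicted score changes as the first feature is varied while the other feature keeps its observed values, which is a global view of the model's behavior.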

Conclusion

In this paper, we have presented an XAI intrusion detection method, a novel framework (TIDF) for improving the interpretability of intrusion detection systems. LR, LinearSVC, KNN, CART, BernoulliNB, non-linearSVC, NN, RF, AdaBoost, LGBM and XGBoost classifiers are applied to instantiate the proposed TIDF framework. Empirical results demonstrate that both simple and complex models achieve higher classification performance than the junior NSMEs. The outcome of the developed XGBoost model in

CRediT authorship contribution statement

Junfeng Peng: Conceptualization, Methodology, Software. Ziwei Cai: Data curation, Formal Analysis, Writing – original draft. Zhenyu Chen: Supervision, Formal Analysis, Resources. Xujiang Liu: Methodology, Software, Supervision. Mianyu Zheng: Supervision. Chufeng Song: Validation. Xiongyong Zhu: Funding acquisition. Yi Teng: Funding acquisition. Ruilin Zhang: Visualization. Yanqin Zhou: Writing – original draft. Xuyang Lv: Writing – original draft. Jun Xu: Funding acquisition, Writing – review

Acknowledgments

The subjective experiments were supported by 13 students from the computer normal class of grade 18 at the School of Computer Science, Guangdong University of Education.

This work was supported by the Foundation Item: Scientific research platforms and projects of colleges and universities in Guangdong Province under Grant 2021ZDZX3016; Scientific research platforms and projects of colleges and universities in Guangdong Province under Grant 2021ZDZX1062; Young innovative talents project of colleges and

References (38)

  • Kim J. et al.

    Method of intrusion detection using deep neural network

  • Muhammad G. et al.

    Stacked autoencoder-based intrusion detection system to combat financial fraudulent

    IEEE Internet Things J

    (2020)

  • Ferrag M.A. et al.

    RDTIDS: Rules and decision tree-based intrusion detection system for Internet of Things networks

    Future Internet

    (2020)

  • Molnar C.

    Interpretable machine learning

    (2020)

  • Lundberg S.M. et al.

    A unified approach to interpreting model predictions

    Adv Neural Inf Process Syst

    (2017)

  • Wang M. et al.

    An explainable machine learning framework for intrusion detection systems

    IEEE Access

    (2020)

  • Alenezi R. et al.

    Explainability of cybersecurity threats data using SHAP

  • Oseni A. et al.

    An explainable deep learning framework for resilient intrusion detection in IoT-enabled transportation networks

    IEEE Trans Intell Transp Syst

    (2022)

  • Revathi S. et al.

    A detailed analysis on NSL-KDD dataset using various machine learning techniques for intrusion detection

    Int J Eng Res Technol (IJERT)

    (2013)

Kindly note that Junfeng Peng, Ziwei Cai, Zhenyu Chen and Xujiang Liu share first authorship of this paper.
