Elsevier

Computer Communications

Volume 205, 1 May 2023, Pages 170-181
CANET: A hierarchical CNN-Attention model for Network Intrusion Detection

https://doi.org/10.1016/j.comcom.2023.04.018

Abstract

Network Intrusion Detection (NID) is an important defense strategy in modern networks for detecting malicious activity in large-scale cyberspace. Current NID methods suffer from a high false positive rate, which significantly reduces the overall effectiveness of network intrusion detection systems and increases maintenance cost. Furthermore, the class imbalance of intrusion detection datasets limits the detection rate for minority classes. This paper proposes a novel hierarchical CNN-Attention network, CANET. In CANET, a CNN and an Attention mechanism are combined to form a CA Block that focuses on local spatio-temporal feature extraction. Stacking multiple CA Blocks lets the model learn the multi-level spatio-temporal features of network attack data, which makes it better suited to modern large-scale NID. To address class imbalance, we propose to use Equalization Loss v2 (EQL v2) to increase the weight of minority classes and balance the learning attention paid to them. Extensive experiments demonstrate that CANET outperforms state-of-the-art methods in terms of accuracy, detection rate, and false positive rate, and that it improves the detection rate of minority classes. The source code for the proposed CANET models is publicly available at https://github.com/yuanshuai666/CANET.

Introduction

The information age has brought great convenience, but also a great number of information security problems, such as privacy disclosure, data alteration and destruction, and even threats to national security. How to effectively prevent network attacks has therefore become an urgent problem. NID can identify suspicious patterns in incoming packets for further attack identification and containment. However, the complex and diverse characteristics of network attacks and the imbalanced data distribution not only limit the detection efficiency of NID but also lead to a high false positive rate (FPR).

Traditional machine learning (ML) techniques such as k-Nearest Neighbors, Support Vector Machine, and Naive Bayes [1], [2] have been applied to NID. However, as network attack categories grow richer, these methods become unsuitable for large-scale NID because of their limited ability to learn changing features. In recent years, deep learning (DL) methods have increasingly been applied to NID. In particular, several recent studies [3], [4] have shown that DL techniques can achieve a substantial improvement in accuracy over conventional ML techniques. Although existing DL-based methods [5], [6] improve the detection rate (DR), they still suffer from a high FPR because of insufficient feature learning.

The class imbalance problem in datasets has been attracting more attention because the large gap in the number of instances between classes significantly reduces the DR. To the best of our knowledge, current solutions fall into three types. The first is the data-level method, which modifies the training dataset so that standard learning algorithms can be trained effectively. However, the data-level method is susceptible to noise and increases computational overhead. The second is the integrated method, which combines a data-level or algorithm-level approach with ensemble learning to obtain a powerful ensemble classifier. Nevertheless, the integrated method is complicated, time-consuming, and not robust to noise. The third is the cost-sensitive method, which addresses class imbalance by adjusting the loss weights: it assigns higher misclassification costs to minority class samples and lower misclassification costs to majority class samples. In this way, the importance of minority class samples is increased during training, which reduces the classifier's preference for the majority class. While existing cost-sensitive methods have improved the sensitivity and accuracy of the model, they may be affected by outliers because they focus too heavily on samples that are particularly difficult to distinguish, which adversely affects the final results.
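As a rough illustration of the cost-sensitive idea described above, the following sketch weights each sample's cross-entropy term by a per-class cost. The inverse-frequency weighting scheme and the function names `inverse_frequency_weights` and `weighted_cross_entropy` are assumptions made for illustration; they are not taken from this paper.

```python
import math


def inverse_frequency_weights(labels, num_classes):
    """Per-class cost proportional to 1 / class frequency (an assumed heuristic)."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    total = len(labels)
    # Classes absent from the data get weight 0.0 to avoid division by zero.
    return [total / (num_classes * c) if c else 0.0 for c in counts]


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalised by the sum of weights used."""
    loss = 0.0
    norm = 0.0
    for p, y in zip(probs, labels):
        # Clamp the probability so that log(0) cannot occur.
        loss += -weights[y] * math.log(max(p[y], 1e-12))
        norm += weights[y]
    return loss / norm
```

With nine majority samples and one minority sample, the minority class receives a cost of 5.0 against roughly 0.56 for the majority class, so a misclassified minority sample contributes far more to the loss.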

To address the above challenges, we propose CANET, a simple yet effective model that incorporates spatiotemporal features. It automatically learns to focus on the relevant characteristics without additional supervision. Meanwhile, to address the class imbalance problem, we propose to use the cost-sensitive Equalization Loss v2 (EQL v2) [7] to weight the minority classes. Extensive experiments demonstrate that CANET consistently improves prediction accuracy across different datasets and training sizes while achieving state-of-the-art performance without requiring extra data preprocessing. The contributions of this work can be summarized as follows:

  • (1) We take the Attention approach a step further by proposing the CA Block, which combines CNN and Attention at each layer to extract local multi-level spatiotemporal features. We also compare six candidate architectures, and the results show that CANET is the most efficient. Moreover, our approach takes the structural characteristics of network attacks fully into consideration and is better suited to modern large-scale NID.

  • (2) To address the low DR for minority classes caused by class imbalance, we exploit a gradient-guided reweighting loss, EQL v2, to weight the minority samples. This also eliminates the need for any preprocessing to handle data imbalance. EQL v2 improves the model's DR for minority attacks in large-scale imbalanced intrusion detection at a small cost.

  • (3) We demonstrate that our method outperforms state-of-the-art methods on four challenging datasets: UNSW-NB15 [8], NSL-KDD [9], CICIDS2017 [10], and CICDDoS2019 [11]. Our model achieves 89.39% accuracy (ACC), 98.93% DR, and 0.87% FPR on the UNSW-NB15 dataset; 99.77% ACC, 99.72% DR, and 0.18% FPR on the NSL-KDD dataset; 99.88% ACC, 99.82% DR, and 0.06% FPR on the CICIDS2017 dataset; and 99.58% ACC, 99.97% DR, and 0.06% FPR on the CICDDoS2019 dataset.
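The gradient-guided reweighting named in contribution (2) can be sketched roughly as follows, based on the general formulation of EQL v2 in [7]: each class is treated as an independent sigmoid task, the ratio of accumulated positive to negative gradients is tracked per class, and that ratio is mapped to a pair of loss weights. The class name `EQLv2Reweighter`, the pure-Python structure, and the hyperparameter values (gamma = 12, mu = 0.8, alpha = 4) are assumptions for illustration and may differ from the settings used in CANET.

```python
import math


class EQLv2Reweighter:
    """Hypothetical sketch of gradient-guided class reweighting in the style of EQL v2."""

    def __init__(self, num_classes, gamma=12.0, mu=0.8, alpha=4.0):
        self.gamma, self.mu, self.alpha = gamma, mu, alpha
        self.pos_grad = [0.0] * num_classes  # accumulated gradient from positive samples
        self.neg_grad = [0.0] * num_classes  # accumulated gradient from negative samples

    def _map(self, ratio):
        # Sigmoid-shaped mapping of the positive/negative gradient ratio into (0, 1).
        return 1.0 / (1.0 + math.exp(-self.gamma * (ratio - self.mu)))

    def class_weights(self):
        """Return (positive weights, negative weights), one entry per class."""
        pos_w, neg_w = [], []
        for p, n in zip(self.pos_grad, self.neg_grad):
            if p == 0.0 and n == 0.0:
                r = 1.0  # no statistics yet: use neutral weights
            else:
                r = self._map(p / (n + 1e-10))
            neg_w.append(r)  # classes dominated by negative gradients get them damped
            pos_w.append(1.0 + self.alpha * (1.0 - r))  # and their positives boosted
        return pos_w, neg_w

    def loss(self, probs, targets):
        """Weighted binary cross-entropy over per-class sigmoid outputs.

        probs and targets are lists of rows; targets are one-hot (0 or 1).
        Gradient statistics are updated as a side effect.
        """
        pos_w, neg_w = self.class_weights()
        total = 0.0
        for p_row, t_row in zip(probs, targets):
            for j, (p, t) in enumerate(zip(p_row, t_row)):
                w = pos_w[j] if t == 1 else neg_w[j]
                p_c = min(max(p, 1e-12), 1.0 - 1e-12)
                total += -w * (math.log(p_c) if t == 1 else math.log(1.0 - p_c))
                # For sigmoid cross-entropy, |p - t| is the gradient magnitude
                # with respect to the logit.
                grad = w * abs(p - t)
                if t == 1:
                    self.pos_grad[j] += grad
                else:
                    self.neg_grad[j] += grad
        return total / len(probs)
```

In this sketch, a class that mostly receives negative gradients (a minority class) ends up with a positive weight close to 1 + alpha and a negative weight close to 0, while a well-represented class keeps weights close to 1.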

Section snippets

Related work

In recent years, artificial intelligence (AI)-based intrusion detection systems have grown in popularity due to their ability to identify new threats. In this section, we review existing work on NID and recent advances in addressing the class imbalance problem.

The proposed CANET for NID

In this section, we elaborate on our CANET method. An overview of the CANET framework is shown in Fig. 1. We first introduce the data preprocessing method in Section 3.1, then present the design of the CA Block in Section 3.2, and finally describe the loss function in Section 3.3.
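This excerpt does not give the internal design of the CA Block, so the following is only a hypothetical sketch of the general idea of pairing a convolution with attention in each layer and stacking the result. The function names (`conv1d`, `ca_block`, `stack_blocks`), the single-channel 1-D input, and the choice of softmax attention over positions are all assumptions for illustration, not the architecture of Section 3.2.

```python
import math


def conv1d(x, kernel, bias=0.0):
    """Valid 1-D cross-correlation followed by ReLU (local feature extraction)."""
    k = len(kernel)
    out = []
    for i in range(len(x) - k + 1):
        s = bias + sum(x[i + j] * kernel[j] for j in range(k))
        out.append(max(0.0, s))
    return out


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def ca_block(x, kernel):
    """Hypothetical convolution-plus-attention block.

    The convolution extracts local features; softmax attention over positions
    then rescales those features so that salient positions dominate.
    """
    feats = conv1d(x, kernel)
    attn = softmax(feats)
    # Multiply by the sequence length so uniform attention leaves features unchanged.
    out = [f * a * len(feats) for f, a in zip(feats, attn)]
    return out, attn


def stack_blocks(x, kernels):
    """Apply several blocks in sequence to obtain multi-level features."""
    for kernel in kernels:
        x, _ = ca_block(x, kernel)
    return x
```

Each block shortens the sequence by the kernel length minus one, so stacking two blocks with kernels of length 3 reduces an input of length 8 to length 4.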

Experiment

In this section, we evaluate the performance of our proposed model on four widely used NID datasets: NSL-KDD, UNSW-NB15, CICIDS2017, and CICDDoS2019. The data distributions of the four datasets used in this work are shown in Fig. 2, Fig. 3, Fig. 4, and Fig. 5; note that they are highly imbalanced. We compare our model with other DL models and conduct ablation studies to validate the contribution of each component.
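The three metrics reported in this paper (ACC, DR, and FPR) can be computed from confusion-matrix counts. The sketch below assumes the standard binary definitions, with label 1 meaning attack and label 0 meaning normal traffic; the excerpt does not state the paper's exact definitions, so the formulas and the function name `binary_metrics` are assumptions.

```python
def binary_metrics(y_true, y_pred):
    """Return (ACC, DR, FPR) for binary labels, where 1 = attack and 0 = normal."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0
    # Detection rate: share of real attacks that were flagged (recall).
    dr = tp / (tp + fn) if (tp + fn) else 0.0
    # False positive rate: share of normal traffic wrongly flagged as attack.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return acc, dr, fpr
```

On an imbalanced dataset, ACC can remain high even when DR on a rare attack class is poor, which is why the three values are reported together.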

Conclusion

In this work, we propose a novel CNN-Attention network, CANET, which fuses the spatial and temporal features. It fully takes the structural characteristics of network attacks into consideration and solves the problem of temporal information loss caused by high-level spatial feature extraction. Additionally, CANET allows one to generate fine-grained attention that can be exploited for other sequence feature extraction tasks. To solve the problem of class imbalance, we propose to use EQL v2,

CRediT authorship contribution statement

Keyan Ren: Writing – review & editing, Funding acquisition, Formal analysis, Supervision. Shuai Yuan: Project administration, Writing – review & editing, Conceptualization, Methodology, Software, Validation, Writing – original draft, Visualization. Chun Zhang: Writing – review & editing. Yu Shi: Writing – review & editing. Zhiqing Huang: Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Key Research and Development Project of China (No. 2019YFC1511003 and No. 2018YFC19008005) and the National Natural Science Foundation of China (NSFC) (No. 61803004).

References (35)

  • Wang H. et al. A network intrusion detection system based on convolutional neural network. J. Intell. Fuzzy Systems (2020)
  • Yin C. et al. A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access (2017)
  • J. Tan, X. Lu, G. Zhang, C. Yin, Q. Li, Equalization loss v2: A new gradient balance approach for long-tailed object...
  • Moustafa N. et al. UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)
  • Tavallaee M. et al. A detailed analysis of the KDD CUP 99 data set
  • Sharafaldin I. et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP (2018)
  • Sharafaldin I. et al. Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy
