Elsevier

Computer Networks

Volume 205, 14 March 2022, 108760

Network traffic analysis over clustering-based collective anomaly detection

https://doi.org/10.1016/j.comnet.2022.108760

Abstract

Due to the ever-growing volume of network traffic, there has been a considerable amount of research on clustering-based anomaly detection in network traffic. However, most of these studies have not considered the problem of collective anomaly detection in network traffic: when clustering-based algorithms are applied to anomaly detection, a collective anomaly may be scattered among multiple clusters. In this paper, we propose CCAD, a progressive exploration framework for collective anomaly detection on network traffic based on a clustering method. CCAD enables analysts to effectively explore collective anomalies in network traffic. Unlike other anomaly detection methods, the framework is based on an analysis of how collective anomalies influence the clustering results on streaming network traffic data, and it efficiently supports collective anomaly exploration. As demonstrated by our extensive experiments with real-world data, CCAD achieves a higher detection rate than other existing methods.

Introduction

Background. Anomaly detection has been applied in a wide range of fields, including network intrusion detection, credit card fraud detection, and industrial system monitoring and control, and many researchers consider it an important task. In simple terms, an attack is considered an anomaly if it deviates significantly from the “typical case” [1]. However, an attack is sometimes not just a point anomaly; it may instead be a collective anomaly. A collective anomaly is a collection of similar data instances that behave anomalously with respect to the entire dataset, even though the individual instances are not necessarily anomalous. Such anomalies can be found in sequential or time-series network traffic [2], [3].
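To make the distinction between point and collective anomalies concrete, the following minimal Python sketch (our illustration, not the authors' method) treats each flow record as a hypothetical (protocol, port) type and compares each type's share of a window against an assumed baseline. No single record is anomalous by itself, yet a group of similar records is. The frequency-ratio rule, the `ratio` and `min_count` thresholds, and the baseline values are all assumptions chosen for illustration.

```python
from collections import Counter


def point_anomalies(window, known_types):
    """Records that are anomalous on their own: types never seen in normal traffic."""
    return [r for r in window if r not in known_types]


def collective_anomalies(window, baseline, ratio=3.0, min_count=3):
    """Record types whose share of the window exceeds `ratio` times their
    baseline share (hypothetical rule). Each such record is normal
    individually; only the group as a whole is anomalous."""
    counts = Counter(window)
    total = len(window)
    flagged = set()
    for record_type, count in counts.items():
        share = count / total
        if count >= min_count and share > ratio * baseline.get(record_type, 0.0):
            flagged.add(record_type)
    return flagged


# Hypothetical baseline: share of each (protocol, port) type in normal traffic.
baseline = {("tcp", 80): 0.5, ("tcp", 443): 0.4, ("udp", 53): 0.1}
window = [("tcp", 80)] * 4 + [("tcp", 443)] * 3 + [("udp", 53)] * 8

print(point_anomalies(window, baseline))       # []
print(collective_anomalies(window, baseline))  # {('udp', 53)}
```

Every record in the window is a known, individually normal type, so the point-level check finds nothing; only the unusually large group of DNS records stands out.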

Motivation. In recent years, the traditional approach of relying on a knowledge base or external supervision has been superseded by unsupervised anomaly detection techniques grounded in fundamental data mining methods such as clustering. Without relying on expert supervision, unsupervised anomaly detection uses clustering to uncover the underlying structure of unlabeled data as well as unknown behaviors or attacks. However, it is difficult to identify a collective anomaly by clustering when it newly arrives in streaming network traffic.

Fig. 1 illustrates collective anomaly detection with a clustering-based method on a streaming network traffic dataset. C1, C2, C3 and C4 are clusters in the network traffic dataset, and the area C0 contains the newly arrived data points. Note that these new data points are usually very few. Intuitively, however, C0 forms a cluster, and further analysis of C0 reveals that it is a collective anomaly. There are two reasons why it is hard to detect: a) as a collective anomaly, it cannot be detected from any of the new data points individually; b) it cannot be found by clustering because the number of abnormal records is too small.
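The scattering effect can be reproduced with a toy example. The sketch below assumes hypothetical 2-D coordinates for the centroids of C1 to C4 and for the new points in C0 (Fig. 1 gives no numbers) and assigns each new point to its nearest existing centroid, as a simple nearest-centroid update would. The four points of C0 lie close together, yet each one lands in a different existing cluster, so no single cluster receives enough anomalous records to stand out.

```python
import math

# Hypothetical centroids standing in for C1 to C4 in Fig. 1.
CENTROIDS = {
    "C1": (0.0, 0.0),
    "C2": (10.0, 0.0),
    "C3": (0.0, 10.0),
    "C4": (10.0, 10.0),
}


def nearest_cluster(point, centroids):
    """Name of the existing cluster whose centroid is closest to `point`."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))


def assign_new_points(points, centroids):
    """Assign each newly arrived point to its nearest existing cluster."""
    return {p: nearest_cluster(p, centroids) for p in points}


def diameter(points):
    """Largest pairwise distance within a group of points."""
    return max(math.dist(p, q) for p in points for q in points)


# A small, tight group of new points (the C0 region) lying between the clusters.
c0 = [(4.0, 4.0), (6.0, 4.0), (4.0, 6.0), (6.0, 6.0)]
assignment = assign_new_points(c0, CENTROIDS)
print(assignment)
# {(4.0, 4.0): 'C1', (6.0, 4.0): 'C2', (4.0, 6.0): 'C3', (6.0, 6.0): 'C4'}
print(diameter(c0))  # about 2.83, far smaller than the 10-unit cluster spacing
```

Each existing cluster absorbs only one of the new points, which is exactly the situation in which a clustering-based detector sees nothing unusual.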

Limitations of the State-of-the-Art. Although efforts have been made to develop efficient algorithms for anomaly detection on data streams, these algorithms are insufficient for the problem discussed above. In [4], the authors use x-means clustering to detect collective anomalies such as DDoS attacks, treating the cluster with the minimum variance as a collective anomaly. Their technique outperforms other existing clustering-based methods. However, its time complexity is too high for streaming data, and it does not account for abnormal data being divided among different clusters, in which case the anomaly cannot be detected. In PCstream [5], the authors regard a collective anomaly as a rare sequence of transitions between contexts. However, this work is limited in that it considers only the context, which is insufficient when the context does not significantly affect the detected data.
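As we understand the description of [4], its final step reduces to a simple selection rule over the clusters produced by x-means. The sketch below shows only that rule, taking hypothetical clusters as given; it does not implement x-means itself, and using the mean squared distance to the centroid as the variance measure is our assumption.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def variance(points):
    """Mean squared Euclidean distance of the points to their centroid."""
    c = centroid(points)
    return sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in points) / len(points)


def min_variance_cluster(clusters):
    """Name of the cluster with the smallest within-cluster variance."""
    return min(clusters, key=lambda name: variance(clusters[name]))


# Hypothetical clustering output: name -> member points.
clusters = {
    "A": [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)],
    "B": [(10.0, 10.0), (10.5, 10.0), (10.0, 10.5), (10.5, 10.5)],
}
print(variance(clusters["A"]), variance(clusters["B"]))  # 2.0 0.125
print(min_variance_cluster(clusters))                    # B
```

The rule only works if the anomalous records end up together in one compact cluster. If they are absorbed into several larger, normal clusters, none of the resulting clusters need have a small variance, and the rule has nothing meaningful to select.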

In this paper, we design an innovative approach, the Clustering-based Collective Anomaly Detection (CCAD) framework, for continuously monitoring collective anomalies in sliding windows over network traffic. Its aim is to eliminate the limitations of previously proposed algorithms. CCAD is an efficient exploration mechanism that detects the occurrence of real anomalies whenever a new sequence of network traffic arrives, and thereby addresses the collective anomaly detection problem.

Main Contributions.

The major contributions of this work are as follows:

(1) Our CCAD framework is the first to address the problem of collective anomalies being scattered among multiple clusters when clustering-based algorithms are applied to streaming network traffic.

(2) The key innovation of CCAD is that it is based on an analysis of the influence of collective anomalies on the clustering results in streaming network traffic. This contrasts with other state-of-the-art algorithms, which consider only the difference between normal and abnormal data.

The rest of the paper is organized as follows. In Section 2, we present problem formulations. We present details of our CCAD framework in Section 3. We provide our detailed experimental study on real-world data sets in Section 4. In Section 5, we review the related work. Finally, we conclude the paper with discussions and future research directions in Section 6.


Problem formulations

In network traffic analysis applications, traffic data volumes are huge, so loading all of the data would require a huge amount of memory. Since it is impossible to keep all data during processing, a sliding window that contains only part of the whole network traffic is used instead. Newly arrived network traffic data in the window is termed active data, and network traffic data that has left the window is termed expired data. Two parameters need to be considered. One is the window size w, and
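A minimal Python sketch of the active/expired bookkeeping described above follows. It assumes a count-based window that holds the w most recent records; the second window parameter is cut off in the text, so the sketch does not model it and simply lets the caller push batches of any size. `SlidingWindow` and `update` are hypothetical names.

```python
from collections import deque


class SlidingWindow:
    """Count-based sliding window holding the w most recent records.

    Hypothetical sketch: the caller decides how many records arrive per
    update, since the paper's second window parameter is not shown."""

    def __init__(self, w):
        if w <= 0:
            raise ValueError("window size w must be positive")
        self.w = w
        self.records = deque()

    def update(self, batch):
        """Add a batch of newly arrived records.

        Returns (active, expired): the new records and the records pushed
        out of the window. If a batch is larger than w, its oldest records
        expire immediately and therefore appear in both lists."""
        active = list(batch)
        self.records.extend(active)
        expired = []
        while len(self.records) > self.w:
            expired.append(self.records.popleft())
        return active, expired


window = SlidingWindow(w=4)
print(window.update([1, 2, 3]))  # ([1, 2, 3], [])
print(window.update([4, 5]))     # ([4, 5], [1])
print(window.update([6, 7, 8]))  # ([6, 7, 8], [2, 3, 4])
print(list(window.records))      # [5, 6, 7, 8]
```

A `deque` keeps both appending new records and evicting the oldest ones at constant cost per record, which matters when the window is updated continuously.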

CCAD framework

Due to the limitations of the state of the art, we propose the CCAD framework to detect collective anomalies. It solves the problem of collective anomaly data being scattered among individual clusters when clustering-based algorithms are applied to streaming network traffic.

Sliding Window with Clustering Methodology. We first present how to effectively process the results from different windows when applying the Affinity Propagation (AP) algorithm [6] to a sliding window. We also show later that,
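For readers unfamiliar with AP, the following dependency-free Python sketch implements the standard Affinity Propagation message passing (responsibilities and availabilities with damping) and could be re-run on the records currently held in a window. It does not reproduce how CCAD combines results across windows, which is not shown here. Negative squared Euclidean similarity, the median preference, a damping factor of 0.5 and a fixed 200 iterations are conventional defaults we assume, not settings reported by the authors; the example points are hypothetical.

```python
import statistics


def affinity_propagation(points, damping=0.5, iterations=200, preference=None):
    """Cluster `points` with standard Affinity Propagation (Frey and Dueck, 2007).

    Returns (exemplars, labels): indices of the exemplar points and, for each
    point, the index of the exemplar it is assigned to."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Similarity s(i, k): negative squared Euclidean distance.
    S = [[-sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    if preference is None:
        # Common default: median of the off-diagonal similarities.
        preference = statistics.median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i, k) = s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best_k = max(range(n), key=lambda k: vals[k])
            second = max(v for k, v in enumerate(vals) if k != best_k)
            for k in range(n):
                competitor = second if k == best_k else vals[best_k]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - competitor)
        # a(i, k) = min(0, r(k, k) + sum over i' not in {i, k} of max(0, r(i', k)))
        # a(k, k) = sum over i' != k of max(0, r(i', k))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            others = sum(pos) - pos[k]
            for i in range(n):
                new = others if i == k else min(0.0, R[k][k] + others - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # A point is an exemplar when its self-responsibility plus self-availability is positive.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], []
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    return exemplars, labels


# Hypothetical contents of one window: two well-separated groups of records.
window_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
exemplars, labels = affinity_propagation(window_points)
print(exemplars, labels)
```

AP does not require the number of clusters in advance, which suits windows whose contents change over time; its cost is quadratic in the number of records per window, so this plain-Python version is only practical for small windows.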

Performance evaluation

Experimental Methodology. We conducted a series of experiments to evaluate the performance of the proposed algorithm, comparing CCAD against three state-of-the-art algorithms. All experiments were run on a Windows Server 2008 R2 Datacenter machine with an Intel(R) Xeon(R) CPU E5-2609 at 1.90 GHz and 16 GB of memory. All algorithms are implemented in Python.
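The abstract reports results in terms of detection rate. Under the usual definition, which we assume here because the exact metrics are not stated in this excerpt, the detection rate is the fraction of truly anomalous records that a detector flags, DR = TP / (TP + FN), and it is commonly reported alongside the false positive rate FPR = FP / (FP + TN). A minimal Python helper, applied to hypothetical labels:

```python
def detection_metrics(y_true, y_pred):
    """Detection rate (recall on anomalies) and false positive rate.

    y_true and y_pred are equal-length sequences of booleans, True meaning
    anomalous. Returns (detection_rate, false_positive_rate); a rate whose
    denominator is zero is reported as 0.0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_positive_rate


# Hypothetical ground truth and detector output for ten records.
y_true = [True] * 4 + [False] * 6
y_pred = [True, True, True, False, True, False, False, False, False, False]
dr, fpr = detection_metrics(y_true, y_pred)
print(dr)  # 0.75
```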

Real-World Datasets. We use CICIDS2017 [8] and UNSW-NB15 [9] to evaluate the performance of the CCAD algorithm. The CICIDS2017 dataset includes the results

Related work

Anomaly detection and anomaly analysis are topics that have been attracting the interest of researchers for several decades. Comprehensive surveys can be found in [2], [3], [12], [13].

In recent years, several novel methods have been applied to detect anomalies by clustering. CBLOF [14] proposes a definition of clustering-based local anomalies, which states that all the data points in a certain cluster are considered anomalies rather than single points. They used some numeric parameters to
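CBLOF [14] is usually described as follows: clusters are sorted by size and split into large and small clusters using two numeric parameters, α and β, and each point is scored by the size of its cluster times its distance to the nearest large-cluster centroid (its own cluster's centroid if that cluster is large). The Python sketch below follows that common description and takes the clustering as given. The defaults α = 0.9 and β = 5, the example clusters, and the function names are our assumptions; the truncated text above does not confirm which parameters [14] uses.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def split_large_small(clusters, alpha=0.9, beta=5.0):
    """Split cluster names into (large, small) using the CBLOF boundary rule.

    Clusters (assumed non-empty) are sorted by size; the boundary falls after
    the first cluster b where clusters 1..b cover at least alpha of the data,
    or where |C_b| / |C_(b+1)| >= beta."""
    names = sorted(clusters, key=lambda name: len(clusters[name]), reverse=True)
    total = sum(len(pts) for pts in clusters.values())
    covered = 0
    for b, name in enumerate(names):
        covered += len(clusters[name])
        has_next = b + 1 < len(names)
        big_gap = has_next and len(clusters[name]) / len(clusters[names[b + 1]]) >= beta
        if covered >= alpha * total or big_gap:
            return names[: b + 1], names[b + 1:]
    return names, []


def cblof_scores(clusters, alpha=0.9, beta=5.0):
    """Return (cluster name, point, score) for every point; higher means more anomalous."""
    large, _small = split_large_small(clusters, alpha, beta)
    centers = {name: centroid(pts) for name, pts in clusters.items()}
    scores = []
    for name, pts in clusters.items():
        for p in pts:
            if name in large:
                distance = math.dist(p, centers[name])
            else:
                # Points in small clusters are measured against the nearest large cluster.
                distance = min(math.dist(p, centers[other]) for other in large)
            scores.append((name, p, len(pts) * distance))
    return scores


# Hypothetical clustering output: one large cluster and one tiny cluster.
clusters = {
    "big": [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)],
    "small": [(10.0, 1.0)],
}
print(split_large_small(clusters))  # (['big'], ['small'])
for name, point, score in cblof_scores(clusters):
    print(name, point, round(score, 2))
```

Because the score is weighted by cluster size, a point on the edge of a large cluster can score close to an isolated point; here the corners of the large cluster score about 7.07 against 9.0 for the lone outlier.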

Conclusion

In this work, we present CCAD, a novel and efficient solution for detecting collective anomalies in network traffic streams based on statistical theory and methods. More specifically, our framework solves the problem of collective anomaly data being scattered among multiple clusters when clustering-based algorithms are applied to streaming network traffic. As the performance evaluation results on real network traffic data sets show, the proposed techniques are by factors more

CRediT authorship contribution statement

Chonghua Wang: Methodology, Resources. Hao Zhou: Conceptualization, Methodology, Software, Formal analysis, Writing. Zhiqiang Hao: Supervision. Shu Hu: Formal analysis. Jun Li: Supervision, Project administration. Xueying Zhang: Investigation. Bo Jiang: Validation. Xuehong Chen: Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the National Key Research and Development Program of China (No. 2020YFB2009500).

Chonghua Wang received the Ph.D. from the Institute of Information Engineering, Chinese Academy of Sciences, and was a visiting joint Ph.D. student at Purdue University. He is an associate professor at China Industrial Control Systems Cyber Emergency Response Team. His research interests include industrial internet security, network and system attack and defense technology, cloud security and ICS security.

References (40)

  • I. Sharafaldin, A.H. Lashkari, A.A. Ghorbani, Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic...
  • N. Moustafa, Designing an online and reliable statistical anomaly detection framework for dealing with large high-speed network traffic (2017)
  • S.D. Bay, M. Schwabacher, Mining distance-based outliers in near linear time with randomization and a simple pruning...
  • D. Yang, E.A. Rundensteiner, M.O. Ward, Neighbor-Based Pattern Detection for Windows Over Streaming Data, in: EDBT...
  • C.C. Aggarwal, Outlier Analysis (2015)
  • M. Gupta et al., Outlier detection for temporal data: A survey, IEEE Trans. Knowl. Data Eng. (2014)
  • M. Amer, M. Goldstein, Nearest-Neighbor and Clustering based Anomaly Detection Algorithms for RapidMiner, in:...
  • I. Syarif, A. Prugel-Bennett, G. Wills, Unsupervised Clustering Approach for Network Anomaly Detection, in:...
  • Y.Y. Aung, M.M. Min, A collaborative intrusion detection based on K-means and projective adaptive resonance theory, in:...
  • F. Angiulli et al., Fast outlier detection in high dimensional spaces (2002)

Hao Zhou received the M.S. degree from the Institute of Information Engineering, Chinese Academy of Sciences. He is a research associate at China Industrial Control Systems Cyber Emergency Response Team. His research interests include cyber security, data-driven security, and security of artificial intelligence.

Zhiqiang Hao received the M.S. degree from Beijing Institute of Technology. He is a professor of engineering at China Industrial Control Systems Cyber Emergency Response Team. His research interests include industrial internet security and ICS security.

Shu Hu received the M.Eng. degree in Software Engineering from the University of Science and Technology of China. He is a computer science Ph.D. at the University at Buffalo, The State University of New York. His research interests include machine learning and digital media forensics.

Jun Li received the Ph.D. from Beijing Institute of Technology. He is a professor of engineering at China Industrial Control Systems Cyber Emergency Response Team. His research interests include industrial internet security and ICS security.

Xueying Zhang received the M.S. degree from Beijing University of Posts and Telecommunications. She is a research associate at China Industrial Control Systems Cyber Emergency Response Team. Her research interests include industrial internet security, data security and ICS security.

Bo Jiang received the Ph.D. degree from the Chinese Academy of Sciences. He is an associate professor at the Institute of Information Engineering, Chinese Academy of Sciences. His research interests include network situational awareness, knowledge graphs and data mining.

Xuehong Chen received the M.S. degree from Information Engineering University. She is an associate professor at China Industrial Control Systems Cyber Emergency Response Team. Her research interests include industrial internet security and ICS security.
