Computer Networks

Volume 57, Issue 17, 9 December 2013, Pages 3446-3462

Distribution-based anomaly detection via generalized likelihood ratio test: A general Maximum Entropy approach

https://doi.org/10.1016/j.comnet.2013.07.028

Abstract

We address the problem of detecting “anomalies” in the network traffic produced by a large population of end-users, following a distribution-based change detection approach. In the considered scenario, different traffic variables are monitored at different levels of temporal aggregation (timescales), resulting in a grid of variable/timescale nodes. For every node, a set of per-user traffic counters is maintained and then summarized into a histogram for every time bin, yielding a timeseries of empirical (discrete) distributions for each node. Within this framework, we tackle the problem of designing a formal Distribution-based Change Detector (DCD) able to identify statistically significant deviations from the past behavior of each individual timeseries.

For the detection task we propose a novel methodology based on a Maximum Entropy (ME) modeling approach. Each empirical distribution (sample observation) is mapped to a set of ME model parameters, called the “characteristic vector”, via closed-form Maximum Likelihood (ML) estimation. This allows us to derive a detection rule based on a formal hypothesis test (Generalized Likelihood Ratio Test, GLRT) that measures the coherence of the current observation, i.e., its characteristic vector, with a given reference. The latter is identified dynamically, taking into account the typical non-stationarity displayed by real network traffic. Numerical results on synthetic data demonstrate the robustness of our detector, while the evaluation on a labeled dataset from an operational 3G cellular network confirms the capability of the proposed method to identify real traffic anomalies.

Introduction

Modern data and communication networks are exposed to many types of problems and security threats. In order to respond quickly and minimize service degradation, network operators require tools capable of promptly detecting “abnormal” traffic conditions, i.e. anomalies. This is even more demanding in third-generation (3G) cellular networks, which are highly heterogeneous, complex and constantly evolving systems, and as such are exposed to unanticipated types of problems and threats [1], [2], [3]. Anomaly Detection (AD) in network traffic is a well-explored field, and several different techniques have been proposed (see e.g. [4], [5] and references therein). Generally speaking, the statistical-based AD approach seeks to identify a reference representative of the “normal” behavior and then look for any “significant” deviation from it. In other words, an anomaly is defined as anything deviating from the expected behavior; expectation is a key concept here [6]. Thus, a complete AD scheme consists logically of a reference identification method followed by a detection rule for testing consistency between the observed data and the reference. As the state of the network and the behavior of its users change (e.g. following daily and weekly cycles, and long-term trends), so do the notions of “normal” behavior and “significant” deviation. Therefore, the AD system should be adaptive: the reference identification as well as the detection rule must be dynamically updated so as to track the physiological changes in the traffic patterns.

Statistical-based AD can be applied to virtually any type of temporally-structured traffic data, or traffic representation, from coarse scalar timeseries (e.g. of total volume or entropy) to finer-grain multidimensional representations (e.g. vectors, sketches, histograms) of the underlying traffic process, extracted by some more or less involved procedure that typically requires feature selection, aggregation and tracking of per-flow state [7]. Moreover, in order to detect anomalies occurring at different timescales, the AD system should consider traffic data at different levels of temporal aggregation (multi-resolution). Operators of access networks are particularly concerned with revealing macro-anomalies, i.e. events that affect many network users (i.e., their “customers”), rather than micro-anomalies whose impact is limited to one or a few users, since the former more likely point to a problem in the shared network or service infrastructure. This motivates us to consider a distribution-based approach, where the network traffic is represented by (a set of) traffic distributions across users. In this way, we aim at profiling the aggregate behavior of the whole user population, rather than that of individual users, which is in line with the goal of capturing macro-anomalies. More specifically, we consider a reference scenario where a passive monitoring system measures multiple traffic variables (e.g. the number of packets of a certain type, such as “number of TCP SYN packets sent in uplink to port 80”, “number of distinct IP addresses contacted”, or “volume of traffic on port 25”; we detail the formulation in Section 3) for each individual user and at different temporal aggregation scales, from 1 min up to 1 day. For every variable and timescale, the data observed in each time bin is summarized into a binned histogram, where bins are intervals that partition the span of the variable, representing the empirical distribution of that variable across users. We therefore obtain a set of distribution timeseries, each referring to a different traffic variable and timescale. Each timeseries is then processed by a separate instance of a Distribution-based Change Detector (DCD) that adaptively learns the “normal” reference profile and detects whether the current observation deviates “significantly” from the reference.
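As a concrete illustration of this data-preparation step, the sketch below builds one empirical pmf per time bin from per-user counters. The bin edges, the tiny smoothing constant and all names are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def distribution_timeseries(per_user_counts, bin_edges):
    """Summarize per-user counters into one empirical pmf per time bin.

    per_user_counts: array of shape (n_time_bins, n_users); entry [t, u] is
        the value of one traffic variable for user u in time bin t.
    bin_edges: histogram bin edges partitioning the span of the variable.
    Returns an array of shape (n_time_bins, n_hist_bins) whose rows are pmfs.
    """
    pmfs = []
    for counts in per_user_counts:
        hist, _ = np.histogram(counts, bins=bin_edges)
        hist = hist + 1e-3                      # tiny prior mass so no bin is exactly empty
        pmfs.append(hist / hist.sum())          # normalize to an empirical distribution
    return np.asarray(pmfs)

# Toy usage: 24 hourly bins, 5000 users, synthetic "packets per user" counts.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=20, size=(24, 5000))
edges = np.concatenate([np.arange(0, 50, 5), [np.inf]])  # last bin catches the tail
pmf_series = distribution_timeseries(counts, edges)
print(pmf_series.shape)  # (24, 10)
```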

In the given reference scenario the number of variable/timescale combinations is large, and each combination follows a specific profile and temporal pattern different from the others. It would be practically infeasible to tailor the design and parametrization of the DCD module to each and every timeseries; therefore, a suitable DCD should fulfill the following requirements:

  • Versatility: to model different traffic variables at different timescales and aggregation levels, without manual tuning.

  • Adaptiveness: to adjust the reference identification and the detection rule to the physiological changes in the traffic composition.

  • Low-complexity: to allow on-line implementation for a sufficiently large number of variables.

The goal of this paper is to derive a DCD with such properties by following a theoretically-grounded methodology.

The statistical-based AD approaches present in the literature can be grouped into two main categories: model-based (parametric) and model-free (non-parametric). The former (e.g. the popular CUSUM [8], [9] and other classical hypothesis testing methods) are based on strict a priori assumptions on the statistical characteristics of the data, which allow formal treatment and better control over the achievable performance (e.g. the Probability of False Alarm, PFA) but lack versatility. On the other hand, model-free methods based on general non-parametric techniques (e.g., PCA [10], to name just one) or simple heuristics (e.g. [7]) are more flexible but, lacking a formal hypothesis testing framework, must resort to heuristic detection rules which are difficult to control. In the present contribution we aim at resolving this tension between the need for a tractable statistical model and the elusive peculiarities of empirical data. The proposed methodology is, in effect, a “third way” between the model-based and model-free avenues. We leverage a Maximum Entropy (ME) approach to learn from empirical data the statistical model with the highest probability of being close to the underlying distribution, drawn from a broad set of distributions, namely the Gibbs family. The set of ME model parameters, estimated via Maximum Likelihood (ML), represents the “characteristic vector” associated with the empirical distribution. Based on the latter, we derive a formal hypothesis test to establish whether the current sample is “compatible” with a reference extracted from selected past observations.
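To make the ME modeling step concrete, the sketch below fits a Gibbs-family model p(w) proportional to exp(sum_k lam[k]*f_k(w)) to an empirical pmf by iterative ML, exploiting the fact that for exponential families the likelihood gradient is the gap between empirical and model feature means. This generic iteration only illustrates the idea: the paper's contribution is a closed-form estimator, which is not reproduced here, and the choice of moment features is our assumption.

```python
import numpy as np

def fit_maxent(pmf, features, iters=500, lr=0.5):
    """Fit a Gibbs-family model p(w) ~ exp(sum_k lam[k] * features[k, w])
    to an empirical pmf on a discrete support, via ML gradient ascent.

    For exponential families the ML solution matches model feature means to
    empirical ones, so the gradient is simply their difference. This stands
    in for the paper's closed-form estimator, which we do not reproduce.
    """
    lam = np.zeros(features.shape[0])
    target = features @ pmf                       # empirical feature means
    for _ in range(iters):
        logits = lam @ features
        p = np.exp(logits - logits.max())
        p /= p.sum()                              # current model pmf
        lam += lr * (target - features @ p)       # gradient of the log-likelihood
    return lam                                    # the "characteristic vector"

# Toy usage: support of 10 histogram bins, first two moments as features.
w = np.arange(10)
feats = np.vstack([w, w**2]).astype(float)
feats /= feats.max(axis=1, keepdims=True)         # rescale features for stable steps
emp = np.array([.02, .05, .1, .18, .22, .18, .12, .07, .04, .02])
print(fit_maxent(emp, feats))
```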

The formal test is obtained by applying the Generalized Likelihood Ratio Test (GLRT) theory to the problem at hand. This is a general approach for hypothesis testing on multidimensional data, which yields a formal and rigorous test provided that an a priori statistical model for the data is available and that the Maximum Likelihood (ML) estimate of its parameters can be obtained, possibly in closed form to grant reasonable computational complexity. This makes the GLRT approach very powerful but difficult to derive, unless very tractable models are postulated (e.g., Gaussian). Real network traffic, however, is non-stationary and does not exhibit simple statistical behavior, i.e. it is difficult to model in a simple way. We introduced the general idea of GLRT for anomaly detection in network traffic in [11], without however providing any particular algorithm for the modeling. To the best of our knowledge, the only other attempt to apply a GLRT to anomaly detection in network traffic is [12], limited to data that can be modeled by α-stable distributions. An additional drawback of that approach is that such a model has no closed form for the distributions, so powerful numerical methods are needed for parameter estimation. Furthermore, the lack of an analytical form impedes the derivation of a low-complexity GLRT detector. Conversely, our approach is general since it does not make any assumption on the data at hand. The key idea is to use a Maximum Entropy (ME) approach to obtain a general parametric model, which opens the door to formal hypothesis testing, hence to the GLRT. Furthermore, thanks to the closed-form derivation of the ML estimator of the characteristic vector, the ultimate structure of the detector has very low complexity. As a whole, our approach fulfills the operational requirements of versatility, adaptiveness and low complexity identified above. To the best of our knowledge this is the first methodological work that combines all these characteristics within a formal hypothesis testing framework.
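For reference, the textbook GLRT compares the two likelihoods, each maximized over the parameter set of its own hypothesis, against a threshold η; the specific statistic on characteristic vectors used in this work is derived in Section 5.

```latex
\Lambda(\mathbf{x}) \;=\;
  \frac{\displaystyle \sup_{\boldsymbol{\theta} \in \Theta_1} L(\boldsymbol{\theta};\, \mathbf{x})}
       {\displaystyle \sup_{\boldsymbol{\theta} \in \Theta_0} L(\boldsymbol{\theta};\, \mathbf{x})}
  \;\; \mathop{\gtrless}_{H_0}^{H_1} \;\; \eta
```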

This paper makes the following contributions:

  • derivation of a closed-form Maximum Likelihood (ML) estimator of the ME characteristic vectors (Section 4);

  • derivation of a low-complexity GLRT-based detector based upon characteristic vectors (Section 5);

  • assessment of the theoretical performance of the detector in controlled simulations (Section 6).

Operational criteria for the practical application to real-world data are also provided (Section 7). Finally, the validation on a labeled dataset from an operational network (Section 8) demonstrates that the proposed scheme is capable of identifying real traffic anomalies.

Section snippets

Related work

Before presenting our method in detail, we critically review the most important related approaches that have been adopted for AD in network traffic. In particular, we identify some recurrent ideas which, though scattered among different works, emerge as key ingredients for a DCD with real operational value.

Lakhina et al. [10], [13], [14] provided the theoretical basis for the application of Principal Component Analysis (PCA) to AD. PCA is used to reduce the high dimensionality of

Anomaly detection framework

We consider a generic network (wireless or wired) serving a large population of users, where (i) traffic (packets or flows) can be stably associated to end-users and (ii) the traffic mix traversing the network is largely independent of routing changes. These requirements are typically met in access networks (e.g. xDSL Internet providers and 3G/4G mobile operators), but the method proposed here can be applied in other contexts as well. A high-level representation of the whole monitoring and

The Maximum Entropy approach

To translate the abstract hypothesis test (1) into a computable detector, amenable to real implementation within a generic DCD module, the pmf under test ptest (as well as the ones in the reference set S0) must be expressed in a parametric form. As discussed earlier, the challenge is to find an analytical model parsimonious enough to be tractable but also versatile enough to suit different variable/timescale nodes. Our approach is to address the problem following the principle of Maximum Entropy

GLRT-based detector on characteristic vectors

In this section we describe a novel GLRT-based detection algorithm that exploits the ME method as a general tool for distribution modeling, thus completing the design of a general-purpose DCD module.

We assume that a set S0 of cardinality L is available as reference for the “normal” behavior. The algorithm described above corresponds to the block “ME model building” in Fig. 2, and allows us to easily obtain, via Eq. (10), the ME models relative to each element pref(s)(ω) (s = 1, …, L) in S0
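Since the paper's exact statistic (Eq. (10) and onward) lies outside this excerpt, the following stand-in only shows the general shape of a plug-in GLRT score against a reference set: under H0 the test histogram follows the reference behavior, under H1 its own unrestricted pmf, and for multinomial samples of size n the log-ratio reduces to n times a KL divergence. The function names and the pooling of the references into a single mean pmf are our assumptions, not the paper's method.

```python
import numpy as np

def glrt_statistic(p_test, ref_pmfs, n):
    """Generic plug-in GLRT score for "does p_test match the reference set?".

    Under H0 the test histogram of n user samples is drawn from the reference
    behavior (summarized here by the average reference pmf); under H1 it
    follows its own unrestricted pmf. The multinomial log-GLRT then reduces
    to n * KL(p_test || p_ref). Assumes reference pmfs have no empty bins
    (e.g. thanks to the smoothing applied when the histograms were built).
    """
    p_ref = np.mean(ref_pmfs, axis=0)
    mask = p_test > 0
    kl = np.sum(p_test[mask] * np.log(p_test[mask] / p_ref[mask]))
    return n * kl

# Toy usage: 8 anomaly-free reference pmfs vs. a more skewed test pmf.
rng = np.random.default_rng(2)
refs = rng.dirichlet(np.ones(10) * 50, size=8)
test = rng.dirichlet(np.ones(10) * 5)
print(glrt_statistic(test, refs, n=5000))  # flag anomaly if above threshold eta
```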

Performance assessment on synthetic data

In this section we assess the performance of the proposed AD algorithm by means of Monte Carlo simulations on synthetic data. The main advantage of simulations is that all the parameters can be controlled and the “ground truth” is known. This makes it possible to generate samples under both the H0 (anomaly-free) and H1 (anomaly) hypotheses, and therefore to determine exactly the PFA and the Probability of Detection (PD). By plotting the PD against the PFA we obtain the Receiver Operating Characteristic (ROC) curve (see Fig. 3). The area
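A minimal sketch of how PFA, PD and the ROC can be estimated from Monte Carlo scores under the two hypotheses; the function names and the synthetic score distributions are illustrative assumptions.

```python
import numpy as np

def roc_curve(scores_h0, scores_h1):
    """Empirical ROC from Monte Carlo test-statistic values.

    scores_h0: statistics from anomaly-free (H0) runs; scores_h1: from H1
    runs. Sweeping the threshold over all observed values gives PFA (fraction
    of H0 scores above it) and PD (fraction of H1 scores above it).
    """
    thr = np.sort(np.concatenate([scores_h0, scores_h1]))[::-1]
    pfa = np.array([np.mean(scores_h0 > t) for t in thr])
    pd_ = np.array([np.mean(scores_h1 > t) for t in thr])
    auc = np.sum(np.diff(pfa) * (pd_[1:] + pd_[:-1]) / 2)   # trapezoidal area
    return pfa, pd_, auc

# Toy usage: H1 scores stochastically larger than H0 scores.
rng = np.random.default_rng(1)
pfa, pd_, auc = roc_curve(rng.normal(0, 1, 2000), rng.normal(2, 1, 2000))
print(f"AUC = {auc:.3f}")  # close to 1 for a well-separated detector
```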

Operational tuning

To operate in real contexts, the parameters of all DCD nodes, in our case the model accuracy and the detection threshold η, need to be dynamically adjusted to track the “physiological” variations of real traffic. The problem is that the “ground truth” is unknown in practice. Therefore, the test statistic cannot be characterized and it is not possible to set the threshold η in Eq. (15) so as to guarantee the desired PFA. Moreover, an operational criterion is needed to set the model accuracy. In
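One common operational surrogate, offered here only as an assumption rather than the paper's criterion, is to set η as a high empirical quantile of recent test-statistic values, updating the history only with time bins that were not flagged:

```python
import numpy as np
from collections import deque

class AdaptiveThreshold:
    """Track recent test-statistic values and set eta as a high quantile.

    Without ground truth the PFA cannot be computed exactly; a practical
    surrogate (our assumption) is to target it with an empirical quantile of
    recent scores, feeding the history only with non-flagged bins so that
    anomalies do not pollute the reference.
    """
    def __init__(self, target_pfa=0.01, window=1000):
        self.q = 1.0 - target_pfa
        self.history = deque(maxlen=window)

    def update(self, score):
        if len(self.history) < 30:               # warm-up: collect, never flag
            self.history.append(score)
            return False
        eta = np.quantile(np.fromiter(self.history, float), self.q)
        is_anomaly = score > eta
        if not is_anomaly:
            self.history.append(score)
        return is_anomaly

# Toy usage: feed a stream of scores with one injected spike at the end.
det = AdaptiveThreshold(target_pfa=0.01, window=500)
rng = np.random.default_rng(3)
stream = rng.normal(0, 1, 300).tolist() + [8.0]
flags = [det.update(s) for s in stream]
print(flags[-1])  # True: the spike exceeds the adaptive eta
```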

Validation on real network traffic data

The goal of this section is to validate the proposed methodology on real data from an operational network. Unfortunately, a competitive comparison with other existing AD methods is infeasible in our case. The fundamental issue is that there is no previous work considering AD on traffic distributions across users against which to compare. As discussed in Section 2, existing works are based on IP-layer identifiers like Origin–Destination (OD) pairs or port/address tuples, not on users. Hence, it

Conclusions

We proposed a novel distribution-based anomaly detection method that builds upon the Maximum Entropy principle, conceived as a grid of low-complexity DCD modules able to automatically adapt to the different traffic variables and timescales. Our method combines the advantages of the classical model-based and model-free approaches. Like the former, it embeds a ME modeling stage that can be computed in closed-form with a low-complexity procedure, and a formal GLRT for hypothesis testing. Like the


References (50)

  • H. Yang et al., Securing a wireless world, Proceedings of the IEEE (2006).
  • P. Traynor, P. McDaniel, T. La Porta, On attack causality in internet-connected cellular networks, in: USENIX...
  • F. Ricciato, A review of DoS attack models for 3G cellular networks from a system-design perspective, Computer Communications (2010).
  • M. Thottan et al., Anomaly detection approaches for communication networks.
  • A. Patcha et al., An overview of anomaly detection techniques: existing solutions and latest technological trends, Computer Networks (2007).
  • P. Traynor et al., Security for Telecommunications Networks (2008).
  • R. Sekar et al., Specification-based anomaly detection: a new approach for detecting network intrusions, in: ACM...
  • H. Wang, D. Zhang, K.G. Shin, Detecting SYN flooding attacks, in: INFOCOM 2002, vol. 3, 2002, pp....
  • P.P.C. Lee et al., On the detection of signaling DoS attacks on 3G wireless networks, in: IEEE INFOCOM’07, May...
  • A. Lakhina et al., Structural analysis of network traffic flows, in: ACM SIGMETRICS, June...
  • A. Coluccia, A. D’Alconzo, F. Ricciato, Distribution-based anomaly detection in network traffic, in: Data Traffic...
  • F. Simmross-Wattenberg et al., Anomaly detection in network traffic based on statistical inference and α-stable modeling, IEEE Transactions on Dependable and Secure Computing (2011).
  • A. Lakhina, M. Crovella, C. Diot, Diagnosing network-wide traffic anomalies, in: ACM SIGCOMM,...
  • A. Lakhina, M. Crovella, C. Diot, Mining anomalies using traffic feature distributions, in: ACM SIGCOMM,...
  • H. Ringberg, A. Soule, J. Rexford, C. Diot, Sensitivity of PCA for traffic anomaly detection, in: ACM SIGMETRICS,...
  • X. Guan et al., Dynamic feature analysis and measurement for large-scale network traffic monitoring, IEEE Transactions on Information Forensics and Security (2010).
  • A. Ziviani et al., Network anomaly detection using nonextensive entropy, IEEE Communications Letters (2007).
  • T. Dasu et al., An information-theoretic approach to detecting changes in multi-dimensional data streams, in:...
  • A. D’Alconzo, A. Coluccia, F. Ricciato, P. Romirer-Meierhofer, A distribution-based approach to anomaly detection for...
  • Y. Xiang et al., Low-rate DDoS attacks detection and traceback by using new information metrics, IEEE Transactions on Information Forensics and Security (2011).
  • W. Lee, D. Xiang, Information theoretic measures for anomaly detection, in: Symposium on Security and Privacy,...
  • M. Celenk et al., Predictive network anomaly detection and visualization, IEEE Transactions on Information Forensics and Security (2010).
  • Y. Gu, A. McCallum, D. Towsley, Detecting anomalies in network traffic using maximum entropy estimation, in: IMC,...
  • P. Barford et al., A signal analysis of network traffic anomalies, in: ACM SIGCOMM’02,...
  • P. Huang, A. Feldmann, W. Willinger, A non-intrusive, wavelet-based approach to detecting network performance problems,...

Angelo Coluccia received the “Laurea” degree summa cum laude in Telecommunication Engineering in 2007 from the University of Salento in Lecce, Italy. In the same year he joined the Forschungszentrum Telekommunikation Wien (FTW, Vienna) as a researcher in the area of packet networking, working on traffic analysis and anomaly detection in an operational 3G network. Since 2008 he has been with the Telecommunication Group at the University of Salento, where he received the PhD degree in Information Engineering/Telecommunications in 2011. Besides Anomaly Detection and Security, his research interests include Communications, Wireless Networks (in particular Localization), Software-defined Radio and Signal Processing. He also teaches the course “Digital Signal Processing” at the University of Salento.

Alessandro D’Alconzo received the MSc Diploma in Electrical Engineering and the Ph.D. degree from the Polytechnic of Bari, Italy, in 2003 and 2007, respectively. Since 2007 he has been a Senior Researcher at the Telecommunications Research Center Vienna (FTW), Austria. Since 2008 he has been the Management Committee representative of Austria for the COST Action IC0703 “Traffic Monitoring and Analysis”. He is the scientific coordinator for FTW of the EU IP project DEMONS. His current research interests embrace the Network Measurements and Traffic Monitoring area, Quality of Experience evaluation, and the application of secure multiparty computation techniques to inter-domain network monitoring.

Fabio Ricciato received the Ph.D. in Information and Communications Engineering in 2003 from the University La Sapienza, Italy. In 2004 he joined the Telecommunications Research Center Vienna (FTW), where he later took over the leadership of the Networking Area. Since 2007 he has been an Assistant Professor (Ricercatore) in the Telecommunications Group at the University of Salento, where he teaches the course “Telecommunication Systems”. His research interests cover various topics in the field of Telecommunication Networks, including Traffic Monitoring and Analysis, Network Measurements, Security and Privacy, routing and optimization, and software-defined radio networks.
