Elsevier

Computer Networks

Volume 56, Issue 2, 2 February 2012, Pages 686-702
Detection of traffic changes in large-scale backbone networks: The case of the Spanish academic network

https://doi.org/10.1016/j.comnet.2011.10.017

Abstract

Network management systems produce a huge amount of data in large-scale networks. For example, the Spanish academic network features hundreds of access and backbone links, each of which produces a link utilization time series. Detecting relevant changes in traffic load requires a visual inspection of all such time series, which increases the operational expenditure. In this paper, we present an on-line change detection algorithm to identify the relevant change points in link utilization, which are presented to the network manager through a graphical user interface. Consequently, the network manager only inspects those links that show a stationary and statistically significant change in the link load.

Introduction

In large-scale networks, the amount of information provided by management systems is huge. For example, time series of traffic volume or network link load may be provided for each access link. Network managers are faced with the visual inspection of far too many graphs, which motivates automated procedures that pinpoint, out of the many links present in the network, those that deviate from typical behavior and demand intervention from the manager. We propose a load model for network links that is capable of efficiently tracking sustained load changes. Our model is suitable for any network link with high aggregation (e.g., backbone links and access links of large institutions). It is aimed at facilitating network-wide monitoring of large-scale networks by clearly identifying the network links whose traffic behavior varies. Moreover, forensic data for each link can later be analyzed off-line to spot possible correlations that help explain how the load changes detected in one link have affected the performance of the rest of the network.

Previous approaches to network-wide traffic analysis use point-to-point [1], [2] or point-to-multipoint [3] models for analyzing the demands in backbone networks. The key concept in these works is the Origin–Destination (OD) flow. An OD flow is a time series that comprises all the traffic that enters the backbone at a given Point of Presence (PoP) and leaves at another PoP. Therefore, the analysis of the backbone demands is divided into $n^2$ time series, each representing an OD flow, where $n$ is the number of PoPs in the backbone network. To compute the OD flow time series, the authors of these works rely on flow-level measurements (to find the amount of traffic entering the network at each PoP) and routing information measurements (to determine the egress point of each measured flow). Our approach to network-wide traffic analysis reduces the complexity of the aforementioned methodologies by relying on link time series. Backbone network topologies are usually far from fully meshed. Thus, the number of links in a backbone network is considerably lower than the square of the number of nodes. In our case study, the Spanish academic network RedIRIS comprises 18 PoPs and only 30 backbone links. Therefore, our network-wide traffic analysis approach accounts for only 60 elements to monitor (because the links are bidirectional), considerably fewer than the $18^2 = 324$ different OD flows with the RedIRIS topology. Moreover, our model is fed only with average load measurements at coarse granularity (90 min intervals), which can be easily obtained from Simple Network Management Protocol (SNMP) measurements [4]. This also entails a complexity reduction compared with the other network-wide traffic analysis approaches existing in the literature. Our model needs simpler measurements and simpler post-processing steps, which makes it amenable to on-line application and enables its use on a broader set of network links.
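The comparison above can be reproduced with a few lines of arithmetic. The following is only an illustrative sketch: the function names are hypothetical and not part of the paper, and the OD-flow count follows the text in using the square of the number of PoPs.

```python
def od_flow_count(num_pops: int) -> int:
    """Number of OD-flow time series, counted as the square of the PoP count (as in the text)."""
    return num_pops ** 2


def link_series_count(num_links: int, bidirectional: bool = True) -> int:
    """Number of link time series: one per link direction."""
    return num_links * (2 if bidirectional else 1)


# RedIRIS figures quoted in the text: 18 PoPs and 30 bidirectional backbone links.
print(od_flow_count(18))      # 324
print(link_series_count(30))  # 60
```

For this topology, the link-based view needs fewer than a fifth of the time series required by the OD-flow view.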

We think our work is relevant to network operators and the research community. Network operators are aware of the importance of detecting traffic changes, which are relevant at different timescales. Load changes at short timescales are relevant for anomaly and attack detection, where a sudden change in the load may be related to flash crowds or Denial of Service (DoS) attacks [5], [6], [7], [8]. In contrast, load changes at long timescales (on the scale of days or weeks) should be taken into account for traffic engineering tasks such as load balancing and capacity planning [9], [10]. To the best of our knowledge, there is little existing work in the literature regarding traffic engineering procedures based on the detection of statistically significant sustained changes, and the most relevant approaches are normally based on simple time series forecasting techniques [11] focused on short-term changes. In those cases, a prediction of the load is used to compute confidence bands within which the actual value of the load should lie under normal network performance. However, this methodology cannot determine whether the change is stationary (i.e., the changed value is maintained over several time periods) and therefore whether the traffic behavior has changed. Consequently, in practice, the network manager has to visually inspect the different link load plots to make such a decision. In contrast, our methodology focuses only on sustained changes that may imply a shift in users’ behavior.

In this paper, we provide techniques that allow the network manager to focus only on those links that show stationary load changes. The case study is the Spanish academic network RedIRIS. We note that RedIRIS features 30 bidirectional backbone links and hundreds of connections to large institutions, and it is not feasible to analyze all of the corresponding time series separately from an operational expenditure (OPEX) point of view. Consequently, our proposed technique filters out those links which do not show statistically significant changes in the traffic behavior. As a result, the OPEX is largely reduced, because the traffic engineering tasks are only performed on a reduced subset of links. To identify such changes, we developed an on-line algorithm that uses clustering techniques and statistically sound methodologies to determine the location and statistical significance of the change points. In addition to providing valuable techniques to discriminate load-changing links, which have a direct impact on OPEX reduction, our findings also provide insight into the dynamics of load change in large-scale networks. Is the load change continuous, or does it show sudden changes in the mean? How frequent are load changes in a large network? Our analysis addresses these questions with a dataset that is three years long and comprises the whole Spanish academic network, i.e., more than one million users.

Our proposed algorithm is based on a fairly Gaussian multivariate vector that models the daily traffic pattern of links with a large aggregation level. This model splits the 24 h day period into 16 non-overlapping intervals of 90 min starting at midnight, each of which is a vector component. We have validated our fairly Gaussian model with real network measurements, also obtained from the RedIRIS network, showing evidence that the significance of the normal theory tests of mean vectors and covariance matrices is not severely affected by the deviations from normality existing in actual data. This result allows us to apply multivariate normal inference to the mean vector, namely the Multivariate Behrens–Fisher Problem (MBFP) procedure, to determine whether there is a statistically significant difference in the mean vectors of two consecutive time series. Therefore, when there is evidence of a change in the load time series, we alert the network managers, allowing them to take the appropriate action in response to that change.
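As an illustration of how one day of measurements maps onto the 16-component vector, the sketch below averages 5 min samples into 90 min intervals starting at midnight. It assumes one complete day of 288 samples with no gaps; the function and constant names are hypothetical, and the handling of missing samples is not addressed here.

```python
SAMPLE_MINUTES = 5
INTERVAL_MINUTES = 90
SAMPLES_PER_INTERVAL = INTERVAL_MINUTES // SAMPLE_MINUTES   # 18 samples per component
INTERVALS_PER_DAY = 24 * 60 // INTERVAL_MINUTES             # 16 vector components
SAMPLES_PER_DAY = SAMPLES_PER_INTERVAL * INTERVALS_PER_DAY  # 288 samples per day


def daily_load_vector(samples):
    """Average one day of 5 min load samples into 16 non-overlapping 90 min means."""
    if len(samples) != SAMPLES_PER_DAY:
        raise ValueError(f"expected {SAMPLES_PER_DAY} samples, got {len(samples)}")
    vector = []
    for start in range(0, SAMPLES_PER_DAY, SAMPLES_PER_INTERVAL):
        window = samples[start:start + SAMPLES_PER_INTERVAL]
        vector.append(sum(window) / len(window))
    return vector


# Synthetic day: the load equals the index of the 90 min interval it falls in.
example_day = [float(i // SAMPLES_PER_INTERVAL) for i in range(SAMPLES_PER_DAY)]
vector = daily_load_vector(example_day)
print(len(vector), vector[0], vector[-1])  # 16 0.0 15.0
```

Each working day then contributes one such vector, and a sequence of days forms the multivariate sample on which the mean-vector tests operate.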

After assessing the performance of the load change detection algorithm, we have applied it to such real network measurements, showing its efficiency in reducing the number of times the network needs supervision. We have analyzed more than 300 days' worth of data and, on average, we have raised around 11 alerts per link. This means that a network manager would have received an alert for a statistically significant and sustained change on less than 4% of the days. On the remaining days, the network is considered stable and no action is required.
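The 4% figure follows directly from the two numbers quoted above. As a rough check, assuming at most one alert per link per day and taking 300 days as a lower bound on the observation period:

```latex
\[
\frac{11\ \text{alerts}}{300\ \text{days}} \approx 0.037 = 3.7\% < 4\%
\]
```

Because the dataset covers more than 300 days, the actual fraction of days with an alert on a given link can only be lower than this estimate.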

A distinguishing feature of the MBFP procedure to detect changes is that it evaluates the difference in the mean vectors taking all the vector components into account at the same time. As a result, a detected change may be due to either small differences in several vector components or a large difference in a single vector component. In addition, as the vector components represent time intervals, the relevance of a change may differ depending on the vector component that caused the change detection. For instance, changes at night-time may not be as relevant as those at the busy hours. Consequently, we devise an alert color code to categorize the change points located by our algorithm. This color code is used to create weather maps of the network, which allow the relevant events happening in the network to be inspected visually in a straightforward manner.
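To make the joint treatment of all components concrete, the statistic below is the quadratic form on which common solutions to the multivariate Behrens–Fisher problem are built. This is a sketch for illustration only: the text shown here does not state which MBFP solution or reference distribution the authors use, and the symbols are introduced here rather than taken from the paper. Let $\bar{\mathbf{x}}_1$ and $\bar{\mathbf{x}}_2$ be the sample mean vectors of the two consecutive time series (each with $p = 16$ components), $\mathbf{S}_1$ and $\mathbf{S}_2$ their sample covariance matrices, and $n_1$ and $n_2$ the number of days in each sample.

```latex
\[
T^2 = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^{\top}
      \left( \frac{\mathbf{S}_1}{n_1} + \frac{\mathbf{S}_2}{n_2} \right)^{-1}
      (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)
\]
```

Because $T^2$ is a single scalar that aggregates the differences in all 16 components, weighted by their variability, it can become large either when many components differ slightly or when one component differs substantially, which is why a separate categorization of the detected changes is useful.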

The rest of the paper is organized as follows: Section 2 presents the measurement dataset. Section 3 describes the load model and presents the methodology and results of its validation process. Section 4 presents the on-line load change detection algorithm and the assessment of its performance with synthetic data. Section 5 provides the results of the application of the algorithm to actual network measurements and Section 6 shows how the proposed methodology could be applied to monitor a large-scale network like RedIRIS. Finally, Section 7 concludes the study.

Measurement dataset

This section presents an overview of the network traffic measurements used in this study. As we noted in the previous section, our algorithm is fed by average load measurements computed over non-overlapping intervals of 90 min. A simple averaging process of SNMP measurements obtained at 5 min granularity is enough to obtain such data. We gather network measurements at such resolution from Multi-Router Traffic Grapher (MRTG) tools [12] installed on the network equipment of the

Multivariate normal model for daily traffic

In this section, we present our multivariate model for network daily traffic load, and show practical evidence of its applicability. We assume that the network measurements to model come from SNMP reports at 5 min granularity, chosen for their popularity, or from another measurement methodology that uses the same format. This model was first introduced in [18], and takes advantage of the apparent invariance of the daily traffic pattern shape for working days presented in Section 2.2. The

On-line load change detection algorithm

In the validation of the multivariate model we confirmed that the whole dataset does not follow a normal distribution, whereas small subsamples of it actually do. This fact suggests that the parameters of the normal distribution may be changing slowly with time (i.e., short-term stationarity). This section presents an on-line load change detection algorithm, aimed at identifying changes in traffic loads when monitoring Internet links. This algorithm was first introduced in [18] and produces an

Change point analysis with real network measurements

In this section, we present the results of applying our change point detection methodology of Section 4 to the real network measurements of Section 2.1. Table 5 summarizes the number of tests performed and alerts generated by our algorithm when applied to this dataset, which is three years long. The second column shows the number of times the MBFP testing methodology was applied, that is, the number of times that the clustering algorithm found potential change points. The third column shows the

Network management based on relevant events

In this section we present a network management system that uses the change point detection algorithm, i.e., it shows the relevant events that potentially need action by the network manager. We develop an alert color code to differentiate the importance of the detected changes, which allows us to create weather maps of the operator’s network showing the most problematic links that may be eligible for capacity planning and traffic engineering tasks. As it turns out, when the algorithm detects a

Summary and conclusions

In this paper, we have presented an on-line load change detection algorithm, which uses clustering and statistical techniques to identify statistically significant load changes. The algorithm is based on a fairly normal multivariate model, which keeps track of the well-known daily pattern of the network in order to make the statistical inference. We have validated the suitability of that distribution to model the daily pattern and make inferences about the means of the distribution.

Acknowledgments

The authors acknowledge the support of the Spanish Ministerio de Ciencia e Innovación (MICINN) for this work, under project ANFORA (TEC2009–13385), and the FPU fellowship program that has funded this research.

References (35)

  • H. van den Berg et al., QoS-aware bandwidth provisioning for IP network links, Computer Networks (2006)
  • J.L. García-Dorado et al., Characterization of the busy-hour traffic of IP networks based on their intrinsic features, Computer Networks (2011)
  • A. Lakhina et al., Structural analysis of network traffic flows, ACM SIGMETRICS Performance Evaluation Review (2004)
  • S. Bhattacharyya et al., Geographical and temporal characteristics of inter-POP flows: view from a single PoP, European Transactions on Telecommunications (2002)
  • A. Feldmann et al., Deriving traffic demands for operational IP networks: methodology and experience, IEEE/ACM Transactions on Networking (2001)
  • W. Stallings, SNMP, SNMPv2, SNMPv3, and RMON 1 and 2 (1998)
  • P. Barford, J. Kline, D. Plonka, A. Ron, A signal analysis of network traffic anomalies, in: Proceedings of ACM SIGCOMM...
  • Y. Chen, K. Hwang, Collaborative change detection of DDoS attacks on community and ISP networks, in: Proceedings of...
  • B. Krishnamurthy, S. Sen, Y. Zhang, Y. Chen, Sketch-based change detection: methods, evaluation, and applications, in:...
  • R. Schweller, A. Gupta, E. Parsons, Y. Chen, Reversible sketches for efficient and accurate change detection over...
  • K. Papagiannaki et al., Long-term forecasting of Internet backbone traffic, IEEE Transactions on Neural Networks (2005)
  • A. Feldmann et al., Netscope: traffic engineering for IP networks, IEEE Network (2000)
  • J. Brutlag, Aberrant behavior detection in time series for network monitoring, in: Proceedings of USENIX Conference on...
  • T. Oetiker, D. Rand, MRTG: the multi router traffic grapher, in: Proceedings of USENIX Conference on System...
  • K. Thompson et al., Wide-area Internet traffic patterns and characteristics, IEEE Network (1997)
  • TRAMMS Consortium, TRAMMS IP Traffic report, Tech. Rep. 2, TRAMMS Project, 2008....
  • K. Fukuda et al., The impact of residential broadband traffic on Japanese ISP backbones, ACM SIGCOMM Computer Communication Review (2005)

    F. Mata received his MSc degree in Telecommunications Engineering with Honours at Universidad Autónoma de Madrid (Spain) in 2007. Nowadays he is combining his studies in Mathematics and Computer and Electrical Science at the same university, where in 2008 he joined the High Performance Computing and Networking Group at which he currently is pursuing his PhD in network monitoring and measuring under the F.P.U. fellowship program of the Ministry of Education of Spain. His research interests include network management and traffic measurement and modeling.

    J.L. García-Dorado received the MSc degree in Computer Science and the PhD degree in Computer and Telecommunications Engineering in 2006 and 2010, both from Universidad Autónoma de Madrid (Spain). In 2006, he joined the Networking Research Group at the same university, as a researcher involved in the ePhoton/One Plus Network of Excellence, where he is collaborating in national and European research projects. In 2007, he was awarded a four-year fellowship by the Ministry of Education of Spain (F.P.I. scholarship). His research interests are focused on the analysis of network traffic: its management, modeling, and evolution.

    J. Aracil received the MSc and PhD degrees (Honours) from Universidad Politécnica de Madrid (Spain) in 1993 and 1995, both in Telecommunications Engineering. In 1995 he was awarded a Fulbright scholarship and was appointed as a Postdoctoral Researcher of the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley. In 1998 he was a research scholar at the Center for Advanced Telecommunications, Systems and Services of The University of Texas at Dallas. He has been an associate professor at University of Cantabria and Public University of Navarra and he is currently a full professor at Universidad Autónoma de Madrid (Spain). His research interests are in optical networks and performance evaluation of communication networks. He has authored more than 100 papers in international conferences and journals.
