Neurocomputing

Volume 428, 7 March 2021, Pages 317-324
GeoTraPredict: A machine learning system of web spatio-temporal traffic flow

https://doi.org/10.1016/j.neucom.2020.06.121

Abstract

Traffic flow prediction is an important component of self-driving. Traffic flow is closely related to population distribution: it depends not only on the absolute size of the population but also on people's concerns and interests. Accurate spatio-temporal web traffic flow prediction is critical in many applications, such as bandwidth allocation, anomaly detection, congestion control and admission control. Most existing traffic flow prediction methods use models based on time-series analysis and remain inadequate for many real-world applications. Web traffic flow is found to be strongly associated with the spatio-temporal distribution of the population. Increasingly, it is critical to understand, and make decisions based on, the relationship between population patterns and web traffic flow patterns. It has been shown that different people respond differently to web events. Due to the complexity of spatial data structures and the huge volume of web traffic flow log data, it is difficult to routinely find the relationship between web events and population distributions without an appropriate processing framework. In this paper, we propose an innovative framework named GeoTrafficPredict to support the accurate spatio-temporal prediction of web traffic flow. GeoTrafficPredict provides a machine learning platform that learns the spatio-temporal pattern of traffic flow and uses the pattern to predict the trend in both the spatial and temporal dimensions. GeoTrafficPredict also provides a data aggregation portal and cloud-based computation functions. GeoTrafficPredict deploys a series of computational images in a cloud computing environment, and the implementation on China’s CSTNET illustrates the performance of our platform.

Introduction

Traffic flow prediction is an important component of self-driving. In the fields of smart cities and self-driving, traffic prediction is widely used in route planning [1]. Traffic flow is closely related to population distribution: it depends not only on the absolute size of the population but also on people's concerns and interests. At present, there is a lack of effective methods and platforms for analyzing the distribution of population interests. Web traffic flow prediction is crucial to network providers, content providers and computer network management. It is an important task for many applications, such as adaptive applications, congestion control, admission control, anomaly detection and bandwidth allocation. As the number of network customers and the kinds of services grow rapidly, networks are considered to be highly non-linear, time-variable dynamic systems, and the demand for network traffic forecasts has increased accordingly [2].

Considering that the source of network activity is people, internet traffic is found to be strongly associated with the spatio-temporal distribution of the population. Spatial analysis involving population distribution information can indirectly contribute to the analysis and prediction of web traffic flow. In addition, some works on traffic prediction using spatio-temporal characteristics have been proposed in the autonomous vehicles field [3]. On this basis, this paper proposes an innovative method to predict web traffic, which combines population and network event clusters to explore the correlation between network traffic and population classification, followed by comprehensive analysis. Specifically, fluctuations in network traffic can be predicted by learning the relationship between traffic variations, network events, and the characteristics of cyber citizens. This allows us to provide corresponding measures in advance. The method closely couples network traffic analysis with network traffic prediction to increase accuracy. The traffic analysis mainly focuses on possible correlations between changes in network traffic and its spatial and temporal distribution, which promotes better forecasting of network traffic. Consequently, spatial analysis, which can provide valuable inferences, should be considered when doing network traffic prediction.

The general process of network traffic prediction covers several primary steps, such as collecting network traffic data, building a traffic prediction model with parameters, training the model to acquire the parameters in a simulation environment, modifying the parameters within the model, and so on. There are many different methods of collecting computer network traffic data, such as SNMP, packet sniffing, NetFlow and so on.
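The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the model used in the paper: it assumes a first-order autoregressive predictor fitted by least squares, and the function names and the synthetic traffic series are hypothetical.

```python
from statistics import mean

def fit_ar1(series):
    """Train step: least-squares fit of x[t] ~ a + b * x[t-1]; returns (a, b)."""
    xs, ys = series[:-1], series[1:]
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        # Constant history: fall back to predicting the mean.
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return my - b * mx, b

def predict_next(params, last_value):
    """Predict the next interval's traffic from the last observed value."""
    a, b = params
    return a + b * last_value

def mean_abs_error(params, series):
    """Evaluate step: one-step-ahead mean absolute error over a series."""
    pairs = zip(series[:-1], series[1:])
    return mean(abs(predict_next(params, prev) - cur) for prev, cur in pairs)

# Collect step (assumption: synthetic per-interval traffic counts, linear growth).
traffic = [100 + 5 * t for t in range(20)]
train, test = traffic[:15], traffic[14:]

params = fit_ar1(train)             # acquire the parameters
mae = mean_abs_error(params, test)  # check the fit on held-out data
```

If the held-out error is too large, the parameters or the model form would be revised and the loop repeated, which corresponds to the "modifying the parameters" step.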

Deep learning provides valuable insights into network traffic prediction. Various network traffic prediction models have been proposed, including models based on statistics, machine learning, and deep learning. For different application requirements, the prediction method must consider the prediction horizon, computational cost, prediction error rate, and response time, which creates challenges for traffic prediction.

In this paper, we present a platform to conduct accurate spatio-temporal distribution analysis and prediction of population by predicting network traffic. The proposed method combines population and network event clusters to explore the correlation between network traffic and population classification, followed by comprehensive analysis. Specifically, fluctuations in network traffic can be predicted by learning the relationship between traffic variations, network events, and the characteristics of cyber citizens. This allows us to provide corresponding measures in advance. The method closely couples network traffic analysis with network traffic prediction to increase accuracy. The traffic analysis mainly focuses on possible correlations between changes in network traffic and its spatial and temporal distribution, which promotes better forecasting of network traffic. Consequently, spatial analysis, which can provide valuable inferences, should be considered when doing network traffic prediction. The proposed innovative framework, named GeoTrafficPredict, supports the accurate spatio-temporal prediction of web traffic flow. The rest of the paper is organized as follows. Section two describes related works, and section three elaborates the process of GeoTrafficPredict. Section four describes the framework of GeoTrafficPredict, including the data tier, computation tier, and visualization tier. The implementation on China’s CSTNET (China Science and Technology Network) can be found in section four. Section five discusses the results and future work.

Related works

Web traffic flow prediction models developed in three stages: traditional models (short correlation models), self-similar models (long correlation models), and emerging machine learning based models, as shown in Fig. 1. In the 1970s and early 1980s, as network applications were relatively simple, data transmission volumes were low, and network analysis technology was limited, people drew on the model of the public switched telephone network and used a Poisson model to describe the traffic of data

Events detection

Human behavior stimulated by external events shows different patterns, and considering that the source of network activity is people, the web traffic flow can be studied to detect the related events.

In the first stage, wavelet analysis was performed on the network traffic data, and correlation analysis was conducted between the clustering results and the wavelet separation results to detect the events. The original data take time as the X-axis and network traffic as the Y-axis. By the wavelet analysis, the related
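As a rough illustration of how wavelet analysis can expose abrupt changes in a traffic series, the sketch below applies one level of the Haar transform and flags large detail coefficients as candidate events. The choice of the Haar wavelet, the threshold, and the sample data are assumptions made for this example; the snippet does not state which wavelet or detection rule the authors used.

```python
import math

def haar_step(signal):
    """One level of the Haar transform: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def candidate_events(signal, threshold):
    """Start indices of sample pairs whose detail magnitude exceeds the threshold."""
    _, detail = haar_step(signal)
    return [2 * k for k, d in enumerate(detail) if abs(d) > threshold]

# Hypothetical traffic volumes per time step (X-axis: time, Y-axis: traffic).
traffic = [10, 10, 11, 11, 10, 40, 12, 12]
events = candidate_events(traffic, threshold=5.0)
```

A single level with non-overlapping pairs only sees changes inside each pair, so a practical detector would likely use several decomposition levels or a shift-invariant transform.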

SOA architecture design

Data analysis systems used to be deployed on stand-alone systems. Specifically, data processing modules, computing modules, and visualization modules are designed on computers or compute nodes with the same configuration. This architecture is suitable for small data sets and single source data sets without the need for data integration for comprehensive analysis. However, with the explosive growth of data sets including distributed multi-source data and the needs of network users, it is now

Data prediction process

GeoTrafficPredict supports the accurate spatio-temporal prediction of web traffic flow by establishing a series of models to explore the relationship between the population distribution and the stimulation of external events. After the network traffic information from the front-end data acquisition part of the system has been collected, pre-processed, formatted and stored, the events extraction model obtains processable data on which to conduct statistical analysis and correlation analysis. Considering the

Cloud based model integration

Spatial analysis algorithms are diverse and have different angles of interest. Using different analysis algorithms on the same data set is more conducive to system design. However, even if the source code is freely available, it is difficult to implement all the relevant algorithms on a stand-alone machine. The specifications, configurations, and environment requirements of different algorithms or models bring endless obstacles to integration. On a cluster with multiple computers, making all

Network traffic data organization and storage

In this paper, we analyze and predict network traffic trends using network data request packets coming from CSTNET. CSTNET is an academic, non-profit scientific research computer network under the leadership of the Chinese Academy of Sciences. The collection is based on sFlow technology with a sampling rate of 1/550. The time scope is from January 2018 to December 2018. For the purpose of testing, we divided the data into two parts, one for training and the other for testing. In this
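The following Python sketch shows what the 1/550 sampling rate implies for estimating total traffic, together with one possible way of dividing records into training and testing parts. The chronological split, the 75% fraction, and the record layout are assumptions for illustration only; the snippet does not say how the authors divided the data.

```python
SAMPLING_DENOMINATOR = 550  # from the text: 1 in 550 packets is sampled

def estimate_total(sampled_count, n=SAMPLING_DENOMINATOR):
    """Scale a sampled packet count up to an estimate of the true count."""
    return sampled_count * n

def chronological_split(records, train_fraction):
    """Sort (period, count) records by period and cut them into train/test."""
    ordered = sorted(records, key=lambda r: r[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Hypothetical monthly sampled packet counts (period, sampled_count).
records = [("2018-03", 40), ("2018-01", 20), ("2018-02", 30), ("2018-04", 10)]
train, test = chronological_split(records, train_fraction=0.75)

# Estimated true packet counts for the training months.
train_totals = [(period, estimate_total(count)) for period, count in train]
```

Splitting by time rather than at random keeps later observations out of the training set, which matters when the goal is to forecast future traffic.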

Experiments analysis

We conduct extensive experiments to evaluate the efficacy of the proposed GeoTrafficPredict system. We select one site deployed in CSTNET. The site is a leading provider of services including science news, project applications and project reviews. We select the 2018 flow data to the site and perform spatio-temporal correlation analysis on it. As most of the visitors are teachers and researchers from across the nation, our proposed GeoTrafficPredict system shows the variation of traffic flow of

Conclusions

In this paper, we proposed a comprehensive framework for spatial analysis on web flow prediction. The framework uses open source data and official data released by governments and related institutions to discover new network events. The framework automatically crawls network data and efficiently organizes data using a spatio-temporal cube data warehouse. During the organization of the data, the data retains time series information and geographic information. In the process of processing traffic

CRediT authorship contribution statement

Jingjing Li: Conceptualization, Methodology, Software. Jun Li: Supervision, Writing - review & editing. Nan Jia: Writing - original draft. Xunchun Li: Visualization. Wenzhen Ma: Writing - review & editing. Shanshan Shi: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This study is partly supported by the National Natural Science Foundation of China under Grant No. 41971366, 71673158, the National Key Research Development (R&D) Plan under Grant No. 2018YFC0809700, and the Beijing National Science Foundation of China under Grant No. 9194027.

Jingjing Li is a senior engineer at the Computer Network Information Center (CNIC), Chinese Academy of Sciences (CAS). He earned his M.S. degree from the CNIC, CAS in 2007. He has been working on engineering and research in the fields of network management, network user data analysis and optimization for over 10 years. His research interests include future Internet architecture, SDN, OpenFlow, network management and forwarding technology.

References (26)

  • J. Liu et al., Nonlinear network traffic prediction based on BP neural network, Jisuanji Yingyong/J. Comput. Appl. (2007)
  • A. Erramilli, R. Singh, P. Pruthi, Chaotic maps as models of packet traffic, in: Proc. 14th Int. Teletraffic Cong, vol....
  • A. Erramilli, R. Singh, P. Pruthi, Modeling packet traffic with chaotic maps, Citeseer,...
    Jun Li received the B.S. degree from Hunan University in 1989 and the M.S. and Ph.D. degrees from the Institute of Computing Technology, Chinese Academy of Sciences, in 1992 and 2006, respectively. He is currently a Professor and the Chief Engineer with the Computer Network Information Center, Chinese Academy of Sciences. He has been involved in research and engineering in the field of computer networks for over 20 years. He has authored or co-authored over 100 peer-reviewed papers and one book. His research interests include computer network architecture and protocols, involving network security, artificial intelligence and big data applications. He was a recipient of the National Technological Progress Awards.

    Nan Jia is a lecturer at School for Police Information Engineering and Cyber Security, People’s Public Security University of China. Her research focuses on Big Data, public safety & security, and intelligent risk management.

    Xunchun Li is a deputy director of the Radio Technology Research Institute of the Academy of Broadcasting Science. His research interests include radio and television coverage network planning and optimization, 5G transmission and GIS.

    Wenzhen Ma is an associate professor at National Space Science Center, Chinese Academy of Sciences. Her main research interest is big data processing and information system architecture.

    Shanshan Shi received the B.S. degree from Shandong Agricultural University in 2011 and the M.S. degree from the Computer Network Information Center, Chinese Academy of Sciences in 2016. She is currently pursuing the Ph.D. degree with the Computer Network Information Center, Chinese Academy of Sciences. Her research interests include future Internet architecture, information-centric networking, congestion control and forwarding technology.
