CO-STAR: A collaborative prediction service for short-term trends on continuous spatio-temporal data

https://doi.org/10.1016/j.future.2019.08.026

Highlights

  • Online stream and offline batch processing work collaboratively in the service.

  • The business-interpretable prediction model holds minute-level execution latency while improving predictive precision by nearly 10%.

  • The service has been used in practice and evaluated extensively.

Abstract

With the various sensory data of the Internet of Things, many fields need to know not only the current situation but also future trends instantly to support their business. Short-term prediction on spatio-temporal data streams is a typical such requirement, but challenges remain because of long computation time and insufficient predictive precision. In this paper, a novel prediction service, CO-STAR, is proposed in the highway domain. On the continuous toll data of the whole highway network, the service employs a non-parametric regression model to periodically predict the traffic volume of all stations. Considering both spatial and temporal business characteristics, it adopts a collaborative paradigm of online stream computing and offline batch processing to balance efficiency and precision. In extensive experiments on real data from one Chinese provincial highway and on simulated data, the service holds minute-level execution latency and improves predictive precision by nearly 10 percent.

Introduction

Modern cities deploy abundant sensors of different types, such as surveillance cameras on trunk roads, smart-card readers on buses, loop detectors at toll stations, and transducers in power plants. These sensors generate spatio-temporal data, which carries attributes of both space and time [1], in a timely manner. As a typical stream in the IoT (Internet of Things) environment, such data has unique characteristics. Firstly, spatio-temporal data is generated fast, continuously and concurrently. For example, in a metropolis like Beijing, more than 10 thousand cameras have been deployed at road intersections, each producing one record per second [2], which yields a high-throughput data stream. Secondly, spatio-temporal data has to be processed with low latency. For example, to instantly locate illegal vehicles in the road network, any analytical application that traffic police run on surveillance camera data must respond within three seconds [3]. Thirdly, the correlation among spatial and temporal attributes often implies complexity and dynamics behind the superficial facts, so uncertainty is inherent in short-term trend prediction. For example, on GPS data from OBU (on-board unit) devices, the travel time of a road segment follows different distributions over a day and varies considerably among road segments [4].

Over such spatio-temporal data streams, business management needs not only the current situation but also future trends instantly [5]. Therefore, short-term prediction services have become essential in many domains. For bus schedule planning, the predicted travel time between two adjacent stops in the coming hours is crucial; for traffic guidance, the predicted traffic volume of road segments in the next 5–10 min is imperative; to avoid energy blackouts, the predicted peaks of power load in urban functional areas on summer days are vital. Such prediction of short-term trends, within 30 minutes, is sensitive to factors of both time and space.

However, challenges remain for short-term trend prediction on continuous spatio-temporal data due to inherent limitations. (1) It is difficult to achieve low computation latency and high predictive precision simultaneously on a typical data stream. Efficiency and accuracy conflict: more accurate results require more computational resources and time, and faster results sacrifice accuracy. Balancing precision against run-time performance under data stream conditions is therefore not trivial. (2) Traditional modeling methods cannot comprehensively cover the distinctive characteristics of spatio-temporal data, which restricts their feasibility and interpretability in practice. For example, the time-restricted models applied in many scenarios [6], [7], [8] emphasize only the temporal feature, which limits their predictive effect. Moreover, classic statistical models work well only on limited samples at a given location and do not suit current Big Data conditions. Some modern models, such as neural networks, fit the trends better but still lack business interpretability. Accordingly, it is not easy to find reasonable feature models for trend prediction that account for both functional capacity and business specialty. (3) It is not convenient to predict on spatio-temporal data in a hybrid computing paradigm because a general methodology is absent. For short-term trends, real urban sensory data is essentially a continuous stream, but most classic solutions follow the batch computing paradigm, as in [9], [10], [11], [12]. The gap between online data and offline processing makes it hard to guarantee performance. A collaborative service that integrates offline and online paradigms for short-term trend prediction has seldom been considered for spatio-temporal data.

To address the problems above, a novel prediction service is proposed in this paper and illustrated in the highway domain. The contributions of our work can be summarized as follows. (1) To balance computing performance on continuous spatio-temporal data, the service architecture combines online stream and offline batch computing paradigms, which holds the processing latency at minute level. The service can organize the continuous input data within milliseconds, and then predict the traffic volume trends of all stations for the next time slot in less than 80 s. (2) To improve predictive precision on spatio-temporal data streams, a non-parametric regression model is designed for short-term trend prediction. By considering the interpretable business characteristics of time and space, it reduces the predictive error by nearly 10 percent compared with traditional methods. (3) The service has been evaluated in a practical project, and extensive experiments on real and simulated data show convincing benefits for business technicians.
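As a rough illustration of how a non-parametric regression can forecast the next time slot, the sketch below uses k-nearest-neighbor averaging, one common non-parametric technique. The text above does not state which specific model CO-STAR uses, so the function name `knn_predict`, the window length, the Euclidean distance, and the sample volumes are all assumptions made for illustration only.

```python
from math import sqrt


def knn_predict(history, current, k=3):
    """Predict the next-slot volume from the k most similar past windows.

    history: list of (window, next_value) pairs, where window is a list of
             past slot volumes and next_value is the volume that followed it
             (hypothetical layout, e.g. prepared by an offline batch job).
    current: the most recent window of slot volumes (e.g. from the stream).
    """
    if not history:
        raise ValueError("history must not be empty")

    def distance(a, b):
        # Euclidean distance between two windows of equal length.
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Keep the k historical windows closest to the current one.
    nearest = sorted(history, key=lambda pair: distance(pair[0], current))[:k]
    # The forecast is the plain average of what followed those windows.
    return sum(next_value for _, next_value in nearest) / len(nearest)


# Made-up sample data: three-slot windows and the volume that came next.
history = [
    ([100, 120, 140], 160),
    ([100, 118, 138], 158),
    ([300, 280, 260], 240),
    ([90, 95, 100], 105),
]
prediction = knn_predict(history, [101, 119, 139], k=2)
print(prediction)
```

No parameters are fitted in advance here; the forecast is read directly from similar historical situations, which is also what makes this family of models comparatively easy to explain to business users.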

The rest of this paper is organized as follows: Section 2 presents the motivation and related work; Section 3 presents the preliminaries, including methodology and data pre-processing; Section 4 elaborates the short-term prediction service with its model and algorithm; Section 5 quantitatively demonstrates performance and effects in the experiments; Section 6 concludes the paper.

Section snippets

Motivation

For transportation within and between cities, congestion has become one of the most serious problems worldwide. One of the primary reasons is that the traffic capacity of the current road network has not been fully exploited [13]. Accordingly, traffic flow guidance is imperative for governments [14] to optimize urban utilization. In recent decades, electronic toll collection (ETC) has been widely adopted on highways to improve passing efficiency, and much real-time data

Methodology

We propose CO-STAR (collaborative prediction service for short-term trends on continuous spatio-temporal data) as a dedicated PaaS (platform as a service) in a private cloud, as shown in Fig. 2. On top of the infrastructural resources of computation, storage, and network, the service gathers continuous data from sensors and outputs results either as a stream for further use or as persisted data in storage. Accordingly, online continuous data is channeled through a message broker, and offline historical data is
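Before any prediction, the online side of such a service has to turn raw records into per-station volumes per time slot. The following is a minimal sketch of that step under stated assumptions: the record layout (station id plus epoch-second timestamp), the 300-second slot length, and the class name `SlotCounter` are hypothetical and are not taken from the paper, and a plain loop stands in for the message broker.

```python
from collections import defaultdict

SLOT_SECONDS = 300  # assumed slot length of 5 minutes


class SlotCounter:
    """Counts records per (station, time slot) as they arrive one by one."""

    def __init__(self, slot_seconds=SLOT_SECONDS):
        self.slot_seconds = slot_seconds
        self.counts = defaultdict(int)

    def add(self, station_id, timestamp):
        # Align the timestamp to the start of its slot.
        slot_start = timestamp - timestamp % self.slot_seconds
        self.counts[(station_id, slot_start)] += 1
        return slot_start

    def volume(self, station_id, slot_start):
        # Unknown station/slot pairs simply have a volume of zero.
        return self.counts.get((station_id, slot_start), 0)


counter = SlotCounter()
# Made-up records: (station id, seconds since some epoch).
for station, ts in [("S1", 0), ("S1", 120), ("S2", 200), ("S1", 310)]:
    counter.add(station, ts)

print(counter.volume("S1", 0), counter.volume("S1", 300))
```

In a collaborative setup, counts like these would feed the current window of the predictor, while the same aggregation run over stored records would supply the historical windows.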

Problem analysis and definition

Besides the temporal dimension considered in traditional methods, the spatial dimension should also be emphasized for trend prediction on spatio-temporal data streams. Although traffic volume seems to fluctuate randomly, it exhibits spatio-temporal correlation [44], because a moving vehicle on a specific line is subject to a speed limit and must keep a safe distance from other vehicles. Moreover, downstream traffic volume changes drastically when upstream volumes vary suddenly [11] due to the spatial
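One simple way to make the upstream and downstream relationship visible is to correlate the two volume series at different time lags. The sketch below uses a lagged Pearson correlation for this purpose; it is an illustrative measure chosen here, not necessarily the one used in the paper, and the function names and the sample series are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def lagged_correlation(upstream, downstream, lag):
    """Correlate upstream volume with downstream volume `lag` slots later."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(upstream, downstream)
    return pearson(upstream[:-lag], downstream[lag:])


# Made-up volumes: the downstream station sees the upstream pattern
# one slot later, plus two extra vehicles per slot.
upstream = [10, 50, 20, 80, 30, 60, 15]
downstream = [5, 12, 52, 22, 82, 32, 62]

best_lag = max(range(3), key=lambda lag: lagged_correlation(upstream, downstream, lag))
print(best_lag)
```

The lag with the strongest correlation gives a rough estimate of how many slots it takes for an upstream change to reach the downstream station.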

Experiment setting

Our service is evaluated by extensive experiments in the project mentioned in Section 2.1. For the infrastructure, five Acer AR580 F2 rack servers running Citrix XenServer 6.2 are used to build a private cloud; each server has 8 processors (Intel Xeon E5-4607 2.20 GHz), 64 GB RAM and 80 TB storage. To deploy the PaaS CO-STAR, three virtual machines are used, each of which has a 4-core CPU, 22 GB RAM and 2 TB storage and runs the CentOS 6.6 x86_64 operating system. The cluster of Hadoop 2.6.0

Conclusion

In this paper, a novel PaaS, CO-STAR, is proposed to predict short-term trends on spatio-temporal data streams. Considering the spatio-temporal correlation together with data pre-processing, the service employs an efficient non-parametric regression model in a collaborative paradigm to balance computing complexity and predictive precision. Taking the trend prediction of traffic volume in the highway domain as an example, CO-STAR can hold the execution latency at minute level and improve predictive error

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We would like to thank Yanqing Xia, Jie Zhou, Qianhui Ma and Shan Lu, graduate students in our laboratory, for helping to optimize our model and improve the quality of this paper. This work was supported by the Youth Program of National Natural Science Foundation of China under Grant 61702014, Beijing Municipal Natural Science Foundation under Grant 4192020 and Top Young Innovative Talents of North China University of Technology under Grant XN018022.

Weilong Ding received his PhD degree from the Institute of Computing Technology, Chinese Academy of Sciences in 2013, and is now an associate professor at North China University of Technology, Beijing, China. His major research interests include real-time data processing, distributed systems and service computing. He has published more than 50 academic papers and owns 5 invention patents.

References (45)

  • W. Ding, Z. Zhao, Y. Han, A Framework to Improve the Availability of Stream Computing, pp....
  • D. Wang, J. Zhang, W. Cao, J. Li, Y. Zheng, When Will You Arrive? Estimating Travel Time Based on Deep Neural...
  • Y. Du et al., Incremental analysis of temporal constraints for concurrent workflow processes with dynamic changes, IEEE Trans. Ind. Inf. (2019)
  • Y. Du et al., Timed compatibility analysis of web service composition: A modular approach based on petri nets, IEEE Trans. Autom. Sci. Eng. (2014)
  • L. Wanqiu et al., An Improved Least Square Support Vector Regression Algorithm for Traffic Flow Forecasting (2014)
  • D.-M. Li et al., Modeling and Prediction of Highway Traffic Flow Based on Wavelet Neural Network (2014)
  • Jungme Park et al., Intelligent trip modeling for the prediction of an origin-destination traveling speed profile, IEEE Trans. Intell. Transp. Syst. (2014)
  • K.Y. Chan et al., Prediction of short-term traffic variables using intelligent swarm-based neural networks, IEEE Trans. Control Syst. Technol. (2013)
  • Z. Li et al., A comparison of detrending models and multi-regime models for traffic flow prediction, IEEE Intell. Transp. Syst. Mag. (2014)
  • M.D. de Assuncao et al., Distributed data stream processing and edge computing: A survey on resource elasticity and future directions, J. Netw. Comput. Appl. (2018)
  • Y. Lv et al., Traffic flow prediction with big data: A deep learning approach, IEEE Trans. Intell. Transp. Syst. (2015)
  • W. Ding et al., A data cleaning method on massive spatio-temporal data

    Xuefei Wang received her master's degree from the School of Information Science and Technology, North China University of Technology in 2019. Her research interests include Big Data, Internet of Things and Intelligent Transportation Systems.

    Zhuofeng Zhao is a professor at North China University of Technology. He received his PhD degree from the Institute of Computing Technology, Chinese Academy of Sciences in 2005. His research interests include Stream Computing over Big Sensor Data, Service-Oriented Computing, Web Information Integration and Business Process Management. He has acted as the principal investigator (PI) of many research projects. He has published over 50 papers and owns nearly 10 invention patents. He has served as co-chair of many distinguished conferences.
