Passenger flow estimation based on convolutional neural network in public transportation system
Introduction
Automatic passenger flow estimation is valuable for traffic management and for detecting overcrowding in public transportation. Accurate passenger flow information can improve the efficiency of public transportation services by optimizing route planning and traffic scheduling, and it can help prevent severe traffic accidents caused by overloading. Traditionally, automatic passenger counting has been done with contact-type counters, optical sensors, and vision-based systems. Contact-type counters can be installed at the entrances of many public places, such as subway and bus stations. However, because they count passengers one at a time, they can cause congestion when passenger flow is high. Optical sensors, such as radiation beam systems, do not block the doorways but suffer from undercounting. In recent years, automatic counting of passing passengers based on digital image processing has attracted growing attention because it reduces cost and requires no user intervention.
Over the past few decades, several people-counting methods have been proposed. Broadly, they fall into three approaches. The first is trajectory-clustering-based counting. These methods rest on the hypothesis that trajectories belonging to the same human body are more similar to each other than trajectories belonging to different individuals. In [1], the trajectories of visual features were clustered, and the number of passengers was estimated from the number of clusters. Based on Dirichlet process mixture models (DPMMs), Topkaya et al. [2] employed a clustering scheme that fused spatial, color and temporal features for each detection to estimate the number of passengers. However, the performance of these methods degrades greatly under illumination variation and in low-resolution transportation scenes, since such conditions reduce the stability of the algorithms. The second is regression-based counting. These methods estimate the number of passengers by learning a regression function between features extracted from the input images and the number of people in the scene. Many regression functions, such as Bayesian regression [3], neural networks [4], [5] and support vector regression (SVR) [6], [7], have been used to estimate crowd density. However, these methods provide only the people count and cannot locate individuals, and location information is sometimes important for video surveillance security systems, especially in public transportation scenarios. The last is detection-based counting, in which a detector is carefully designed to detect people in the input images. Different types of detection methods have been proposed, such as body detection [8], [9], [10], head-shoulder detection [11], [12], [13], skeleton graphs [14], [15] and head template matching [16], [17]. Although these methods offer promising results, their carefully hand-engineered detectors are not robust in low-resolution scenes and fail in some real surveillance applications. Besides, the stereo-camera-based method in [16] needs multiple cameras and an extra calibration procedure, which makes deployment inconvenient.
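To make the regression-based approach concrete, the following minimal sketch fits a one-dimensional least-squares line from a single image feature to a people count. It is an illustration only and not any cited method: the feature (foreground pixel area), the training pairs, and the function names `fit_line` and `predict_count` are all assumptions made for this example.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_count(feature: float, slope: float, intercept: float) -> int:
    """Map a feature value to a non-negative integer people count."""
    return max(0, round(slope * feature + intercept))


# Synthetic (made-up) training pairs: foreground pixel area vs. labelled count.
areas = [0.0, 400.0, 800.0, 1200.0, 1600.0]
counts = [0.0, 1.0, 2.0, 3.0, 4.0]

slope, intercept = fit_line(areas, counts)
estimate = predict_count(1100.0, slope, intercept)
```

The output is a single number per image, which reflects the limitation noted above: the regressor says how many people are present but not where they are.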
Although a few advances have been proposed in this field in recent years [18], [19], challenges remain, especially in complicated low-resolution scenes. The resolution of most videos in public transportation surveillance systems is relatively low. Moreover, the task is nontrivial because of the illumination, occlusion, scale and pose variation of passengers against a cluttered background. Some existing methods, such as the regression-based method [19], are not suitable for our application, in which the number of passengers varies over time. Typical examples of our application are shown in Fig. 1.
From Fig. 1, we can observe that, in addition to the occlusion caused by the movement of passengers in Fig. 1(a) and (b), the images in Fig. 1(a)–(h) have a low resolution of 320 × 320, and the scales of the heads in Fig. 1(c) and (d) change gradually as passengers get on and off. Fig. 1(e) and (f) show the background clutter caused by the vibration of buses, which complicates the subsequent counting. Illumination variation raises further issues, as shown in Fig. 1(g) and (h). To the best of our knowledge, few papers discuss passenger counting in complicated low-resolution public transportation scenarios.
Recently, convolutional neural networks (CNNs) have shown outstanding performance on image classification tasks [20], [21], [22] and object detection tasks [23], [24], [25], [26]. Unlike traditional hand-engineered representations, they have deep architectures [27] and can learn powerful object representations. Motivated by this, we adopt a CNN to handle the above-mentioned challenges, since it extracts target features automatically and is robust to variations of illumination, pose and scale.
In this paper, we propose a passenger counting system that addresses the counting problem by combining a CNN detection model with the spatio-temporal context (STC) model [28], which offers the following advantages. Firstly, by combining the mixture of Gaussians (MoG) model [29] with background subtraction, target pre-location greatly reduces the detection time of the CNN. The CNN model automatically learns the relevant features of passengers and then detects the moving passengers. Secondly, inspired by the movement of ants in nature, a pheromone map and a 3D peak confidence map are proposed to address tracking drift.
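The pre-location idea can be illustrated with a minimal sketch. It is not the implementation used in this paper: it replaces the MoG model with a single running Gaussian per pixel (a deliberately simplified stand-in), operates on tiny grayscale frames stored as nested lists, and every name and parameter value (`RunningGaussianBackground`, `foreground_bbox`, `alpha`, `k`, `init_var`) is an assumption made for illustration. Only the bounding box of the foreground pixels would be handed to a detector, instead of scanning the whole frame.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]  # grayscale image as rows of pixel intensities


class RunningGaussianBackground:
    """Per-pixel single-Gaussian background model (simplified stand-in for MoG)."""

    def __init__(self, first_frame: Frame, alpha: float = 0.05,
                 k: float = 2.5, init_var: float = 25.0) -> None:
        self.alpha = alpha  # learning rate (assumed value)
        self.k = k          # threshold in standard deviations (assumed value)
        self.mean = [row[:] for row in first_frame]
        self.var = [[init_var] * len(row) for row in first_frame]

    def apply(self, frame: Frame) -> List[List[bool]]:
        """Return a foreground mask; update the model on background pixels only."""
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, value in enumerate(row):
                diff = value - self.mean[y][x]
                is_fg = diff * diff > (self.k ** 2) * self.var[y][x]
                if not is_fg:
                    self.mean[y][x] += self.alpha * diff
                    new_var = self.var[y][x] + self.alpha * (diff * diff - self.var[y][x])
                    self.var[y][x] = max(new_var, 1.0)  # keep the variance from collapsing
                mask_row.append(is_fg)
            mask.append(mask_row)
        return mask


def foreground_bbox(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (x_min, y_min, x_max, y_max) of foreground pixels, or None."""
    coords = [(x, y) for y, row in enumerate(mask) for x, fg in enumerate(row) if fg]
    if not coords:
        return None
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    return min(xs), min(ys), max(xs), max(ys)


# Synthetic 6x6 example: a flat background, then a bright 2x2 blob appears.
background = [[10.0] * 6 for _ in range(6)]
model = RunningGaussianBackground(background)
frame = [row[:] for row in background]
for y in (2, 3):
    for x in (3, 4):
        frame[y][x] = 200.0

bbox = foreground_bbox(model.apply(frame))
empty = foreground_bbox(model.apply(background))
```

In this toy case the candidate region covers 4 of 36 pixels, which shows why restricting a detector to foreground regions can reduce its workload compared with scanning every window.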
The rest of the paper is organized as follows. Section 2 briefly describes the overview of our framework and introduces each module of the proposed method in detail. In Section 3, we evaluate our method with extensive experiments on an actual public transportation dataset. Finally, conclusions and future work are given in Section 4.
Section snippets
Overview of our framework
As shown in Fig. 2, the proposed passenger counting framework includes two stages: the off-line training stage and the online counting stage. In the off-line training stage, the CNN model is trained on positive and negative samples from a new dataset constructed from real, complicated public transportation scenarios. The layers, weights and neuron numbers of the CNN model are obtained by the off-line training. After that, the online counting stage is used to count the passengers
Experimental results
Our dataset is collected from surveillance video of public bus transportation in China. Typical scenes of our application are shown in Fig. 1. We manually extract 34,322 positive head samples and collect 179,398 negative samples from real surveillance video in bus transportation. A demo video of our counting system, covering front-door, rear-door and bi-directional counting scenes, is provided in the supplemental materials.
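The sample counts above imply a noticeable class imbalance, which the short calculation below makes explicit. The inverse-frequency class weights at the end are a common heuristic shown purely as an assumption; the text does not state whether any re-weighting or re-sampling was used, and the class names are hypothetical.

```python
positives = 34_322   # positive head samples reported in the text
negatives = 179_398  # negative samples reported in the text

total = positives + negatives
positive_fraction = positives / total
negatives_per_positive = negatives / positives

# Assumed heuristic, not taken from the paper: inverse-frequency class weights.
class_weights = {
    "head": total / (2 * positives),
    "background": total / (2 * negatives),
}
```

Roughly one sample in six is a head, that is, about five negatives for every positive.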
Conclusion
The proposed system uses a single inexpensive camera mounted overhead, which eliminates the need for calibration and keeps the system low-cost. The presented passenger counting system combines the CNN detection model and the spatio-temporal context model to address the counting problem in low-resolution scenes with variations of illumination, pose and scale. Unlike many sliding-window-based detection methods, our method adopts the MoG model to obtain the foreground, which can
Acknowledgement
We thank Hitachi (China) Research & Development Corporation for providing the foundation for this research. This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61671452, 61675036 and 61302054.
References (45)
- et al. Trends in extreme learning machines: a review. Neural Netw. (2015)
- et al. Counting pedestrians in video sequences using trajectory clustering. IEEE Trans. Circuits Syst. Video Technol. (2006)
- et al. Counting people by clustering person detector outputs.
- et al. Counting people with low-level features and Bayesian regression. IEEE Trans. Image Process. (2012)
- et al. Cross-scene crowd counting via deep convolutional neural networks.
- et al. Counting pedestrians in crowds using viewpoint invariant training.
- et al. Multi-source multi-scale counting in extremely dense crowd images.
- et al. A method based on the indirect approach for counting people in crowded scenes.
- et al. Joint deep learning for pedestrian detection.
- et al. Upper body detection and tracking in extended signing sequences. Int. J. Comput. Vision (2011)
- Histograms of oriented gradients for human detection.
- Robust head-shoulder detection by PCA-based multilevel HOG-LBP detector for people counting.
- A new edge feature for head-shoulder detection.
- Head-shoulder detection using joint HOG features for people counting and video surveillance in library.
- Fast people counting using head detection from skeleton graph.
- Head detection based on skeleton graph method for counting people in crowded environments. J. Electron. Imaging
- Head detection in stereo data for people counting and segmentation. VISAPP
- Pedestrian detection: an evaluation of the state of the art. IEEE Trans. Pattern Anal. Mach. Intell.
- People counting based on head detection combining Adaboost and CNN in crowded surveillance environment. Neurocomputing
- Single-image crowd counting via multi-column convolutional neural network.
- Caffe: convolutional architecture for fast feature embedding.