Single image vehicle classification using pseudo long short-term memory classifier☆
Introduction
Vehicles play a significant role in modern life, and many people use them daily. The growing number of vehicles on the road has increased the risk of traffic incidents. The Canadian Council of Motor Transport Administrators, in its 2014 statistical report [1], noted that traffic accidents in Canada cause more than 1800 fatalities and more than 150,000 injuries. This situation forced the traffic safety division to implement a real-time surveillance system for road scene analysis. The aim of such a system is to understand the behavior of road users, including non-vehicle users such as cyclists and pedestrians. The combination of CCTV cameras with a video management system is the main configuration used to implement real-time traffic surveillance systems today because it is cheaper and easier to deploy than other sensors. Usually, traffic is analyzed manually by a control-room operator using real-time video data from CCTV cameras. To increase effectiveness, a system that can analyze road traffic automatically is required. One component of such a system is a vehicle recognition module, which is essential for analyzing the behavior of road users. Vehicle recognition is necessary to narrow down a target efficiently and effectively from a large number of vehicle categories. A vehicle recognition module usually consists of two systems: a vehicle detection system and a vehicle classification system. The vehicle detection system detects vehicles appearing in the CCTV video data, while the vehicle classification system categorizes them. This paper discusses the vehicle classification system, assuming that detection has already been performed in a previous step and that the input to the classification system is the cropped region of a vehicle candidate.
The development of vehicle classification systems began in the early 1980s with the use of induction loop sensors. Such sensors were used until approximately 2000, as described in [2], [3], [4], [5], [6]. One disadvantage of induction loop sensors is that their implementation and maintenance costs are very high compared with other approaches. Researchers have proposed other approaches for vehicle classification systems, including acoustic sensors [7], [8], blade sensors [9], wireless sensor networks [10], and vision sensors [11], [12], [13]. In recent years, convolutional neural network (CNN) classifiers have proven to be very effective for image classification tasks [14], [15], [16]. CNN classifiers have also been applied to vehicle classification problems, as described in [17], [18], [19], [20]. We briefly describe several vehicle classification approaches using CNN classifiers in Section 2. One advantage of a vehicle image is its geometric consistency, because a vehicle is not a deformable object. This characteristic can be used as an additional clue for classifying the vehicle category. Some approaches have explored geometric features, including spatial pyramid pooling [21], [22] and deformable parts models [23].
In this paper, we propose a new pseudo long short-term memory (P-LSTM) classifier for single image vehicle classification problems. The P-LSTM classifier is designed based on spatial pyramid features and is composed of parallel convolutional networks and an LSTM classifier. The parallel convolutional networks extract the spatial pyramid features, while the LSTM classifier attempts to identify the correlations between the features extracted by the parallel convolutional networks by treating the output of each convolutional network as an independent time step. Our contributions are as follows.
- We propose an end-to-end classifier (called P-LSTM) composed of parallel convolutional networks and an LSTM classifier. The P-LSTM works by exploring the interdependencies of the spatial pyramid features extracted from the input image.
- We investigate several CNN architectures for the parallel convolutional networks, including AlexNet, SqueezeNet, and ResNet18. Experiments show that ResNet18 is the most efficient architecture for our parallel convolutional networks.
- We investigate single-layer and multilayer LSTMs for finding the correlations between the features extracted by the parallel convolutional networks. Experiments show that adding more layers to the LSTM network increases the evaluation score.
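To make the notion of spatial pyramid regions concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-level pyramid in which level 0 is the whole image and level 1 is a 2×2 grid, giving five regions; the function name `pyramid_regions` and this grid layout are illustrative assumptions. Under that assumption, each region would be the input to one of the parallel convolutional networks.

```python
from typing import List, Tuple

# (left, top, right, bottom), with right and bottom exclusive.
Box = Tuple[int, int, int, int]


def pyramid_regions(width: int, height: int, levels: int = 2) -> List[Box]:
    """Return crop boxes for a spatial pyramid over a width x height image.

    Assumed layout: level l splits the image into a (2**l x 2**l) grid,
    so two levels give 1 + 4 = 5 regions.
    """
    boxes: List[Box] = []
    for level in range(levels):
        cells = 2 ** level
        for row in range(cells):
            for col in range(cells):
                left = col * width // cells
                right = (col + 1) * width // cells
                top = row * height // cells
                bottom = (row + 1) * height // cells
                boxes.append((left, top, right, bottom))
    return boxes


# Hypothetical 224 x 224 input crop of a vehicle candidate.
regions = pyramid_regions(224, 224)
for index, box in enumerate(regions):
    print(index, box)
```

The ordering of the returned boxes fixes the order in which region features would later be presented as a sequence, which is a design choice of this sketch rather than something stated in the text.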
The rest of the paper is organized as follows. In Section 2, we briefly discuss related work by researchers who have addressed the vehicle classification problem. We describe the concept and explain the technical details of our proposed classifier in Section 3. Section 4 discusses experiments with our proposed classifier on the MIO-TCD vehicle classification dataset. The last section summarizes the experiments and outlines potential future work.
Related work
In this section, we briefly review the work on vehicle classification reported in [17], [18], [19], [20]. All of these methods are based on deep learning and were tested on the MIO-TCD vehicle classification dataset [24]. Table 1 summarizes several related works that use the same dataset.
Approach
In this section, we briefly describe the main concepts of spatial pyramid matching theory and LSTM classifiers, followed by a detailed explanation of our proposed classifier, which builds on both.
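As a rough illustration of how an LSTM can consume per-region features as a sequence, the sketch below implements a standard LSTM cell in plain Python and runs it over five feature vectors, one per time step. It is not the authors' network: the class `LSTMCell`, the helper `run_sequence`, the feature dimension (8), the hidden size (4), and the random weights and features are all hypothetical stand-ins for the outputs of trained convolutional networks.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights: Sequence[Sequence[float]], bias: Sequence[float],
           inputs: Sequence[float]) -> Vector:
    """Compute weights @ inputs + bias for list-based matrices."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


class LSTMCell:
    """Standard LSTM cell (input, forget, output gates and candidate)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        rows = 4 * hidden_size            # four gate blocks stacked
        cols = input_size + hidden_size   # acts on [x_t, h_{t-1}]
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                        for _ in range(rows)]
        self.bias = [0.0] * rows

    def step(self, x: Sequence[float], h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        n = self.hidden_size
        z = affine(self.weights, self.bias, list(x) + list(h))
        i = [sigmoid(v) for v in z[0:n]]              # input gate
        f = [sigmoid(v) for v in z[n:2 * n]]          # forget gate
        o = [sigmoid(v) for v in z[2 * n:3 * n]]      # output gate
        g = [math.tanh(v) for v in z[3 * n:4 * n]]    # candidate state
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def run_sequence(cell: LSTMCell, features: Sequence[Sequence[float]]) -> Vector:
    """Feed one feature vector per time step; return the last hidden state."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in features:
        h, c = cell.step(x, h, c)
    return h


# Hypothetical features: 5 pyramid regions, 8 values each.
rng = random.Random(1)
region_features = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(5)]
cell = LSTMCell(input_size=8, hidden_size=4, seed=0)
final_state = run_sequence(cell, region_features)
print(final_state)
```

In a full classifier, the final hidden state would typically be passed to a fully connected layer with a softmax over the vehicle categories; that step is omitted here.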
Results and discussion
All training and testing processes were performed using the Caffe framework [46], and the results were uploaded to the MIO-TCD vehicle classification submission site [24] for evaluation. The weights of the parallel convolutional networks were initialized using the original CNN architecture weights, trained on the ImageNet dataset [47]. The Cohen Kappa score [48], [49] is used as the main measure for evaluating our proposed classifier, and can be computed using the following equation:
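In its standard form, Cohen's kappa is kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between predicted and true labels and p_e is the agreement expected by chance from the label marginals. This is the textbook definition, and its notation may differ from the paper's. The sketch below computes it from a confusion matrix; the function name `cohen_kappa` and the toy matrix are illustrative only and are not results from the paper.

```python
from typing import Sequence


def cohen_kappa(confusion: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa from a square confusion matrix.

    confusion[r][c] counts samples with true class r predicted as class c.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    n = len(confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[k][k] for k in range(n)) / total
    # Chance agreement from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[r][c] for r in range(n)) for c in range(n)]
    expected = sum(rt * ct for rt, ct in zip(row_totals, col_totals)) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Toy two-class example: p_o = 35/50 = 0.7 and p_e = 0.5.
toy = [[20, 5],
       [10, 15]]
kappa = cohen_kappa(toy)
print(round(kappa, 4))  # 0.4 for this toy matrix
```

Unlike plain accuracy, kappa discounts agreement that would occur by chance, which matters when class frequencies are imbalanced.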
Conclusion
We have presented a P-LSTM classifier combining parallel convolutional networks and an LSTM classifier for a single image vehicle classification problem. The parallel convolutional networks are used to extract the spatial pyramid features, with two levels of spatial pyramid region configurations. The LSTM network is used to search for dependencies between the features extracted in each convolutional network. We investigated several different modern CNN architectures as the basis of the parallel
References (54)
- et al. A field trial of acoustic signature analysis for vehicle classification. Transport. Res. C: Emerg. Technol. (1997)
- et al. Vehicle classification by acoustic signature. Math. Comput. Model. (1998)
- et al. Recognizing vehicle classification information from blade sensor signature. Pattern Recognit. Lett. (2007)
- et al. Vehicle classification in distributed sensor networks. J. Parallel Distribut. Comput. (2004)
- et al. A model for fine-grained vehicle classification based on deep learning. Neurocomputing (2017)
- et al. Action recognition from still images based on deep VLAD spatial pyramids. Signal Process.: Image Commun. (2017)
- T.M. of Transport of Canada, Canadian motor vehicle traffic collision statistics,...
- On-line vehicle classification. IEEE Trans. Veh. Technol. (1980)
- M. Pursula, I. Kosonen, Microprocessor and PC-based vehicle classification equipments using induction loops, in: Road...
- M. Pursula, P. Pikkarainen, A neural network approach to vehicle classification with double induction loops, in: 17TH...
- A vehicle classification based on inductive loop detectors
- Edge-based rich representation for vehicle classification
- Automatic traffic surveillance system for vehicle tracking and classification. IEEE Trans. Intell. Transport. Syst.
- Robust visual tracking and vehicle classification via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell.
- Road sign classification system using cascade convolutional neural network. Int. J. Innovative Comput., Inform. Control
- Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell.
- Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell.
- MIO-TCD: a new benchmark dataset for vehicle classification and localization. IEEE Trans. Image Process.
- Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res.
- Identity mappings in deep residual networks
☆ This paper has been recommended for acceptance by Zicheng Liu.