Single image vehicle classification using pseudo long short-term memory classifier

https://doi.org/10.1016/j.jvcir.2018.09.021

Highlights

  • A pseudo-LSTM classifier that reasons over deep spatial pyramid features.

  • Multi-phase fine-tuning strategy for the training process.

  • A comprehensive performance evaluation on MIO-TCD vehicle classification dataset.

  • The best overall performance is obtained using an ensemble of pseudo-LSTM classifiers.

Abstract

In this paper, we propose a pseudo long short-term memory (LSTM) classifier for single image vehicle classification. The proposed pseudo-LSTM (P-LSTM) uses spatially divided images rather than time-series images; in other words, it treats the divided images as time-series frames. The divided images are formed by cropping the input image using a two-level spatial pyramid region configuration. Parallel convolutional networks are used to extract the spatial pyramid features of the divided images. To explore the correlations between the spatial pyramid features, we attach an LSTM classifier to the end of the parallel convolutional networks and treat each convolutional network as an independent timestamp. Although LSTM classifiers are typically used for time-dependent data, our experiments demonstrate that they can also be applied to non-time-dependent data. One fully connected layer is attached to the end of the network to compute the final classification decision. Experiments on the MIO-TCD vehicle classification dataset show that our proposed classifier produces a high evaluation score and is comparable with several other state-of-the-art methods.
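
As a concrete illustration of the cropping step, the sketch below forms the divided images under the assumption that the two-level spatial pyramid consists of the full image (level 0) plus a 2×2 grid of quadrants (level 1), i.e., five regions in total; the exact region configuration is detailed in Section 3.

```python
# Minimal sketch of the two-level spatial pyramid cropping described in the
# abstract. Assumption (for illustration only): level 0 is the whole image and
# level 1 is a 2x2 grid, giving 5 crops that are later treated as a pseudo
# time series of 5 "frames".
import numpy as np

def spatial_pyramid_crops(image):
    """Return the level-0 and level-1 crops of an HxWxC image."""
    h, w = image.shape[:2]
    crops = [image]                      # level 0: full image
    for i in range(2):                   # level 1: 2x2 quadrants
        for j in range(2):
            crops.append(image[i * h // 2:(i + 1) * h // 2,
                               j * w // 2:(j + 1) * w // 2])
    # Each crop would then be resized to the CNN input size (e.g., 224x224)
    # before being fed to its own convolutional branch.
    return crops

# Example: a dummy 480x640 RGB image yields 5 crops.
dummy = np.zeros((480, 640, 3), dtype=np.uint8)
print([c.shape for c in spatial_pyramid_crops(dummy)])
```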

Introduction

Vehicles play a significant role in modern life, and many people use them on a daily basis. The increasing number of vehicles on roads has increased the risk of traffic incidents. The Canadian Council of Motor Transport Administrators, in its 2014 statistics report [1], noted that traffic accidents in Canada caused more than 1800 fatalities and more than 150,000 injuries. These figures have pushed traffic safety authorities to deploy real-time surveillance systems for road scene analysis. The aim of such a system is to understand the behavior of road users, including non-vehicle users such as cyclists and pedestrians. CCTV cameras combined with a video management system form the main configuration for real-time traffic surveillance systems today because they are cheaper and easier to implement than other sensors. Usually, traffic is analyzed manually by a control-room operator using real-time video from the CCTV cameras. To increase effectiveness, a system that can automatically analyze road traffic is required. One component of such a system is a vehicle recognition module, which is very important for analyzing the behavior of road users. The recognition task is necessary to narrow down a target efficiently and effectively from a large number of vehicle categories. A vehicle recognition module usually consists of two subsystems: a vehicle detection system and a vehicle classification system. The detection system is responsible for detecting the vehicles appearing in the CCTV video data, while the classification system is responsible for categorizing each detected vehicle. This paper focuses on the vehicle classification system, assuming that detection has already been performed in a previous step and that the input to the classifier is the cropped region of a vehicle candidate.

The development of vehicle classification systems began in the early 1980s with the use of induction loop sensors. Such sensors were used until approximately 2000, as described in [2], [3], [4], [5], [6]. One disadvantage of induction loop sensors is that they have very high implementation and maintenance costs compared with other approaches. Researchers have therefore proposed other approaches for vehicle classification, including acoustic sensors [7], [8], blade sensors [9], wireless sensor networks [10], and vision sensors [11], [12], [13]. In recent years, convolutional neural network (CNN) classifiers have proven to be very effective for image classification tasks [14], [15], [16]. CNN classifiers have also been applied to vehicle classification problems, as described in [17], [18], [19], [20]; we briefly review several of these approaches in Section 2. One advantage of vehicle images is their geometric consistency, because a vehicle is not a deformable object. This characteristic can be used as an additional clue for classifying the vehicle category. Some approaches have exploited geometric features, including spatial pyramid pooling [21], [22] and deformable part models [23].

In this paper, we propose a new pseudo long short-term memory (P-LSTM) classifier for the single image vehicle classification problem. The P-LSTM classifier is designed based on spatial pyramid features and is composed of parallel convolutional networks and an LSTM classifier. The parallel convolutional networks extract the spatial pyramid features, while the LSTM classifier identifies the correlations between the features extracted by the parallel convolutional networks, treating each convolutional network as an independent timestamp. Our contributions can be described as follows.

  • We propose an end-to-end classifier (called P-LSTM) composed of parallel convolutional networks and an LSTM classifier. The P-LSTM works by exploring the interdependencies of the spatial pyramid features extracted from the input image (an illustrative sketch is given after this list).

  • We investigated several different CNN architectures for the parallel convolutional networks, including AlexNet, SqueezeNet, and Resnet18. Experiments show that Resnet18 is the most efficient architecture for our parallel convolutional networks.

  • We investigated single-layer and multilayer LSTM networks for capturing the correlations between the features extracted by the parallel convolutional networks. Experiments show that adding more layers to the LSTM network increases the evaluation score.
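
For illustration only, the following PyTorch sketch shows one way such a P-LSTM could be assembled: independent ResNet-18 branches (one per spatial pyramid region) feed a multilayer LSTM that treats each branch output as a pseudo time step, followed by a single fully connected layer. The hidden size, the number of LSTM layers, and the use of non-shared branch weights are assumptions of this sketch; the paper's own implementation uses Caffe.

```python
# Hedged PyTorch sketch of the P-LSTM idea; hyperparameters are illustrative,
# not the paper's values.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class PseudoLSTM(nn.Module):
    def __init__(self, num_regions=5, num_classes=11, hidden_size=256, lstm_layers=2):
        super().__init__()
        # One convolutional branch per spatial pyramid region
        # (non-shared weights are an assumption of this sketch).
        self.branches = nn.ModuleList()
        for _ in range(num_regions):
            backbone = resnet18(weights=None)   # the paper initializes from ImageNet weights
            backbone.fc = nn.Identity()         # keep the 512-d pooled feature
            self.branches.append(backbone)
        # The LSTM treats each branch output as one pseudo time step.
        self.lstm = nn.LSTM(512, hidden_size, num_layers=lstm_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)  # final decision layer (11 MIO-TCD classes)

    def forward(self, crops):
        # crops: list of num_regions tensors, each of shape (B, 3, 224, 224)
        feats = [branch(crop) for branch, crop in zip(self.branches, crops)]
        seq = torch.stack(feats, dim=1)          # (B, num_regions, 512)
        out, _ = self.lstm(seq)
        return self.fc(out[:, -1])               # classify from the last pseudo time step

# Example: 5 region crops (full image + 2x2 quadrants) for a batch of 2 images.
model = PseudoLSTM()
crops = [torch.randn(2, 3, 224, 224) for _ in range(5)]
logits = model(crops)                            # shape (2, 11)
```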

The rest of the paper is organized as follows. In Section 2, we briefly discuss related work by various researchers who have addressed the vehicle classification problem. We describe the concept and explain the technical details of our proposed classifier in Section 3. Section 4 discusses experiments with our proposed classifier on the MIO-TCD vehicle classification dataset. In the last section, we summarize the experiments and outline potential future work for this research.

Section snippets

Related work

In this section, we briefly describe the relevant work on vehicle classification reported in [17], [18], [19], [20]. All of these methods are based on deep learning and were tested on the MIO-TCD vehicle classification dataset [24]. Table 1 summarizes several related works that use the same dataset.

Approach

In this section, we briefly describe the main concepts of spatial pyramid matching theory and LSTM classifiers, followed by a detailed explanation of our proposed classifier, which is designed based on spatial pyramid matching theory and an LSTM classifier.

Results and discussion

All training and testing processes were performed using the Caffe framework [46], and the results were uploaded to the MIO-TCD vehicle classification submission site [24] for evaluation. The weights of the parallel convolutional networks were initialized with the original CNN architecture weights trained on the ImageNet dataset [47]. The Cohen kappa score [48], [49] is used as the main method for evaluating our proposed classifier and can be computed using the following equation: κ = (p_o − p_e) / (1 − p_e) = 1 − (1 − p_o) / (1 − p_e), where p_o is the observed agreement between predictions and ground truth and p_e is the agreement expected by chance.
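
For local sanity checks, the kappa score can be reproduced from the predicted and true labels as in the sketch below, using scikit-learn. This is not the MIO-TCD submission server's evaluation code, and the labels shown are hypothetical.

```python
# Sketch of the Cohen kappa metric used for evaluation; the official scoring is
# done by the MIO-TCD submission server, so this is only for local checking.
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

y_true = np.array([0, 2, 1, 1, 0, 2])   # hypothetical ground-truth labels
y_pred = np.array([0, 2, 1, 0, 0, 2])   # hypothetical predictions

# kappa = (p_o - p_e) / (1 - p_e)
kappa = cohen_kappa_score(y_true, y_pred)
print(f"Cohen kappa: {kappa:.3f}")

# Equivalent manual computation from the confusion matrix.
cm = confusion_matrix(y_true, y_pred)
n = cm.sum()
p_o = np.trace(cm) / n                          # observed agreement
p_e = (cm.sum(0) * cm.sum(1)).sum() / n ** 2    # agreement expected by chance
assert abs(kappa - (p_o - p_e) / (1 - p_e)) < 1e-9
```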

Conclusion

We have presented a P-LSTM classifier combining parallel convolutional networks and an LSTM classifier for the single image vehicle classification problem. The parallel convolutional networks are used to extract the spatial pyramid features, with two levels of spatial pyramid region configurations. The LSTM network is used to search for dependencies between the features extracted by each convolutional network. We investigated several different modern CNN architectures as the basis of the parallel convolutional networks.

References (54)

  • C. Sun, An investigation in the use of inductive loop signatures for vehicle classification, California Partners for...
  • J. Gajda et al., A vehicle classification based on inductive loop detectors
  • X. Ma et al., Edge-based rich representation for vehicle classification
  • J.-W. Hsieh et al., Automatic traffic surveillance system for vehicle tracking and classification, IEEE Trans. Intell. Transport. Syst. (2006)
  • X. Mei et al., Robust visual tracking and vehicle classification via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. (2011)
  • R.F. Rachmadi et al., Road sign classification system using cascade convolutional neural network, Int. J. Innovative Comput. Inform. Control (2017)
  • K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, arXiv preprint...
  • H. Jung, M.-K. Choi, J. Jung, J.-H. Lee, S. Kwon, W. Young Jung, Resnet-based vehicle classification and localization...
  • J. Taek Lee, Y. Chung, Deep learning-based vehicle classification using an ensemble of local expert and global...
  • P.-K. Kim, K.-T. Lim, Vehicle type classification using bagging and convolutional neural network on multi view...
  • R. Theagarajan, F. Pala, B. Bhanu, Eden: Ensemble of deep networks for vehicle classification, in: The IEEE Conference...
  • S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene...
  • K. He et al., Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2015)
  • P.F. Felzenszwalb et al., Object detection with discriminatively trained part-based models, IEEE Trans. Pattern Anal. Mach. Intell. (2010)
  • Z. Luo et al., MIO-TCD: a new benchmark dataset for vehicle classification and localization, IEEE Trans. Image Process. (2018)
  • N. Srivastava et al., Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. (2014)
  • K. He et al., Identity mappings in deep residual networks
This paper has been recommended for acceptance by Zicheng Liu.
