A novel video based system for detecting and counting vehicles at user-defined virtual loops

https://doi.org/10.1016/j.eswa.2014.09.045

Highlights

  • We present a system for detecting and counting vehicles in urban traffic videos.

  • The detected and tracked vehicles are counted at user-defined virtual loops.

  • A background model is defined using Mixtures of Gaussians and Motion Energy Images.

  • Vehicle detection and tracking relies on a flexible particle clustering scheme.

  • Experiments and comparisons with existing methods suggest that the proposed method is potentially more reliable.

Abstract

This paper presents a new system for detecting and counting vehicles at user-defined virtual loops in urban traffic videos. The proposed method uses motion coherence and spatial adjacency to group sampling particles in urban video sequences. A foreground mask is created using Gaussian Mixture Models and Motion Energy Images to determine the preferred locations for particle sampling, and the convex particle groups are then analyzed to detect the vehicles. After a vehicle is detected, it is tracked using the similarity of its colors in adjacent frames. The vehicles are counted at user-defined virtual loops by detecting the intersections of the tracked vehicles with these loops. The experimental results, based on different traffic videos with a total of 80,000 video frames, suggest that our approach can potentially be more reliable than comparable methods available in the literature.

Introduction

Traffic management can bring many benefits to drivers, pedestrians, governments, and the environment. Information about traffic conditions can be used in several ways, such as to synchronize traffic lights, assist drivers in selecting routes, and assist governments in planning the expansion of the traffic system and the construction of new roads. Drivers benefit from less time spent in urban and road traffic, resulting in savings and a better quality of life. Governments acquire data for designing better solutions for urban and road traffic, and the environment benefits from the reduction in pollutant emissions that results from an optimized flow of vehicles.

Conventional techniques for measuring traffic flow, such as inductive loops and sonar or microwave detectors, have disadvantages such as high installation cost, traffic disruption during installation and maintenance, and, usually, an inability to detect slow or stationary vehicles (Mandellos, Keramitsoglou, & Kiranoudis, 2011).

The recent improvements in sensor and communication technologies allow local transport authorities to closely monitor the conditions of urban transport systems, promoting the development of a wide variety of techniques for monitoring traffic flow and for collecting data on traffic flow characteristics (Cho, Quek, Seah, & Chong, 2009).

The use of image-based sensors and computer vision techniques to acquire data on vehicle traffic has been intensely investigated in recent years, since traffic videos provide more information about the traffic of vehicles than other classes of sensors (e.g. inductive loops and sonar or microwave detectors), and such video-based systems can sometimes expand their monitoring capabilities by taking advantage of the video cameras already installed on site (Tian, Yao, Gu, Wang, & Li, 2011). Moreover, video-based systems are easy to install and easy to upgrade, since the system and its functionalities can be redesigned by updating the installed algorithms. Among their several possible applications, such video-based systems can be used for counting and classifying vehicles, measuring vehicle speeds, and identifying traffic incidents (Mandellos et al., 2011).

Therefore, the current technological trend in traffic monitoring is oriented towards video-based systems, since video sensors have relatively low maintenance costs and allow vehicles to be detected and counted in a non-intrusive way. Moreover, several applications demand traffic video surveillance nowadays, such as: providing essential traffic and travel information to drivers, so that road safety and traffic efficiency can be improved (Cheng, Gau, Huang, & Hwang, 2012); detecting pedestrians in intelligent transportation systems (ITS); providing traffic data to safety driving assistance systems (SDASs) (Guo, Ge, Zhang, Li, & Zhao, 2012); assisting vehicle overtaking (Milanés et al., 2012); detecting and extracting vehicles in traffic surveillance scenarios (Mandellos et al., 2011); and counting vehicles and/or detecting traffic incidents (Cho et al., 2009).

Currently, there are several methods for detecting, tracking and counting vehicles in traffic videos. Generally, these methods start by separating the static part of the scene (the background) from the non-static part (the foreground), where the moving objects of interest (i.e. moving vehicles) are usually found (Tian et al., 2011). Various techniques can be used to segment the background and the foreground. Subtracting a static background model from each video frame is often used. This background model can be obtained with simple methods, such as averaging the pixel intensities over a set of frames (Lai & Yung, 1998), or with more elaborate methods, such as building Gaussian Mixture Models for each background pixel (Stauffer & Grimson, 1999), reconstructing the background (Mandellos et al., 2011), or determining the optimal threshold for foreground–background segmentation and object detection (Karasulu & Korukoglu, 2012). However, it is often challenging for background subtraction methods to deal with noise, illumination changes, occlusions, and the splitting of multiple objects that have been incorrectly merged by the foreground segmentation process. Other approaches, such as computing pixel-by-pixel differences between two or more adjacent frames, have also been used to detect the objects of interest; frame differencing is more robust to illumination variations than background subtraction, but it can only detect objects moving against a static background (Cucchiara, Piccardi, & Mello, 2000). To avoid incorrectly merging spatially close vehicles (e.g. when cast shadows are present), shadow removal has been investigated as a way to improve vehicle identification (Zhong & Junping, 2008).
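
For concreteness, the sketch below contrasts the two foreground segmentation strategies discussed above: a per-pixel Gaussian-mixture background model in the spirit of Stauffer and Grimson (1999), via OpenCV's MOG2 implementation, and simple adjacent-frame differencing. It is a minimal illustration under our own assumptions (the input file name and the threshold values are hypothetical), not a reproduction of any of the cited implementations.

```python
import cv2

# Gaussian-mixture background model (Stauffer & Grimson style), as
# implemented by OpenCV's MOG2 subtractor. Parameter values are illustrative.
mog = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                         detectShadows=True)

cap = cv2.VideoCapture("traffic.mp4")   # hypothetical input video
ok, prev = cap.read()
while ok:
    ok, frame = cap.read()
    if not ok:
        break

    # (1) Background subtraction: MOG2 labels shadow pixels with gray (127),
    # so a high threshold keeps only confident foreground pixels.
    fg_gmm = mog.apply(frame)
    _, fg_gmm = cv2.threshold(fg_gmm, 200, 255, cv2.THRESH_BINARY)

    # (2) Frame differencing: only objects moving between consecutive
    # frames respond, while slow illumination drift is largely ignored.
    diff = cv2.absdiff(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY),
                       cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY))
    _, fg_diff = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    prev = frame
```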

There are several methods for detecting and tracking moving vehicles (Tian et al., 2011). The approaches used for detecting targets (i.e. vehicles) are often model-based methods that use prior knowledge to detect the desired targets (Lai et al., 2010; Shen, 2008), deformable templates that are matched against known vehicle models in the video frames (Takeuchi, Mita, & McAllester, 2010), or methods that rely on simpler features such as corners and edges (Tu, Xu, & Zhou, 2008). The identified targets (vehicles) are often tracked using approaches such as mean-shift (Bouttefroy, Bouzerdoum, Phung, & Beghdadi, 2008), Kalman filtering (Xie, Zhu, Wang, Xu, & Zhang, 2005), or particle filtering (Scharcanski, de Oliveira, Cavalcanti, & Yari, 2011). Different schemes have been proposed for vehicle counting, such as incrementing a vehicle counter whenever a new vehicle is detected in the video scene (Sánchez, Suarez, Conci, & de Oliveira Nunes, 2011), incrementing a vehicle counter only when the tracked vehicles are on pre-defined virtual loops (Tseng, Lin, & Smith, 2002), or counting new vehicles passing at user-defined virtual loops without previously tracking these vehicles (Purnama, Zaini, Putra, & Hariadi, 2009).
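
As a brief illustration of one of the tracking options listed above, the sketch below sets up a constant-velocity Kalman filter over 2-D vehicle centroids with OpenCV. This is a generic textbook formulation under our own assumptions (illustrative noise covariances, one detection per frame), not the specific filter of Xie et al. (2005).

```python
import cv2
import numpy as np

# Constant-velocity Kalman filter: state [x, y, vx, vy], measurement [x, y].
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2      # illustrative
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1  # illustrative

def track_step(detection_xy):
    """Predict the next centroid position, then correct with a detection."""
    predicted = kf.predict()
    kf.correct(np.asarray(detection_xy, np.float32).reshape(2, 1))
    return predicted[:2].ravel()
```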

Despite the recent advances, there are still challenging issues in vehicle detection and tracking, such as: (a) detecting the foreground accurately, especially when there are rapid changes in background lighting or imaging artifacts; (b) identifying the vehicles to be tracked when there are multiple vehicles in the scene; and (c) tracking vehicles in occlusion situations, especially when a vehicle being tracked is partially (or completely) occluded by other vehicles or obstacles. In the present work, we try to address the first two challenges, and restrict ourselves to cases where the camera positioning minimizes vehicle occlusions.

The proposed method improves on the scheme presented by Bouvie, Scharcanski, Barcellos, and Escouto (2013) by providing a new segmentation of the moving vehicles against the background (road or street), which tends to be robust to artifacts in traffic videos, leading to fewer vehicle counting errors. The approach used to estimate the background in Bouvie et al. (2013) relies on a simple temporal median, which has limitations when the scene illumination changes abruptly. In the present work, we use a background model based on Mixtures of Gaussians, which is more robust to scene illumination changes, improving background and vehicle detection even in adverse conditions. Vehicle tracking is performed by a particle filtering method that is significantly more robust than the approach proposed in Bouvie et al. (2013), as the comparative experimental results indicate. Vehicle counting is performed by detecting the intersection of the tracked particle groups (i.e. moving vehicles) with a set of user-defined virtual loops. The experimental comparisons with methods representative of the state of the art (Bouvie et al., 2013; Kim, 2008; Sánchez et al., 2011; Yuan et al., 2013) suggest that the proposed approach can achieve more accurate results in terms of vehicle detection and counting, while better handling challenging vehicle tracking issues, such as tracking long vehicles, which other methods tend to divide into smaller moving objects, leading to inaccuracies in vehicle counting.
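
The virtual-loop counting step lends itself to a simple geometric formulation: a vehicle is counted when the segment joining its tracked centroid positions in two consecutive frames crosses a user-defined loop. The sketch below is a minimal version of this idea under our own assumptions (loops modeled as line segments, one centroid per tracked vehicle); it illustrates the principle rather than reproducing the authors' code.

```python
def _ccw(a, b, c):
    """True if the points a, b, c are in counter-clockwise order."""
    return (c[1] - a[1]) * (b[0] - a[0]) > (b[1] - a[1]) * (c[0] - a[0])

def segments_intersect(p1, p2, q1, q2):
    """True if segment p1-p2 properly intersects segment q1-q2."""
    return (_ccw(p1, q1, q2) != _ccw(p2, q1, q2) and
            _ccw(p1, p2, q1) != _ccw(p1, p2, q2))

def count_loop_crossings(track, loop_a, loop_b):
    """Count consecutive-frame centroid moves that cross the virtual loop."""
    return sum(segments_intersect(track[i], track[i + 1], loop_a, loop_b)
               for i in range(len(track) - 1))

# Example: a centroid track crossing a horizontal loop placed at y = 100.
track = [(50, 80), (52, 95), (55, 110)]
print(count_loop_crossings(track, (0, 100), (200, 100)))  # -> 1
```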

This paper is organized as follows: Section 2 presents our proposed vehicle tracking method, Section 3 presents and discusses the obtained experimental results, and finally Section 4 concludes with our final remarks.

Section snippets

Our proposed vehicle detection and counting method

In order to reduce the number of pixels that must be processed, we sub-sample the video frames using particles (see Section 2.1). Particles belonging to the same vehicle are assumed to be: (a) spatially coherent, i.e. particles associated with the same vehicle must be spatially close to each other, and groups of particles must be distant from each other if associated with different vehicles; (b) temporally coherent, meaning that particles associated with a vehicle appearing in a given frame
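
A minimal sketch of this particle scheme, under our own assumptions (uniform sampling restricted to the foreground mask, followed by greedy single-link spatial grouping; the paper's actual clustering rules are detailed in Section 2), could look as follows.

```python
import numpy as np

def sample_particles(fg_mask, n_particles=500, seed=0):
    """Draw particles only at active foreground pixels of a binary mask."""
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(fg_mask)
    if len(xs) == 0:
        return np.empty((0, 2), int)
    idx = rng.choice(len(xs), size=min(n_particles, len(xs)), replace=False)
    return np.stack([xs[idx], ys[idx]], axis=1)

def group_particles(particles, radius=15.0):
    """Greedy single-link grouping: a particle closer than `radius` to any
    member of a group joins that group; a particle bridging two groups
    merges them (spatial coherence assumption (a) above)."""
    groups = []
    for p in particles:
        merged = None
        for g in groups:
            if any(np.hypot(p[0] - q[0], p[1] - q[1]) <= radius for q in g):
                if merged is None:
                    g.append(p)
                    merged = g
                else:
                    merged.extend(g)
                    g.clear()
        if merged is None:
            groups.append([p])
    return [np.array(g) for g in groups if g]
```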

Experimental results

Our goal is to count tracked vehicles on user-defined virtual loops. In order to validate the proposed method, we compared the obtained results in terms of vehicle counts with other approaches representative of the state of the art, such as the methods proposed by Kim (2008) and Bouvie et al. (2013), which also use particle filtering, the method proposed by Sánchez et al. (2011), which does not use particle filtering to detect moving vehicles, and the method proposed by Yuan et al. (2013) that relies on

Conclusion

This paper presents a new method for detecting and counting vehicles in urban traffic video sequences. The proposed method uses a particle filtering approach to measure the motion coherence and spatial adjacency of the sampling particles, and associates groups of sampling particles with moving vehicle locations in urban video sequences. Moving vehicles are detected when the groups of sampling particles have convex shapes, and the group members (i.e. moving particles) are persistent and show similar
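
One plausible way to implement the convexity test mentioned above, sketched here under our own assumptions (the paper's exact criterion is given in Section 2), is to rasterize a particle group and compare its blob area with the area of its convex hull, i.e. its solidity:

```python
import cv2
import numpy as np

def is_convex_group(particles, img_shape, min_solidity=0.85):
    """Rasterize a particle group and test whether its footprint is
    nearly convex: solidity = blob area / convex hull area."""
    mask = np.zeros(img_shape[:2], np.uint8)
    for x, y in particles:
        cv2.circle(mask, (int(x), int(y)), 4, 255, -1)  # dilate each particle
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return False
    blob = max(contours, key=cv2.contourArea)
    hull_area = cv2.contourArea(cv2.convexHull(blob))
    return hull_area > 0 and cv2.contourArea(blob) / hull_area >= min_solidity
```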

Acknowledgment

This work was supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Brazil. The authors also thank the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil, and DIGICON for funding this project.

References (34)

  • D. Comaniciu et al. Real-time tracking of non-rigid objects using mean shift.

  • R. Cucchiara et al. (2000). Image analysis and rule-based reasoning for a traffic monitoring system. IEEE Transactions on Intelligent Transportation Systems.

  • J.W. Davis et al. The representation and recognition of human movement using temporal templates.

  • R.C. Gonzalez et al. (2002). Digital image processing.

  • P. KaewTraKulPong & R. Bowden (2002). An improved adaptive background mixture model for real-time tracking with shadow detection.

  • T. Kanungo et al. (2002). An efficient k-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • Z. Kim (2008). Real time object tracking based on dynamic feature grouping with background subtraction.
