A real-time vehicle detection and a novel vehicle tracking systems for estimating and monitoring traffic flow on highways

https://doi.org/10.1016/j.aei.2021.101393

Abstract

Real-time highway traffic monitoring systems play a vital role in road traffic management and planning, and in preventing frequent traffic jams, traffic rule violations, and fatal road accidents. These systems rely entirely on online traffic flow information estimated from time-dependent vehicle trajectories, which are extracted from vehicle detection and tracking data obtained by processing road-side camera images. General-purpose object detectors such as Yolo, SSD, and EfficientNet have been used extensively for real-time object detection tasks, but Yolo is generally preferred because it provides high frame-per-second (FPS) performance and robust object localization. However, this algorithm’s average vehicle classification accuracy is below 57%, which is insufficient for traffic flow monitoring. This study proposes improving the vehicle classification accuracy of Yolo and developing a novel bounding box (Bbox)-based vehicle tracking algorithm. For this purpose, a new vehicle dataset was prepared by annotating 7216 images with 123831 object patterns collected from highway videos. Nine machine learning-based classifiers and a CNN-based classifier were selected and trained on this dataset. The classifier with the highest accuracy among the ten was then selected and combined with Yolo. In this way, the classification accuracy of the Yolo-based vehicle detector was increased from 57% to 95.45%. Vehicle detector 1 (Yolo) and vehicle detector 2 (Yolo + best classifier), together with Kalman filter-based tracking as vehicle tracker 1 and Bbox-based tracking as vehicle tracker 2, were applied to categorical/total vehicle counting tasks on 4 highway videos. The vehicle counting results show that the counting accuracy of the developed approach (vehicle detector 2 + vehicle tracker 2) improved by 13.25%, and that this method performed better than the other 3 vehicle counting systems implemented in this study.

Introduction

Classification of vehicles (such as cars, trucks, buses, motorbikes, or bicycles) on urban roads/highways, and estimation of statistical traffic flow information (for example, the flow frequency of vehicles and the number of vehicles of each type travelling in each direction), are important inputs for urban/highway traffic analysis and planning tools [1]. However, real-time highway traffic flow monitoring remains a challenging issue for urban areas in this modern age of growing technology and population. Poor road/highway traffic management results in frequent traffic jams, traffic rule violations, and fatal road accidents. Using traditional sensing techniques (RADAR, LIDAR, RFID, or LASER) to address this problem is time-consuming, expensive, and tedious [2]. In cases where such sensors are insufficient, human observers travel to the region and count the vehicles passing through. This is not a practical solution, however, and these methods cannot generate real-time traffic flow information; nor can they classify vehicles or provide information such as the number of vehicles by type and moving direction [3]. In contrast, recent artificial intelligence (computer vision) approaches, especially deep and machine learning-based image processing techniques, are now used in online video processing systems. These systems generally contain vehicle detection, tracking (associating best-matched vehicle peers in successive frames), time-dependent trajectory extraction, and traffic flow information estimation units. Traffic flow information includes speed, the categorical or total number of vehicles, the vehicles’ entry and exit points in a specified area, and the time elapsed between entry and exit. All this information is extracted from the vehicle trajectories, with vehicle detection and tracking methods at their core.
Thus, robust and high-performance vehicle detection and tracking algorithms are critical for these kinds of systems [4], [5].

Vehicle detection is a technique for recognizing the type of target objects and localizing them in a video frame. Object detection algorithms are usually divided into conventional machine learning and deep learning methods. Conventional detection techniques, such as “Background Subtraction (BS) + Support Vector Machines (SVM)”, “BS + K-Nearest Neighbor (KNN)”, or other classical detection algorithms based on Speeded Up Robust Features (SURF) or Scale Invariant Feature Transform (SIFT) visual features, rely on manually selected, hand-crafted feature vectors. All of these algorithms therefore require deep knowledge and expertise to select the most representative features of the target objects [6]. Manually determining the most effective and representative features that perfectly describe the contents of objects in an image frame is a daunting task. Furthermore, classical detection methods are very slow, which makes them insufficient for real-time traffic flow monitoring. Deep learning methods, on the other hand, do not require expertise or a deep understanding of the contents of objects in an image, because these approaches automatically extract deep and hidden features using deep neural networks (DNNs). DNNs use tens of hidden layers containing linear or non-linear activation functions, giving them the capability to extract feature vectors from raw images and learn to make accurate and optimal decisions [3].
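As a concrete illustration of the classical background-subtraction step mentioned above, the following is a minimal toy sketch (a per-pixel median background model with frame differencing, in NumPy); it illustrates the general technique only and is not the authors' implementation:

```python
import numpy as np

def background_subtract(frames, threshold=25):
    """Classical foreground detection: model the background as the
    per-pixel median over a window of frames, then flag pixels whose
    absolute deviation from that model exceeds a threshold."""
    stack = np.stack(frames).astype(np.int16)
    background = np.median(stack, axis=0)
    masks = [np.abs(f - background) > threshold for f in stack]
    return background, masks

# Toy example: a static 8x8 "road" with one bright moving "vehicle".
frames = [np.full((8, 8), 50, dtype=np.uint8) for _ in range(5)]
for i, f in enumerate(frames):
    f[3, i] = 200  # the vehicle advances one pixel per frame
background, masks = background_subtract(frames)
# Each mask isolates the single moving pixel in its frame.
```

In a real pipeline the resulting foreground masks would be cleaned with morphological operations and fed to a classifier such as SVM or KNN, which is exactly where the speed bottleneck discussed above arises.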

Vehicle tracking is a technique for re-identifying detected objects and associating them with their best-matched peers across consecutive frames. Pixel, shape, color, and bounding box (Bbox) information are widely used to trace detected objects and extract object trajectories. Although pixel-, shape-, and color-based tracking methods are considered robust for tracking objects through successive frames, they are too slow for real-time video analysis applications. Methods such as the Kalman or Particle filter tracking algorithms, which use bounding box information, are somewhat faster than pixel-, shape-, or color-based approaches because they process only the coordinate information of the detected objects. However, these methods also fall short when the number of objects in a frame increases [7]. For instance, the Kalman and Particle filter tracking algorithms struggle on highways where objects move very fast and more than 30 objects appear in a frame. Furthermore, vehicle detection and tracking have long been challenging tasks in classical computer vision and image processing research because of issues such as partial or full occlusion of objects, illumination changes, camera shaking, extremely high- or low-quality images, and adverse weather conditions including rain, snow, and wind, which complicate the detection, tracking, and data association processes and, in some cases, make such systems fail completely [8]. For these reasons, a robust, real-time tracking algorithm is paramount for effective and efficient video analysis tools.
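The general idea behind Bbox-based data association can be sketched as a greedy intersection-over-union (IoU) matcher between the boxes of two consecutive frames. This is a hedged illustration of the technique class, not the paper's actual tracking algorithm:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def associate(prev_boxes, curr_boxes, min_iou=0.3):
    """Greedy frame-to-frame association: link each previous box to the
    unmatched current box with the highest IoU above a threshold."""
    matches, used = {}, set()
    for i, pb in enumerate(prev_boxes):
        best_j, best = None, min_iou
        for j, cb in enumerate(curr_boxes):
            if j in used:
                continue
            score = iou(pb, cb)
            if score > best:
                best_j, best = j, score
        if best_j is not None:
            matches[i] = best_j
            used.add(best_j)
    return matches
```

Because such a matcher touches only box coordinates, its cost grows with the square of the number of boxes per frame, which hints at why coordinate-based trackers slow down on crowded highway scenes.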

General-purpose object detection architectures such as Yolo, Single-Shot-Detector (SSD), and EfficientNet are widely used for online vehicle detection tasks; among them, Yolo has clear advantages with its high frame-per-second (FPS) rate and robust vehicle localization functionality. Nevertheless, this algorithm’s average vehicle classification accuracy is below 57%, which is not enough for traffic flow monitoring systems [9], [10], [11]. This study proposes improving the vehicle classification accuracy of the Yolo algorithm by combining it with a robust classification layer selected from the accuracy results of 10 classification algorithms. Additionally, a novel Bbox-based vehicle tracking algorithm was developed in this study. For these purposes, a new vehicle dataset was prepared by annotating 7216 images with 123831 object patterns obtained directly from the selected road/highway videos [12]. Nine object classification algorithms plus a CNN-based classification method were trained on the dataset. The classifier with the highest accuracy among the ten was selected and combined with Yolo. In this way, the classification accuracy of the Yolo-based vehicle detection algorithm was increased from 57% to 95.45%. The flow-chart of the entire process is illustrated in Fig. 1. In addition, we implemented Kalman filter-based vehicle tracking. Then, Yolo and “Yolo + the best classifier” as vehicle detectors, and the Bbox-based and Kalman filter-based trackers as vehicle tracking algorithms, were applied to categorical and total vehicle counting tasks on 4 highway videos, see Fig. 2. The vehicle counting results show that vehicle counter 2 (Yolo + the best classifier + the Bbox tracker) performed with, on average, 13.25% better accuracy than vehicle counter 1 (Yolo + the Bbox tracker), and this approach outperformed the other vehicle counting systems developed in this study. The contributions of the study are as follows:

  • (i)

    creating a new dataset by annotating 7216 images with 123831 object patterns collected directly from road/highway videos; implementing nine classifiers and developing our own CNN-based classifier, and training all of them on the new vehicle dataset; then determining the classifier with the highest accuracy among the ten, and developing a real-time, high-accuracy vehicle detection system by combining this best classifier with Yolo,

  • (ii)

    developing a novel bounding-box-based vehicle tracking algorithm and implementing the Kalman filter-based tracking algorithm, then applying both vehicle detectors and both trackers to the vehicle trajectory extraction task,

  • (iii)

    developing real-time traffic flow monitoring systems that can process up to 500 vehicles in a video frame simultaneously. The systems monitor traffic flow by estimating the categorical and total numbers of vehicles from the extracted vehicle trajectories. Four case-study highway videos were processed via the developed traffic flow monitoring systems, and the accuracy of the resulting vehicle counting systems was compared across the developed vehicle detectors and trackers.
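The two-stage detection idea in contribution (i), in which a general-purpose detector localizes objects and a stronger classifier re-labels each cropped patch, can be sketched as follows. The interfaces and names here are illustrative stand-ins, not the authors' actual code:

```python
import numpy as np

def detect_and_classify(frame, detector, classifier):
    """Two-stage detection: a generic detector proposes boxes and coarse
    labels; a second-stage classifier re-labels each cropped patch."""
    refined = []
    for (x1, y1, x2, y2), coarse_label, score in detector(frame):
        patch = frame[y1:y2, x1:x2]   # crop the detected region
        label = classifier(patch)     # second-stage class decision
        refined.append(((x1, y1, x2, y2), label, score))
    return refined

# Toy stand-ins for the detector and classifier.
def toy_detector(frame):
    return [((10, 10, 40, 30), "car", 0.91)]

def toy_classifier(patch):
    # A real second stage would run a trained CNN/ML model on the patch.
    return "truck" if patch.mean() > 100 else "car"

frame = np.full((100, 100), 150, dtype=np.uint8)
results = detect_and_classify(frame, toy_detector, toy_classifier)
```

The design rationale is that the detector's localization is kept (its strength) while its weak class decision is overridden by a classifier trained on a domain-specific vehicle dataset.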

Section snippets

Literature review

Extraction of valuable traffic flow information by analyzing video scenes is crucial for a vision-based highway/intersection monitoring and management system (a vision-based traffic flow monitoring and management system), since it enables dynamic intelligent transportation systems (ITS), an important component of smart (sustainable) cities. The system relies fully on vehicle recognition and the extraction of vehicle trajectories. Recognizing vehicles and extraction of their trajectory data

Methodology

The developed highway traffic monitoring system consists of four main modules: vehicle detection, tracking, trajectory extraction, and traffic flow information estimation. In this study, the categorical and total numbers of vehicles were estimated as traffic flow information for four highway videos. These videos were processed via two vehicle detection approaches and two vehicle tracking algorithms. Vehicle detector 1 was based on the general-purpose weight model of the Yolo object detection algorithm, and
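One common way to turn extracted trajectories into categorical/total counts is a virtual counting line. The sketch below illustrates that general technique only (it is not the authors' exact counting method): a trajectory is a list of (x, y) centroids over time, and a vehicle is counted when its trajectory crosses a horizontal line, split by travel direction:

```python
def count_line_crossings(trajectories, line_y=240):
    """Count trajectories crossing a horizontal counting line, split by
    travel direction; each trajectory is counted at most once."""
    counts = {"down": 0, "up": 0}
    for traj in trajectories:
        for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
            if y0 < line_y <= y1:      # moving toward larger y
                counts["down"] += 1
                break
            if y1 < line_y <= y0:      # moving toward smaller y
                counts["up"] += 1
                break
    return counts

# Two vehicles crossing the line in opposite directions, one that never does.
tracks = [[(5, 100), (5, 200), (5, 300)],
          [(8, 300), (8, 200), (8, 100)],
          [(3, 50), (3, 60)]]
totals = count_line_crossings(tracks, line_y=240)
```

Keeping the vehicle's class label alongside each trajectory turns the same loop into a categorical counter.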

Training and test results of the classification systems

The accuracy results of the ten classification algorithms are illustrated in Table 1. The weighted average accuracy was used in this study because the distribution of vehicle numbers by type is not balanced. The dataset includes 123831 images, each containing exactly one object label. The dataset was split into training and test parts at a 75/25 ratio, i.e., 92873/30958 object images, respectively. In the test set, the distribution of the numbers of vehicles by vehicle type is 23518
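The weighted average accuracy mentioned above weights each class's accuracy by its test-set support, so that large classes dominate the average exactly in proportion to their frequency. A minimal sketch follows; the per-class accuracies and the truck/bus counts are illustrative assumptions, not the paper's actual per-class results:

```python
def weighted_average_accuracy(per_class_accuracy, support):
    """Average the per-class accuracies, weighting each class by its
    number of test samples (support); appropriate when class sizes are
    imbalanced, as with vehicle types here."""
    total = sum(support.values())
    return sum(per_class_accuracy[c] * support[c] for c in support) / total

# Illustrative numbers only (30958 total test images as in the text).
acc = {"car": 0.97, "truck": 0.92, "bus": 0.90}
n = {"car": 23518, "truck": 5000, "bus": 2440}
overall = weighted_average_accuracy(acc, n)
```

With imbalanced supports like these, the unweighted mean of the three accuracies (about 0.93) would understate the performance experienced over the actual test distribution.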

Conclusion

A real-time traffic flow data extraction system was developed by processing ordinary camera images with vehicle detection and tracking algorithms. On four highway videos, the categorical and total numbers of vehicles were estimated with two vehicle counting systems. Vehicle counting systems 1 and 2 were built on vehicle detectors 1 and 2, respectively. Vehicle detector 1 was based on the general-purpose weight model of Yolo, and vehicle detector 2 was developed by combining a CNN-based

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under Grant No. 119E077, titled “Development of a Customized Traffic Planning System for Sakarya City by Processing Multiple Camera Images with Convolutional Neural Networks (CNN) and Machine Learning Techniques”.

References (38)

  • P. Liu et al.

    Vehicle tracking based on shape information and inter-frame motion vector

    Comput. Electr. Eng.

    (2019)
  • D. Song et al.

    Multi-vehicle tracking with microscopic traffic flow model-based particle filtering

    Automatica

    (2019)
  • X. Xiao et al.

    A Kalman filter algorithm for identifying track irregularities of railway bridges using vehicle dynamic responses

    Mech. Syst. Signal Process.

    (2020)
  • T. Yang et al.

    Online multi-object tracking combining optical flow and compressive tracking in Markov decision process

    J. Vis. Commun. Image Represent.

    (2019)
  • J. Yang et al.

    Tracking multiple workers on construction sites using video cameras

    Adv. Eng. Inform.

    (2010)
  • S. Khan et al.

    An intelligent monitoring system of vehicles on highway traffic

  • Z. Zhao et al.

    Object detection with deep learning: A review

    IEEE Trans. Neural Netw. Learn. Syst.

    (2019)
  • N.K. Chauhan et al.

    A review on conventional machine learning vs deep learning

  • S.R.E. Datondji et al.

    A survey of vision-based traffic monitoring of road intersections

    IEEE Trans. Intell. Transp. Syst.

    (2016)

    Jahongir Azimjonov is a Ph.D. researcher studying computer vision/image processing and machine and deep learning methods for intelligent transportation systems.

    Ahmet Özmen is a full professor in the Department of Software Engineering. His research interests are computer vision and system monitoring.
