
Neurocomputing

Volume 501, 28 August 2022, Pages 123-134

Skeleton-based traffic command recognition at road intersections for intelligent vehicles

https://doi.org/10.1016/j.neucom.2022.05.107

Highlights

  • Pioneering research on traffic command recognition distinguishing directions and gestures.

  • A two-stage recognition model exploiting skeletal geometry and co-occurrence features.

  • A specialized dataset for recognizing Chinese traffic commands at road intersections.

Abstract

Understanding traffic officer commands is a fundamental perception task for intelligent vehicles in driver assistance and autonomous driving. Previous studies have emphasized explicit traffic command gesture recognition but have not considered situations where the traffic officer is directing traffic in other directions, which also influences the decision-making of the ego vehicle. To fill this gap, this article investigates visual skeleton-based recognition of traffic commands at road intersections, where both the command direction and the gesture must be determined. Specifically, a two-stage recognition framework for four cross-shaped directions and eight command gestures is proposed. Two kinds of handcrafted features, upper-body geometric features and keypoint co-occurrence features, are constructed from estimated 2D human keypoint coordinates and heatmaps and combined in a deep learning network. The first stage handles human body orientation classification, while the second stage addresses command gesture recognition, additionally using the output of the first stage. Combining the recognized body orientation and command gesture, the type of traffic command can ultimately be inferred. For training and validation, a dataset termed Chinese Traffic Command at Intersections (CTCX) is built. The proposed method achieves an edit accuracy of 89.67% on the CTCX test set, outperforming the compared methods and demonstrating its effectiveness. This work provides a foundation in this area and is expected to inspire further research on traffic command recognition with directions.

Introduction

Just as human drivers can understand regulated traffic command gestures when traffic officers are directing traffic, intelligent vehicles should be capable of recognizing them as well. Traffic command gesture recognition is a fundamental perception task in driver assistance and autonomous driving. This task is particularly critical in mixed traffic scenarios because it can inform drivers or vehicles of driving situations and improve safety.

Recent years have witnessed a revolution in methodology and database research on traffic command gesture recognition [1], [2], [3], [4], [5], owing to the advancement of deep learning in computer vision, especially in human action recognition. Aware of the significance of automatically recognizing traffic commands for intelligent driving, a growing number of researchers have devoted attention to related studies. To date, the number of proposed methods and public databases has been increasing, and the gap between scientific research and real applications has been narrowing. Previous studies mainly aimed to recognize explicit commands, such as the eight kinds of Chinese regulated traffic command gestures. Generally, only the command gestures directed at the ego vehicle have been recognized. However, command gestures in other directions also influence the ego vehicle. For example, as shown in Fig. 1, when a driver notices a traffic officer gesturing for the vehicles coming from the crosswise direction to go straight, he or she is obligated to stop at the same time. Humans are able to decipher these seemingly unrelated gestures. If a vehicle could be informed of the commands directed at other vehicles, it would have a more comprehensive knowledge of the surrounding environment and be more likely to make correct decisions rather than mistakenly continue driving. Therefore, it is necessary for intelligent vehicles to learn these associated skills.

To serve the understanding of implicit traffic commands, we specify an extended traffic command recognition task that requires awareness of both command gestures and directions. As a starting point, direction recognition is simplified to a four-direction classification task, which suits typical road intersections. High accuracy and strong robustness are the primary objectives of this new task for practical on-board applications. To our knowledge, this work is the first to clearly describe the issue and aim to resolve it.

The human skeleton is a compact representation of human action and has been a widely used modality in human action recognition [6], [7], traffic agent intent estimation [8] and trajectory prediction [9], [10]. Previous works on traffic command gesture recognition have also illustrated the advantages of using skeleton information [4], [11], [5], [12]. Considering the compactness, robustness and reusability of the skeleton modality, we propose a two-stage recognition framework based on an estimated 2D skeleton. The discriminative features are generated by combining handcrafted skeletal geometry and co-occurrence with deep learning. The architectures of the two stages are simple, and the output of the first stage is fed into the second stage as part of its features. We also build a dataset containing Chinese traffic command gesture instances in the four directions for training and validation. To summarize, this work makes contributions in three ways:

  • A pioneering study on traffic command recognition distinguishing directions and gestures is carried out to fill this research gap, promoting the perception capability of intelligent vehicles.

  • With estimated 2D human skeletons, a two-stage recognition framework whose discriminative features combine handcrafted skeletal geometry and co-occurrence is presented to tackle the challenge. Our approach is characterized by a simple but effective structure and attains strong performance in the experiments.

  • An extended dataset termed “Chinese Traffic Command at Intersections” (CTCX) is established with videos of traffic command gestures in the four cross-shaped directions for methodology studies. The experimental results verify the effectiveness of the proposed approach.

The rest of the article is organized as follows. Section 2 reviews related work. Section 3 gives a specific definition of the proposed problem, and Section 4 explains the details of the proposed approach. Section 5 describes the dataset establishment and the evaluation metrics, presents the experimental results and comparisons, and discusses the influence of model settings, parameter selection, and generalization. Finally, conclusions are drawn in Section 6.

Section snippets

Related Work

Traffic command gesture recognition is a subcategory and practical application of human action recognition. Because of the specific on-board scenarios, approaches based on wearable sensor signals [13] or depth sensors [2], [3] are of limited practical use. Moreover, it is difficult for RGB-based recognition models [14], [15], [1] to adapt to diverse scenes because the current amount of traffic command gesture data cannot guarantee generalization. Recently, an…

Problem Formulation

When human drivers notice a traffic officer directing traffic at a road intersection, normally the first aim is to identify which direction the officer is controlling and then to recognize the meaning of the gesture. In this paper, considering the case of intersections, the command directions are modeled in terms of an orientation set including self, left, opposite and right. In line with the Chinese traffic rules, there are eight standard traffic control gestures including stop, go straight, …
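The composition of the two stages can be sketched as follows. This is a minimal illustration, not the paper's implementation: the four orientation names come from the formulation above, but the gesture names beyond the two listed (and all identifiers) are illustrative assumptions drawn from the Chinese traffic control regulation.

```python
from itertools import product

# Hypothetical label sets following the problem formulation above.
# The four orientations are given in the text; the eight gesture names
# are illustrative (the paper's exact labels may differ).
ORIENTATIONS = ["self", "left", "opposite", "right"]
GESTURES = ["stop", "go_straight", "turn_left", "turn_left_waiting",
            "turn_right", "lane_change", "slow_down", "pull_over"]

def infer_command(orientation: str, gesture: str) -> str:
    """Combine the two-stage outputs into one traffic command label."""
    assert orientation in ORIENTATIONS and gesture in GESTURES
    return f"{orientation}/{gesture}"

# The full label space is the Cartesian product: 4 x 8 = 32 commands.
ALL_COMMANDS = [infer_command(o, g) for o, g in product(ORIENTATIONS, GESTURES)]
```

Framing the problem this way explains the two-stage design: orientation and gesture are recognized separately and only their combination yields the final command.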

Methodology

As illustrated in Fig. 2, our proposed method is a two-stage framework whose stages share a partially similar network architecture. Human pose estimation is conducted as preprocessing to extract skeleton information. Then, a spatial process generates the upper-body geometric features and keypoint co-occurrence features for each human instance at each frame. In the temporal process, LSTM layers further process the spatial features across frames. At the last…
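To make the spatial process concrete, the sketch below computes a few upper-body geometric features from 2D keypoints. The conclusion notes that only seven keypoints are used; the particular keypoint set (nose, shoulders, elbows, wrists), the feature definitions, and the image-coordinate sign convention here are all assumptions for illustration, not the paper's exact features.

```python
import math

def angle(a, b, c):
    """Interior angle at joint b (radians) formed by points a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))

def geometric_features(kp):
    """kp: dict mapping 7 upper-body keypoint names to (x, y) coordinates.

    Illustrative features: elbow angles plus shoulder-width-normalized
    distances, so the features are invariant to scale and translation.
    """
    shoulder_w = math.dist(kp["l_shoulder"], kp["r_shoulder"])  # scale reference
    return [
        angle(kp["l_shoulder"], kp["l_elbow"], kp["l_wrist"]),  # left elbow angle
        angle(kp["r_shoulder"], kp["r_elbow"], kp["r_wrist"]),  # right elbow angle
        math.dist(kp["l_wrist"], kp["r_wrist"]) / shoulder_w,   # wrist spread
        (kp["nose"][1] - kp["l_wrist"][1]) / shoulder_w,        # left wrist height
        (kp["nose"][1] - kp["r_wrist"][1]) / shoulder_w,        # right wrist height
    ]
```

Per-frame feature vectors of this kind would then be stacked over time and fed to the LSTM layers of the temporal process.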

Experiments

This section presents experiments validating the proposed method quantitatively and qualitatively. The dataset, evaluation metrics and implementation details are introduced in turn, followed by the experimental results and discussion.
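The abstract reports performance as edit accuracy. The paper's exact definition is in the full text; a common segmental edit score from the action segmentation literature, which the name suggests, can be sketched as follows (the normalization and function names here are assumptions).

```python
def segments(frame_labels):
    """Collapse per-frame labels into an ordered sequence of segments."""
    segs = []
    for lab in frame_labels:
        if not segs or segs[-1] != lab:
            segs.append(lab)
    return segs

def levenshtein(p, q):
    """Edit distance between two label sequences."""
    m, n = len(p), len(q)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if p[i - 1] == q[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def edit_score(pred_frames, gt_frames):
    """Normalized segmental edit score in [0, 1]; 1.0 is a perfect match."""
    p, q = segments(pred_frames), segments(gt_frames)
    return 1.0 - levenshtein(p, q) / max(len(p), len(q), 1)
```

A segmental score of this kind penalizes out-of-order or spurious command segments while being tolerant of small boundary shifts between predicted and ground-truth frames.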

Conclusion

This article addresses Chinese traffic command recognition at road intersections. It is the first to clarify the issue of implicit traffic commands and formulates the problem as a joint task of orientation classification and gesture recognition. We propose a two-stage recognition framework that takes advantage of a concise LSTM-based network and combines handcrafted features with deep learning. The upper-body geometric features use only seven human keypoints but show remarkable…

CRediT authorship contribution statement

Sijia Wang: Conceptualization, Methodology, Investigation, Writing - original draft. Kun Jiang: Data curation, Resources, Project administration. Junjie Chen: Formal analysis, Writing - review & editing. Mengmeng Yang: Data curation, Writing - review & editing. Zheng Fu: Validation, Visualization. Tuopu Wen: Methodology, Software. Diange Yang: Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant Nos. U1864203, 52102396, and 52102464), and Sharing-Van Automatic Driving Development Project (Grant No. HT20082302).

Sijia Wang received her B.S. degree in automotive engineering from Tsinghua University, Beijing, China in 2017. She is currently working toward the Ph.D. degree at School of Vehicle and Mobility, Tsinghua University, Beijing, China. Her research interests include pose estimation and activity recognition of vulnerable road users for autonomous driving.

References (42)

  • Z. Fang et al., Intention recognition of pedestrians and cyclists by 2D pose estimation, IEEE Transactions on Intelligent Transportation Systems (2019).

  • R.Q. Mínguez et al., Pedestrian path, pose, and intention prediction through Gaussian process dynamical models and pedestrian activity recognition, IEEE Transactions on Intelligent Transportation Systems (2018).

  • J. Liang et al., Peeking into the future: Predicting future person activities and locations in videos.

  • Z. Fang et al., Traffic police gesture recognition by pose graph convolutional networks.

  • T. Yuan et al., Accelerometer-based Chinese traffic police gesture recognition system, Chinese Journal of Electronics (2010).

  • F. Guo, J. Tang, C. Zhu, Gesture recognition for Chinese traffic police, in: International Conference on Virtual...

  • Z. Cai et al., Max-covering scheme for gesture recognition of Chinese traffic police, Pattern Analysis and Applications (2015).

  • X. Xiong et al., Traffic police gesture recognition based on gesture skeleton extractor and multichannel dilated graph convolution network, Electronics (2021).

  • T.L. Munea et al., The progress of human pose estimation: A survey and taxonomy of models applied in 2D human pose estimation, IEEE Access (2020).

  • S. Gupta et al., Conventionalized gestures for the interaction of people in traffic with autonomous vehicles.

  • F. Guo et al., Chinese traffic police gesture recognition in complex scene.

Kun Jiang received his B.S. degree in mechanical and automation engineering from Shanghai Jiao Tong University, China in 2011. Then, he received his master’s degree in the mechatronics system and his Ph.D. degree in information and systems technologies from the University of Technology of Compiègne (UTC), Compiègne, France, in 2013 and 2016, respectively. He is an assistant research professor at the School of Vehicle and Mobility of Tsinghua University, Beijing, China. His research interests include autonomous vehicles, high-precision digital maps, and sensor fusion.

Junjie Chen received his Ph.D. degree in traffic information engineering and control from Beijing Jiaotong University in 2020. He was a research assistant at Carnegie Mellon University (CMU), Pittsburgh, PA, USA, from 2018 to 2020. He currently holds a postdoctoral position at Tsinghua University, Beijing, China. His research interests include nonparametric Bayesian learning, platoon operation control, and recognition and application of human driving characteristics.

Mengmeng Yang received her Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Hubei, China in 2018. She is now conducting postdoctoral research at the School of Vehicle and Mobility of Tsinghua University, Beijing, China. Her research interests include high-definition maps for autonomous driving.

Zheng Fu received her master’s degree in pattern recognition and intelligent systems from Nanjing University of Posts and Telecommunications, Jiangsu, China, in 2019. She is pursuing the Ph.D. degree at School of Vehicle and Mobility, Tsinghua University, Beijing, China. Her current research interests include human 3-D pose estimation and human intention analysis for autonomous driving.

Tuopu Wen received his B.S. degree from Electronic Engineering, Tsinghua University, Beijing, China in 2018. He is currently working toward the Ph.D. degree at School of Vehicle and Mobility of Tsinghua University, Beijing, China. His research interests include computer vision, high definition maps, and high precision localization for autonomous driving.

Diange Yang is a professor at the School of Vehicle and Mobility of Tsinghua University. He received his B.S. and Ph.D. from Tsinghua University in 1996 and 2001, respectively. His research work mainly focuses on intelligent connected vehicles and autonomous driving. He has published over 120 articles, registered more than 60 national patents, and authored over 10 software copyrights. He has received numerous awards during his career, including the Distinguished Young Science Technology talent of Chinese Automobile Industry in 2011 and the Excellent Young Scientist of Beijing in 2010. He was also the recipient of the Second Prize of National Technology Invention Rewards of China in 2010 and in 2013.
