Skeleton-based traffic command recognition at road intersections for intelligent vehicles
Introduction
Just as human drivers can understand the regulated traffic command gestures of officers directing traffic, intelligent vehicles should be capable of recognizing them as well. Traffic command gesture recognition is a fundamental perception task in driver assistance and autonomous driving. It is particularly critical in mixed traffic scenarios because it informs drivers or vehicles of the driving situation and improves safety.
Recent years have witnessed rapid progress in methodologies and databases for traffic command gesture recognition [1], [2], [3], [4], [5], owing to the advancement of deep learning in computer vision, especially in human action recognition. Aware of the significance of automatic traffic command recognition for intelligent driving, a growing number of researchers have devoted attention to related studies. The number of proposed methods and public databases has been increasing, and the gap between scientific research and real-world application has been narrowing. Previous studies mainly aimed to recognize explicit commands, such as the eight regulated Chinese traffic command gestures. Generally, only the command gestures directed at the ego vehicle are recognized. However, command gestures in other directions also affect the ego vehicle. For example, as shown in Fig. 1, when a driver notices that a traffic officer is gesturing for the vehicles coming in the crosswise direction to go straight, he/she is obligated to stop at the same time. Humans are able to decipher these seemingly unrelated gestures. If a vehicle could be informed of the commands directed at other vehicles, it would have a more comprehensive knowledge of the surrounding environment and tend to make correct decisions rather than continuing to drive by mistake. Therefore, it is necessary to equip intelligent vehicles with this associated skill.
To serve the understanding of implicit traffic commands, we specify an extended traffic command recognition task that requires awareness of both command gestures and command directions. As a starting point, direction recognition is simplified to a four-direction classification task, which fits the circumstances of typical road intersections. High accuracy and robustness are the primary objectives of this new task for the sake of practical on-board applications. To our knowledge, this work is the first to clearly describe the issue and aim to resolve it.
The human skeleton is a compact representation of action and has been a widely used modality in human action recognition [6], [7], traffic agent intent [8] and trajectory prediction [9], [10]. Previous works on traffic command gesture recognition have also illustrated the superiority of using skeleton information [4], [11], [5], [12]. Considering the compactness, robustness and reusability of the skeleton modality, we propose a two-stage recognition framework based on estimated 2D skeletons. Discriminative features are generated by combining handcrafted skeletal geometry and co-occurrence with deep learning. The architecture of the two stages is simple, and the output of the first stage is fed into the second stage as part of its features. We also build a dataset containing Chinese traffic command gesture instances in the four directions for training and validation. To summarize, this work makes contributions in three ways:
- A pioneering study on traffic command recognition that distinguishes both directions and gestures is carried out to fill this research gap, promoting the perception capability of intelligent vehicles.
- With estimated 2D human skeletons, a two-stage recognition framework with discriminative features combining handcrafted skeletal geometry and co-occurrence is presented to tackle the challenge. Our approach is characterized by a simple but effective structure and attains significant performance in the experiments.
- An extended dataset termed “Chinese Traffic Command at Intersections” (CTCX) is established with videos of traffic command gestures in the four cross-shape directions for methodology studies. The experimental results verify the effectiveness of the proposed approach.
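The cascade between the two stages can be sketched as follows. The classifier internals below are random-weight placeholders standing in for the paper's LSTM-based networks, so only the data flow — stage-one direction scores concatenated into the stage-two input — reflects the framework described above; feature dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def stage_one(features):
    """Placeholder direction classifier: softmax scores over the four
    command directions. A real model would be the first-stage network."""
    return softmax(features @ rng.normal(size=(features.shape[-1], 4)))

def stage_two(features, direction_scores):
    """The stage-one output is concatenated into the stage-two input,
    mirroring the cascade between the two stages."""
    fused = np.concatenate([features, direction_scores])
    return softmax(fused @ rng.normal(size=(fused.shape[-1], 8)))

clip_features = rng.normal(size=16)              # spatial features for a clip
direction = stage_one(clip_features)             # (4,) direction probabilities
gesture = stage_two(clip_features, direction)    # (8,) gesture probabilities
```

Because the second stage consumes the first stage's output as extra features rather than merely gating on it, a direction error can still be partially compensated by the gesture classifier.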
The rest of the article is organized as follows. Section 2 introduces the related and illuminating works. Section 3 gives a specific definition of the proposed problem, and Section 4 explains the details of the presented approach. Section 5 illustrates the dataset establishment and the evaluation metrics, demonstrates the experimental results and comparisons, and discusses the influences of model settings and parameter selection, as well as generalization. Finally, conclusions are drawn in Section 6.
Related Work
Traffic command gesture recognition is a subcategory and practical application of human action recognition. Because of the specific on-board scenarios, approaches based on signals from wearable sensors [13] or depth sensors [2], [3] are of limited practical use. Moreover, it is difficult for RGB-based recognition models [14], [15], [1] to adapt to diverse scenes because the current amount of traffic command gesture data cannot guarantee generalization. Recently, an
Problem Formulation
When human drivers notice a traffic officer directing traffic at road intersections, normally the first aim is to identify which direction the officer is controlling and then to recognize the meaning of the gesture. In this paper, considering the case of intersections, the command directions are modeled in terms of an orientation set including self, left, opposite and right. In line with the Chinese traffic rules, there are eight standard traffic control gestures including stop, go straight,
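The label space implied by this formulation can be written out directly. The direction set is taken from the text; the gesture names beyond "stop" and "go straight" are assumptions based on the eight standard Chinese traffic control gestures, since the excerpt truncates the list.

```python
from itertools import product

# Four command directions relative to the ego vehicle (from the text).
DIRECTIONS = ["self", "left", "opposite", "right"]

# Eight standard Chinese traffic control gestures; names after the first
# two are assumed from the regulated gesture set referenced in the text.
GESTURES = [
    "stop", "go_straight", "left_turn", "left_turn_waiting",
    "right_turn", "lane_changing", "slow_down", "pull_over",
]

# The extended task predicts a (direction, gesture) pair, so the joint
# label space contains 4 * 8 = 32 classes.
JOINT_LABELS = list(product(DIRECTIONS, GESTURES))
```

Treating the task as a pair of coupled classifications rather than one flat 32-way problem is what motivates the two-stage design described next.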
Methodology
As illustrated in Fig. 2, our proposed method is a two-stage framework whose stages share a partially similar network architecture. Human pose estimation is conducted as data preprocessing to extract skeleton information. Then, the spatial process generates the upper-body geometric features and keypoint co-occurrence features for the human instance at each frame. In the temporal process, LSTM layers further process the spatial features across frames. At the last
Experiments
This section demonstrates the experiments for validating the proposed method quantitatively and qualitatively. The dataset, evaluation metrics and implementation details are introduced successively, and the experimental results and discussion are presented at the end.
Conclusion
This article addresses Chinese traffic command recognition at road intersections. It is the first to clarify the issue of implicit traffic commands and formulates the problem as an associated task of orientation classification and gesture recognition. We propose a two-stage recognition framework that takes advantage of a concise LSTM-based network and combines handcrafted features with deep learning. The upper-body geometric features make use of only seven human keypoints but show remarkable
CRediT authorship contribution statement
Sijia Wang: Conceptualization, Methodology, Investigation, Writing - original draft. Kun Jiang: Data curation, Resources, Project administration. Junjie Chen: Formal analysis, Writing - review & editing. Mengmeng Yang: Data curation, Writing - review & editing. Zheng Fu: Validation, Visualization. Tuopu Wen: Methodology, Software. Diange Yang: Supervision, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work was supported by the National Natural Science Foundation of China (Grant Nos. U1864203, 52102396, and 52102464), and Sharing-Van Automatic Driving Development Project (Grant No. HT20082302).
References (42)
- et al., Visual recognition of traffic police gestures with convolutional pose machine and handcrafted features, Neurocomputing (2020)
- et al., Simple but effective: Upper-body geometric features for traffic command gesture recognition, IEEE Transactions on Human-Machine Systems (Early Access) (2021)
- et al., Deep 3D human pose estimation: A review, Computer Vision and Image Understanding (2021)
- et al., Appearance based pedestrians’ head pose and body orientation estimation using deep learning, Neurocomputing (2018)
- et al., An online approach for gesture recognition toward real-world applications
- et al., Gesture recognition of traffic police based on static and dynamic descriptor fusion, Multimedia Tools and Applications (2017)
- et al., Traffic command gesture recognition for virtual urban scenes based on a spatiotemporal convolution neural network, ISPRS International Journal of Geo-Information (2018)
- et al., Traffic control gesture recognition for autonomous vehicles
- B. Ren, M. Liu, R. Ding, H. Liu, A survey on 3D skeleton-based action recognition using learning method, arXiv preprint...
- et al., Human activity recognition process using 3-D posture data, IEEE Transactions on Human-Machine Systems (2015)
- Intention recognition of pedestrians and cyclists by 2D pose estimation, IEEE Transactions on Intelligent Transportation Systems
- Pedestrian path, pose, and intention prediction through Gaussian process dynamical models and pedestrian activity recognition, IEEE Transactions on Intelligent Transportation Systems
- Peeking into the future: Predicting future person activities and locations in videos
- Traffic police gesture recognition by pose graph convolutional networks
- Accelerometer-based Chinese traffic police gesture recognition system, Chinese Journal of Electronics
- Max-covering scheme for gesture recognition of Chinese traffic police, Pattern Analysis and Applications
- Traffic police gesture recognition based on gesture skeleton extractor and multichannel dilated graph convolution network, Electronics
- The progress of human pose estimation: A survey and taxonomy of models applied in 2D human pose estimation, IEEE Access
- Conventionalized gestures for the interaction of people in traffic with autonomous vehicles
- Chinese traffic police gesture recognition in complex scene
Cited by (5)
- Modified online sequential extreme learning machine algorithm using model predictive control approach, Intelligent Systems with Applications (2023)
- Real-Time Visual Recognition of Ramp Hand Signals for UAS Ground Operations, Journal of Intelligent and Robotic Systems: Theory and Applications (2023)
- Traffic Control Gesture Recognition Based on Masked Decoupling Adaptive Graph Convolution Network, CICTP 2023: Emerging Data-Driven Sustainable Technological Innovation in Transportation - Proceedings of the 23rd COTA International Conference of Transportation Professionals (2023)
- A YOLO-based Method for Improper Behavior Predictions, Proceedings of IEEE InC4 2023 - 2023 IEEE International Conference on Contemporary Computing and Communications (2023)
Sijia Wang received her B.S. degree in automotive engineering from Tsinghua University, Beijing, China in 2017. She is currently working toward the Ph.D. degree at School of Vehicle and Mobility, Tsinghua University, Beijing, China. Her research interests include pose estimation and activity recognition of vulnerable road users for autonomous driving.
Kun Jiang received his B.S. degree in mechanical and automation engineering from Shanghai Jiao Tong University, China in 2011. Then, he received his master’s degree in the mechatronics system and his Ph.D. degree in information and systems technologies from the University of Technology of Compiègne (UTC), Compiègne, France, in 2013 and 2016, respectively. He is an assistant research professor at the School of Vehicle and Mobility of Tsinghua University, Beijing, China. His research interests include autonomous vehicles, high-precision digital maps, and sensor fusion.
Junjie Chen received his Ph.D. degree in traffic information engineering and control from Beijing Jiaotong University in 2020. He was a research assistant at Carnegie Mellon University (CMU), Pittsburgh, PA, USA, from 2018 to 2020. He currently holds a postdoctoral position at Tsinghua University, Beijing, China. His research interests include nonparametric Bayesian learning, platoon operation control, and recognition and application of human driving characteristics.
Mengmeng Yang received her Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Hubei, China in 2018. She is now conducting postdoctoral research at the School of Vehicle and Mobility of Tsinghua University, Beijing, China. Her research interests include high-definition maps for autonomous driving.
Zheng Fu received her master’s degree in pattern recognition and intelligent systems from Nanjing University of Posts and Telecommunications, Jiangsu, China, in 2019. She is pursuing the Ph.D. degree at School of Vehicle and Mobility, Tsinghua University, Beijing, China. Her current research interests include human 3-D pose estimation and human intention analysis for autonomous driving.
Tuopu Wen received his B.S. degree from Electronic Engineering, Tsinghua University, Beijing, China in 2018. He is currently working toward the Ph.D. degree at School of Vehicle and Mobility of Tsinghua University, Beijing, China. His research interests include computer vision, high definition maps, and high precision localization for autonomous driving.
Diange Yang is a professor at the School of Vehicle and Mobility of Tsinghua University. He received his B.S. and Ph.D. from Tsinghua University in 1996 and 2001, respectively. His research work mainly focuses on intelligent connected vehicles and autonomous driving. He has published over 120 articles, registered more than 60 national patents, and authored over 10 software copyrights. He has received numerous awards during his career, including the Distinguished Young Science Technology talent of Chinese Automobile Industry in 2011 and the Excellent Young Scientist of Beijing in 2010. He was also the recipient of the Second Prize of National Technology Invention Rewards of China in 2010 and in 2013.