Abstract:
A major challenge for an autonomous vehicle (AV) arises when it encounters a flagman regulating traffic near a construction work zone or at a crash site. In such real-world scenarios, recognizing the flagman's gesture is an essential AV function that has received little attention. A key piece of earlier work used a chroma-key (green-screen) background to generate datasets and evaluate the accuracy of flagman gesture recognition. We find that this background-agnostic approach degrades significantly when the chroma-key background is replaced with images from real-world traffic scenes. In this paper, we extend that baseline approach by adding contextual information, boosting its accuracy and robustness. First, we replace the chroma-key background with real-world traffic imagery to create virtually augmented (VA) images. Next, we propose and evaluate three approaches: (1) extracting CNN features directly from the "raw" VA images, (2) extracting CNN features after embedding flagman skeleton and prop information into the VA images, and (3) applying an attention mechanism via a transformer with padding masks. All three approaches outperform the baseline method. Notably, our Two-Stage Classifier with Transformer approach boosts the baseline's performance from an F-score of 32% to 80% in challenging VA traffic scenarios.
Date of Conference: 08-12 October 2022
Date Added to IEEE Xplore: 01 November 2022