Abstract:
A rapidly growing amount of content posted online inherently holds knowledge about concepts of interest, e.g., driver actions. We leverage methods at the intersection of vision and language to bypass costly annotation and present the first automated framework for anticipating driver intention by learning from recorded driving exam conversations. We query YouTube and collect a dataset of posted mock road tests comprising student-teacher dialogs and video data, which we use for learning to foresee the next maneuver without any additional supervision. However, instructional conversations yield only loose labels, while casual chat introduces considerable noise. To mitigate this effect, we propose a technique for automatic detection of smalltalk based on the likelihood of spoken words being present in everyday dialogs. While visually recognizing a driver's intention by learning from natural dialogs alone is a challenging task, learning from less but better data via our smalltalk refinement consistently improves performance.
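The smalltalk-detection idea above can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a reference corpus of everyday (non-instructional) dialog and scores each utterance by the average add-one-smoothed unigram log-likelihood of its words under that corpus, so that high-likelihood utterances are flagged as likely smalltalk. The corpus, threshold, and function names here are illustrative assumptions.

```python
from collections import Counter
import math


def train_unigram(corpus_utterances):
    """Fit an add-one-smoothed unigram log-probability model on everyday dialog."""
    counts = Counter(w for u in corpus_utterances for w in u.lower().split())
    total = sum(counts.values())
    vocab = len(counts)
    # Unseen words receive the smoothed floor probability 1 / (total + vocab).
    return lambda w: math.log((counts.get(w, 0) + 1) / (total + vocab))


def smalltalk_score(utterance, logp):
    """Average per-word log-likelihood under the everyday-dialog model."""
    words = utterance.lower().split()
    return sum(logp(w) for w in words) / max(len(words), 1)


def is_smalltalk(utterance, logp, threshold=-2.5):
    # Threshold is a hypothetical value; in practice it would be tuned
    # on held-out instructional vs. casual utterances.
    return smalltalk_score(utterance, logp) >= threshold
```

Under this scheme, an everyday utterance like "how are you" scores higher (closer to zero) than a driving instruction such as "check your mirrors before the maneuver", whose domain-specific words are rare in casual dialog, so the instruction is retained as a (loose) maneuver label while the chat is filtered out.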
Published in: 2021 IEEE Intelligent Vehicles Symposium (IV)
Date of Conference: 11-17 July 2021
Date Added to IEEE Xplore: 01 November 2021