Abstract:
Many users obtain content from a screen and want to make requests of a system based on items that they have seen. Eye-gaze information is a valuable signal for speech recognition and spoken-language understanding (SLU) because it provides context for a user's next utterance: what the user says next is probably conditioned on what they have seen. This paper investigates three types of features for connecting eye-gaze information to an SLU system: one lexical feature type and two eye-gaze feature types. These features help the system determine which object on a screen (e.g., a link) a user is referring to. We show a 17% absolute improvement in referenced-object F-score when eye-gaze features are added to conventional methods based on a lexical comparison of the spoken utterance and the text on the screen.
Published in: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 19-24 April 2015
Date Added to IEEE Xplore: 06 August 2015
Electronic ISBN: 978-1-4673-6997-8