Abstract:
The paper presents an empirical exploration of language modeling for the google.com query stream. We describe the normalization of the typed query stream, which results in out-of-vocabulary (OoV) rates below 1% for a one-million-word vocabulary. We present a comprehensive set of experiments that guided the design decisions for a voice search service. In the process, we rediscovered a lesser-known interaction between Kneser-Ney smoothing and entropy pruning, and found empirical evidence that hints at non-stationarity of the query stream, as well as a strong dependence on English locale (USA, Britain, and Australia).
Published in: 2010 IEEE Spoken Language Technology Workshop
Date of Conference: 12-15 December 2010
Date Added to IEEE Xplore: 24 January 2011