Phrase-based data selection for language model adaptation in spoken language translation


Abstract:

In this paper, we propose an unsupervised phrase-based data selection model that addresses the problem of selecting non-domain-specific language model (LM) training data to build an adapted LM. In a spoken language translation (SLT) system, we aim to find the LM training sentences that are similar to the translation task. Compared with traditional bag-of-words models, the phrase-based data selection model is more effective because it captures contextual information by modeling the selection of a phrase as a whole, rather than the selection of single words in isolation. Large-scale experimental results demonstrate that our approach significantly outperforms state-of-the-art approaches on both LM perplexity and translation performance.
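
The contrast between word-level and phrase-level selection can be illustrated with a toy sketch. This is not the model proposed in the paper: it swaps in a simple n-gram overlap score as a stand-in, and every name in it (`ngrams`, `overlap_score`, `select`, the sample sentences) is hypothetical. With `n=1` it behaves like a bag-of-words selector; with `n=2` it scores contiguous word pairs, so word order starts to matter.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def overlap_score(sentence, task_counts, n):
    """Fraction of the sentence's n-grams that also occur in the task text.

    Assumption: plain overlap is only a stand-in for a real selection model.
    """
    grams = ngrams(sentence.lower().split(), n)
    if not grams:
        return 0.0
    hits = sum(1 for g in grams if g in task_counts)
    return hits / len(grams)


def select(candidates, task_sentences, n, top_k):
    """Rank candidate LM training sentences by n-gram overlap with the task."""
    task_counts = Counter(
        g for s in task_sentences for g in ngrams(s.lower().split(), n)
    )
    ranked = sorted(
        candidates,
        key=lambda s: overlap_score(s, task_counts, n),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical in-domain (translation task) sentences.
task = ["where is the train station", "how much is the ticket"]

# Hypothetical general-domain pool to select from.
pool = [
    "the station is where the train is",  # task words, scrambled order
    "where is the bus station",           # shares whole word sequences
    "stock prices fell sharply today",    # unrelated
]

word_level = select(pool, task, n=1, top_k=1)    # bag-of-words style
phrase_level = select(pool, task, n=2, top_k=1)  # contiguous word pairs
print(word_level)
print(phrase_level)
```

In this toy pool, the word-level selector prefers the scrambled sentence because every one of its words appears in the task text, while the pair-level selector prefers the sentence that shares whole word sequences with the task.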
Date of Conference: 05-08 December 2012
Date Added to IEEE Xplore: 31 January 2013
Conference Location: Hong Kong, China
