Combating the Sparse Data Problem of Language Modelling

Jelinek, Frederick

doi:10.1007/978-3-540-39398-6_1

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2807)

Abstract

The talk will concern several ideas that combat the sparse data problem of language modeling. All of them alleviate it; none solves it. These ideas are: equivalence classification of histories, positional clustering (different cluster systems for different n-gram positions), use of linguistic classes (e.g., WordNet), class constraints in maximum entropy estimation, random forests, and neural network classification. An interesting problem that must be faced is as follows: words that are sparse and need to be classified do not have sufficient statistics to indicate their appropriate class membership.
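
As a rough illustration of how classing words can alleviate sparseness, the sketch below compares a maximum-likelihood word bigram with a class-based bigram of the form P(w | c(w)) · P(c(w) | c(h)). The toy corpus, the hand-made class map, and all function names are assumptions made for this example only; they are not taken from the talk, and the class-based factorization is one common instance of equivalence classification, not necessarily the one the author presents.

```python
from collections import Counter

# Hypothetical toy corpus and hand-assigned word classes (illustrative only).
corpus = "the cat sleeps the dog sleeps a cat runs".split()
word_class = {"the": "DET", "a": "DET", "cat": "NOUN", "dog": "NOUN",
              "sleeps": "VERB", "runs": "VERB"}

bigrams = list(zip(corpus, corpus[1:]))
word_bigram = Counter(bigrams)
word_unigram = Counter(corpus)
class_bigram = Counter((word_class[a], word_class[b]) for a, b in bigrams)
class_unigram = Counter(word_class[w] for w in corpus)
# How often each word / class occurs as a history (first element of a bigram).
word_history = Counter(a for a, _ in bigrams)
class_history = Counter(word_class[a] for a, _ in bigrams)


def p_word_bigram(prev, word):
    """Maximum-likelihood word bigram estimate of P(word | prev)."""
    if word_history[prev] == 0:
        return 0.0
    return word_bigram[(prev, word)] / word_history[prev]


def p_class_bigram(prev, word):
    """Class-based estimate: P(word | class(word)) * P(class(word) | class(prev))."""
    c_prev, c_word = word_class[prev], word_class[word]
    p_class = class_bigram[(c_prev, c_word)] / class_history[c_prev]
    p_emit = word_unigram[word] / class_unigram[c_word]
    return p_class * p_emit


# "a dog" never occurs in the corpus, so the word bigram assigns it zero,
# while the class model pools evidence from "the cat", "the dog", "a cat".
print(p_word_bigram("a", "dog"))
print(p_class_bigram("a", "dog"))
```

The example also hints at the problem raised in the last sentence of the abstract: here the class of every word is given by hand, whereas a genuinely rare word offers too few observations to decide which class it belongs to.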

Author information

Authors and Affiliations

Center for Language and Speech Processing, Johns Hopkins University, 309 Barton Hall, 3400 N. Charles St., Baltimore, MD, 21218, USA
Frederick Jelinek

Editor information

Editors and Affiliations

Department of Computer Science, University of West Bohemia in Pilsen, Univerzitni 8, 30614, Plzen, Czech Republic
Václav Matoušek & Pavel Mautner

Cite this paper

Jelinek, F. (2003). Combating the Sparse Data Problem of Language Modelling. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_1

DOI: https://doi.org/10.1007/978-3-540-39398-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20024-6
Online ISBN: 978-3-540-39398-6
eBook Packages: Springer Book Archive
