A Three Level Cache-Based Adaptive Chinese Language Model

  • Conference paper
Natural Language Processing – IJCNLP 2004 (IJCNLP 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3248)

Abstract

Although n-grams have proved to be very powerful and robust in various tasks involving language models, they have one handicap: the dependency they capture is limited to a very short local context because of the Markov assumption. This article presents an improved cache-based approach to Chinese statistical language modeling. We extend the cache model by introducing a Chinese concept lexicon into it. The cache of the extended language model contains not only the words that occurred recently but also words semantically related to them. Experiments show that the performance of the adaptive model improves greatly.
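The idea of a cache that holds both recent words and their semantic relatives can be illustrated with a minimal sketch. This is not the authors' three-level model: the class name `ConceptCacheLM`, the unigram static model, the linear interpolation weight `lam`, the `related_weight` discount, and the toy lexicon are all assumptions made for illustration only.

```python
from collections import Counter, deque


class ConceptCacheLM:
    """Toy unigram model interpolated with a cache of recent and related words.

    All names and parameters here are hypothetical; the abstract does not
    specify how the cache and the static model are combined.
    """

    def __init__(self, static_probs, related, cache_size=200,
                 lam=0.8, related_weight=0.5):
        self.static_probs = static_probs    # word -> static probability
        self.related = related              # word -> related words (stand-in for a concept lexicon)
        self.cache = deque(maxlen=cache_size)  # most recent words only
        self.lam = lam                      # weight of the static model
        self.related_weight = related_weight  # partial count given to related words

    def observe(self, word):
        """Record a word from the running text in the cache."""
        self.cache.append(word)

    def cache_prob(self, word):
        """Relative frequency in the cache, counting related words at a discount."""
        counts = Counter()
        for seen in self.cache:
            counts[seen] += 1.0
            for rel in self.related.get(seen, ()):
                counts[rel] += self.related_weight
        total = sum(counts.values())
        return counts[word] / total if total else 0.0

    def prob(self, word):
        """Linear interpolation of the static and cache estimates."""
        p_static = self.static_probs.get(word, 0.0)
        if not self.cache:
            return p_static
        return self.lam * p_static + (1 - self.lam) * self.cache_prob(word)


# Usage: seeing "银行" (bank) raises the probability of the related word "贷款" (loan).
static = {"银行": 0.1, "贷款": 0.1, "天气": 0.8}
lexicon = {"银行": {"贷款"}}
lm = ConceptCacheLM(static, lexicon, lam=0.5)
before = lm.prob("贷款")
lm.observe("银行")
after = lm.prob("贷款")
print(before < after)
```

Because both the static estimate and the cache estimate are distributions over the vocabulary, the interpolated values still sum to one in this sketch; a real system would use a smoothed n-gram model in place of the unigram table.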

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, J., Sun, L., Qu, W., Du, L., Sun, Y. (2005). A Three Level Cache-Based Adaptive Chinese Language Model. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_51

  • DOI: https://doi.org/10.1007/978-3-540-30211-7_51

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24475-2

  • Online ISBN: 978-3-540-30211-7

  • eBook Packages: Computer Science, Computer Science (R0)
