On the Use of Topic Models for Word Completion

Wolf, Elisabeth; Vembu, Shankar; Miller, Tristan

doi:10.1007/11816508_50

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4139)

Included in the following conference series:

International Conference on Natural Language Processing (in Finland)

Abstract

We investigate the use of topic models, such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA), for word completion tasks. The advantage of using these models for such an application is twofold. On the one hand, they allow us to exploit semantic or contextual information when predicting candidate words for completion. On the other hand, these probabilistic models have been found to outperform classical latent semantic analysis (LSA) for modeling text documents. We describe a word completion algorithm that takes into account the semantic context of the word being typed. We also present evaluation metrics to compare different models being used in our study. Our experiments validate our hypothesis of using probabilistic models for semantic analysis of text documents and their application in word completion tasks.
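
The idea of ranking completion candidates by their fit to the semantic context can be illustrated with a minimal sketch. It is not the algorithm from the paper, whose details are not given in this abstract. It assumes a topic model has already been trained and exposes word distributions P(w | z) and topic priors P(z); the toy values in `TOPIC_WORD` and `TOPIC_PRIOR`, the smoothing constant `EPS`, and the function names `topic_posterior` and `complete` are all hypothetical. Candidates matching the typed prefix are scored by P(w | context) = Σ_z P(w | z) P(z | context).

```python
from math import prod

# Toy, hand-written topic-word distributions P(w | z). In practice these
# would come from a trained PLSA or LDA model; the numbers are illustrative.
TOPIC_WORD = {
    "finance": {"bank": 0.40, "bond": 0.30, "loan": 0.20, "boat": 0.05, "river": 0.05},
    "nature": {"bank": 0.10, "bond": 0.02, "loan": 0.03, "boat": 0.35, "river": 0.50},
}
TOPIC_PRIOR = {"finance": 0.5, "nature": 0.5}  # assumed uniform prior P(z)
EPS = 1e-6  # assumed smoothing probability for words a topic has not seen


def topic_posterior(context):
    """Estimate P(z | context) as P(z) * prod_w P(w | z), normalised over topics."""
    weights = {
        z: TOPIC_PRIOR[z] * prod(TOPIC_WORD[z].get(w, EPS) for w in context)
        for z in TOPIC_WORD
    }
    total = sum(weights.values())
    return {z: v / total for z, v in weights.items()}


def complete(prefix, context, k=3):
    """Return up to k vocabulary words starting with prefix, best match first."""
    posterior = topic_posterior(context)
    vocabulary = {w for dist in TOPIC_WORD.values() for w in dist}
    scores = {
        w: sum(posterior[z] * TOPIC_WORD[z].get(w, EPS) for z in posterior)
        for w in vocabulary
        if w.startswith(prefix)
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    print(complete("bo", ["river"]))
    print(complete("bo", ["loan"]))
```

With the same prefix "bo", the context word "river" shifts the topic posterior toward the nature topic and the context word "loan" shifts it toward the finance topic, so the ranking of "boat" and "bond" changes with the context.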

Author information

Authors and Affiliations

German Research Center for Artificial Intelligence, Erwin-Schroedinger-Strasse 57, 67663, Kaiserslautern, Germany
Elisabeth Wolf & Shankar Vembu
The Socialist Party of Great Britain, 52 Clapham High Street, London, SW4 7UN, United Kingdom
Tristan Miller

Editor information

Editors and Affiliations

Turku Centre for Computer Science (TUCS), Department of Information Technology, University of Turku, Joukahaisenkatu 3-5 B, FIN-20520, Turku, Finland
Tapio Salakoski
Turku Centre for Computer Science (TUCS) and Department of IT, University of Turku, Lemminkäisenkatu 14 A, 20520, Turku, Finland
Filip Ginter & Sampo Pyysalo
Department of Information Technology, University of Turku, Lemminkäisenkatu 14–18 A, FIN-20520, Turku, Finland
Tapio Pahikkala

Cite this paper

Wolf, E., Vembu, S., Miller, T. (2006). On the Use of Topic Models for Word Completion. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds) Advances in Natural Language Processing. FinTAL 2006. Lecture Notes in Computer Science, vol 4139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816508_50

DOI: https://doi.org/10.1007/11816508_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37334-6
Online ISBN: 978-3-540-37336-0
eBook Packages: Computer Science, Computer Science (R0)
