Word Embeddings Versus LDA for Topic Assignment in Documents

  • Conference paper
Computational Collective Intelligence (ICCCI 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10449)

Abstract

Topic assignment for a corpus of documents is a natural language processing (NLP) task. One of the best-known and most widely studied methods is Latent Dirichlet Allocation (LDA), which relies on statistical modelling. On the other hand, the deep-learning paradigm has proved useful for many NLP tasks, such as classification [3], sentiment analysis [8], and text summarization [11]. This paper compares the results of the LDA method with an approach based on the word representations provided by Word2Vec [5], which builds on the deep-learning paradigm.
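The authors' exact embedding-based procedure is not described in this preview, but one common way to use Word2Vec-style vectors for topic assignment is to average a document's word embeddings and pick the topic whose seed-word centroid is closest by cosine similarity. The sketch below illustrates that idea only; the toy 3-dimensional vectors, seed words, and helper names are illustrative assumptions, not the setup used in the paper.

```python
# Minimal sketch of embedding-based topic assignment.
# Toy 3-d vectors stand in for real Word2Vec output (typically 100+ dims).
from math import sqrt

EMB = {  # hypothetical embeddings for illustration only
    "goal": (0.9, 0.1, 0.0), "match": (0.8, 0.2, 0.1),
    "team": (0.7, 0.3, 0.0), "vote": (0.0, 0.9, 0.1),
    "election": (0.1, 0.8, 0.2), "party": (0.2, 0.7, 0.1),
}

def doc_vector(tokens):
    """Average the embeddings of the tokens that have a vector."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    n = len(vecs)
    return tuple(sum(v[i] for v in vecs) / n for i in range(3))

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def assign_topic(tokens, topics):
    """Assign the topic whose seed-word centroid is most similar."""
    dv = doc_vector(tokens)
    return max(topics, key=lambda t: cosine(dv, doc_vector(topics[t])))

topics = {"sport": ["goal", "match", "team"],
          "politics": ["vote", "election", "party"]}
print(assign_topic(["team", "goal"], topics))      # sport
print(assign_topic(["election", "vote"], topics))  # politics
```

With real pre-trained embeddings the same centroid-plus-cosine scheme applies unchanged; only the vector source differs. LDA, by contrast, assigns topics via inferred per-document topic distributions rather than geometric similarity.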

References

  1. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003). http://www.jmlr.org/papers/v3/blei03a.html

  2. Documents for tests (2016). http://hereticsconsulting.files.wordpress.com/2016/01/textmining.zip

  3. Enríquez, F., Troyano, J.A., López-Solaz, T.: An approach to the use of word embeddings in an opinion classification task. Expert Syst. Appl. 66, 1–6 (2016). http://dx.doi.org/10.1016/j.eswa.2016.09.005

  4. Loper, E., Bird, S.: NLTK: the natural language toolkit. In: Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, ETMTNLP 2002, vol. 1, pp. 63–70. Association for Computational Linguistics, Stroudsburg (2002). http://dx.doi.org/10.3115/1118108.1118117

  5. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, Proceedings of a Meeting, 5–8 December 2013, Lake Tahoe, Nevada, USA, pp. 3111–3119 (2013). http://papers.nips.cc/book/advances-in-neural-information-processing-systems-26-2013

  6. Nallapati, R., Cohen, W.W., Lafferty, J.D.: Parallelized variational EM for latent Dirichlet allocation: an experimental evaluation of speed and scalability. In: Workshops Proceedings of the 7th IEEE International Conference on Data Mining (ICDM 2007), 28–31 October 2007, Omaha, Nebraska, USA, pp. 349–354. IEEE Computer Society (2007). http://dx.doi.org/10.1109/ICDMW.2007.33

  7. Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50. ELRA, Valletta, May 2010. http://is.muni.cz/publication/884893/en

  8. Sakenovich, N.S., Zharmagambetov, A.S.: On one approach of solving sentiment analysis task for Kazakh and Russian languages using deep learning. In: Nguyen, N.-T., Manolopoulos, Y., Iliadis, L., Trawiński, B. (eds.) ICCCI 2016. LNCS (LNAI), vol. 9876, pp. 537–545. Springer, Cham (2016). doi:10.1007/978-3-319-45246-3_51

  9. Skfuzzy: fuzzy logic toolkit for Python (2016). http://pythonhosted.org/scikit-fuzzy/

  10. Topicmodels: package for R (2016). https://cran.r-project.org/web/packages/topicmodels/

  11. Yousefi-Azar, M., Hamey, L.: Text summarization using unsupervised deep learning. Expert Syst. Appl. 68, 93–105 (2017). http://dx.doi.org/10.1016/j.eswa.2016.10.017

  12. Zhang, W., Wang, J.: Prior-based dual additive latent Dirichlet allocation for user-item connected documents. In: Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI 2015, pp. 1405–1411. AAAI Press (2015). http://dl.acm.org/citation.cfm?id=2832415.2832445

Author information

Correspondence to Magdalena Zakrzewska.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Jȩdrzejowicz, J., Zakrzewska, M. (2017). Word Embeddings Versus LDA for Topic Assignment in Documents. In: Nguyen, N., Papadopoulos, G., Jędrzejowicz, P., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2017. Lecture Notes in Computer Science, vol 10449. Springer, Cham. https://doi.org/10.1007/978-3-319-67077-5_34

  • DOI: https://doi.org/10.1007/978-3-319-67077-5_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67076-8

  • Online ISBN: 978-3-319-67077-5
