Abstract
Recent years have seen word embedding algorithms applied successfully to natural language processing (NLP) tasks. Most word embedding algorithms produce only a single embedding per word, which makes the learned embeddings less discriminative because many words are polysemous. Some prior work uses the context in which a word appears to learn multiple embeddings per word. However, context-based solutions are problematic for short texts, such as tweets, which offer limited context. Moreover, existing approaches tend to enumerate all possible context types of a word regardless of the target application. Applying multiple vector representations per word in NLP tasks can be computationally expensive because all possible combinations of word senses in a snippet need to be considered. Sometimes, a word sense can be captured when the class information or label of the short text is available. For example, in a disaster-related dataset, when a text snippet is labeled as “hurricane related”, the word “water” in the snippet is more likely to refer to rain and flooding; when a snippet is labeled as “hurricane unrelated”, the word “water” can be interpreted in its more general sense. In this work, we propose to use class information to enhance the discriminativeness of words. Instead of enumerating all potential senses per word in the text, the number of vector representations per word should be a function of the downstream classification task. We show that choosing the number of vector representations per word according to the number of classes in the classification task is often sufficient to resolve the polysemy. Word embeddings learned from neural language models typically exhibit good linear compositionality. We use this property to encode class information into the vector representation of a word.
We explore four approaches to train class-specific embeddings to encode class information by utilizing the label information and the linear compositionality property of word embeddings. We present a general framework consisting of a pair of convolutional neural networks to utilize the learned class-specific word embeddings as input for text classification tasks. We evaluate our approach and framework on topic classification of a disaster-focused Twitter dataset and a benchmark Twitter sentiment classification dataset from SemEval 2013. Our results show a relative accuracy improvement of 3–4% over a recent baseline.
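To make the idea of encoding class information through linear compositionality concrete, the following is a minimal sketch and not the authors' method. It assumes that a class-specific vector is formed by adding a class vector to a generic word vector; the abstract does not specify the composition, and the four training approaches it mentions are not reproduced here. The function names, the class labels, and the toy 3-dimensional vectors are all hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def add(u: Vector, v: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return [a + b for a, b in zip(u, v)]


def class_specific_table(
    word_vecs: Dict[str, Vector],
    class_vecs: Dict[str, Vector],
) -> Dict[Tuple[str, str], Vector]:
    """Build one vector per (word, class) pair by additive composition.

    Assumption: class information is injected by adding a class vector to
    the generic word vector. This is one simple use of linear
    compositionality, chosen for illustration only.
    """
    return {
        (word, label): add(w_vec, c_vec)
        for word, w_vec in word_vecs.items()
        for label, c_vec in class_vecs.items()
    }


# Toy vectors, invented for illustration; real embeddings would be learned.
word_vecs = {"water": [0.5, 0.25, 0.0]}
class_vecs = {
    "hurricane_related": [0.0, 0.5, 0.5],
    "hurricane_unrelated": [0.0, 0.0, 0.0],
}

table = class_specific_table(word_vecs, class_vecs)

# Each word now has as many vectors as there are classes in the task.
for (word, label), vec in sorted(table.items()):
    print(word, label, vec)
```

In this sketch the number of vectors per word equals the number of classes, which mirrors the claim that the classification task, rather than an enumeration of all senses, can determine how many representations a word needs.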
Change history
18 November 2019
The article Learning class-specific word embeddings, written by Sicong Kuang and Brian D. Davison, was originally published electronically on the publisher’s Internet portal (currently SpringerLink) on 23 October 2019 with open access.
Notes
These three tweets are extracted from SemEval 2013 training data.
Acknowledgements
This material is based in part upon work supported by the National Science Foundation under Grant No. CMMI-1541177.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original version of this article was revised: The article Learning class‑specific word embeddings, written by Sicong Kuang and Brian D. Davison, was originally published electronically on the publisher's internet portal (currently SpringerLink) on 23 October 2019 with open access. With the author(s)’ decision to step back from Open Choice, the copyright of the article changed on 18 November 2019 to © Springer Science+Business Media, LLC, part of Springer Nature 2019 and the article is forthwith distributed under the terms of copyright.
Cite this article
Kuang, S., Davison, B.D. Learning class-specific word embeddings. J Supercomput 76, 8265–8292 (2020). https://doi.org/10.1007/s11227-019-03024-z
DOI: https://doi.org/10.1007/s11227-019-03024-z