Learning class-specific word embeddings

Kuang, Sicong; Davison, Brian D.

doi:10.1007/s11227-019-03024-z

Published: 23 October 2019

Volume 76, pages 8265–8292 (2020)

The Journal of Supercomputing

A Correction to this article was published on 18 November 2019

This article has been updated

Abstract

Recent years have seen the success of applying word embedding algorithms to natural language processing (NLP) tasks. Most word embedding algorithms produce only a single embedding per word. Because many words are polysemous, this makes the learned embeddings less discriminative. Some prior work uses the context in which a word appears to learn multiple embeddings per word. However, context-based solutions are problematic for short texts, such as tweets, which provide limited context. Moreover, existing approaches tend to enumerate all possible context types of a particular word regardless of the target application. Applying multiple vector representations per word in NLP tasks can also be computationally expensive, because all possible combinations of the senses of the words in a snippet need to be considered. Sometimes a word's sense can be captured when the class information, or label, of the short text is available. For example, in a disaster-related dataset, when a text snippet is labeled as “hurricane related”, the word “water” in the snippet is more likely to refer to rain and flooding; when a snippet is labeled as “hurricane unrelated”, “water” is more likely to carry its general meaning. In this work, we propose to use class information to make word representations more discriminative. Instead of enumerating all potential senses per word in the text, the number of vector representations per word should be a function of the downstream classification task. We show that learning as many vector representations per word as there are classes in the classification task is often sufficient to resolve polysemy. Word embeddings learned from neural language models typically exhibit good linear compositionality. We use this property to encode class information into the vector representation of a word.
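The cost argument and the compositionality idea in the paragraph above can be illustrated with a small Python sketch. It assumes an additive composition in which a class-specific vector is the sum of a generic word vector and a class vector; this is one hypothetical instantiation of linear compositionality, not necessarily any of the four approaches studied in the paper. The function names (`sense_combinations`, `compose`, `class_specific_table`) and all vector values are made up for illustration, and the class vectors are simply given rather than learned.

```python
from math import prod
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

def sense_combinations(senses_per_word: Sequence[int]) -> int:
    """Joint sense assignments a multi-sense model may have to consider for
    one snippet: the product of the per-word sense counts."""
    return prod(senses_per_word)

def compose(word_vec: Vector, class_vec: Vector) -> Vector:
    """Hypothetical additive composition: shift a generic word vector by a
    class vector to get a class-specific word vector."""
    if len(word_vec) != len(class_vec):
        raise ValueError("word and class vectors must have the same dimension")
    return [w + c for w, c in zip(word_vec, class_vec)]

def class_specific_table(words: Dict[str, Vector],
                         classes: Dict[str, Vector]) -> Dict[Tuple[str, str], Vector]:
    """One vector per (word, class) pair: the number of representations per
    word equals the number of classes, not the number of senses."""
    return {(w, c): compose(wv, cv)
            for w, wv in words.items() for c, cv in classes.items()}

# Toy 3-dimensional vectors; values are made up for illustration.
words = {"water": [0.5, 0.25, -1.0]}
classes = {"hurricane related": [0.25, -0.5, 1.0],
           "hurricane unrelated": [0.0, 0.0, 0.5]}
table = class_specific_table(words, classes)

# A 5-word snippet whose words have 3, 2, 4, 1 and 5 senses needs
# 3 * 2 * 4 * 1 * 5 = 120 joint assignments, versus one embedded
# sequence per class (2 here) with class-specific embeddings.
print(sense_combinations([3, 2, 4, 1, 5]))  # 120
print(table[("water", "hurricane related")])  # [0.75, -0.25, 0.0]
```

The point of the comparison is that the per-snippet cost of the multi-sense approach grows multiplicatively with snippet length, while the class-specific approach needs exactly one embedded sequence per class.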
We explore four approaches to training class-specific embeddings that encode class information by exploiting label information and the linear compositionality of word embeddings. We also present a general framework, consisting of a pair of convolutional neural networks, that takes the learned class-specific word embeddings as input for text classification. We evaluate our approach and framework on topic classification of a disaster-focused Twitter dataset and on a benchmark Twitter sentiment classification dataset from SemEval 2013. Our results show a relative accuracy improvement of 3–4% over a recent baseline.
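The following Python sketch shows, under stated assumptions, how a pair of convolutional branches could consume class-specific embeddings for a two-class task. Each branch looks up the snippet's tokens in its own class-specific embedding table, applies 1-D convolution filters with ReLU and max-over-time pooling (in the style of the sentence-classification CNN of Kim (2014)), and the two pooled feature vectors are concatenated. The paper's actual architecture (filter sizes, how the branches are combined, the output layer) is not described here; all names, shapes, and values below are hypothetical, and the final classification layer is omitted.

```python
from typing import Dict, List, Sequence

Vector = List[float]
Filter = Sequence[Vector]  # h rows, each of embedding dimension d

def conv1d_max(seq: Sequence[Vector], filt: Filter) -> float:
    """Slide one filter over the embedding sequence, apply ReLU to each
    window score, and return the maximum (max-over-time pooling)."""
    h = len(filt)
    best = 0.0  # ReLU outputs are never negative, so 0.0 is the floor
    for i in range(len(seq) - h + 1):
        window = seq[i:i + h]
        score = sum(x * f
                    for row, frow in zip(window, filt)
                    for x, f in zip(row, frow))
        best = max(best, score)
    return best

def branch_features(tokens: Sequence[str], table: Dict[str, Vector],
                    filters: Sequence[Filter]) -> List[float]:
    """One CNN branch: embed tokens with a single class-specific table
    (out-of-vocabulary tokens are skipped) and pool one feature per filter."""
    seq = [table[t] for t in tokens if t in table]
    return [conv1d_max(seq, f) for f in filters]

def paired_features(tokens: Sequence[str], tables: Sequence[Dict[str, Vector]],
                    branch_filters: Sequence[Sequence[Filter]]) -> List[float]:
    """Run one branch per class-specific table, each with its own filters,
    and concatenate the pooled features for a downstream classifier."""
    features: List[float] = []
    for table, filters in zip(tables, branch_filters):
        features.extend(branch_features(tokens, table, filters))
    return features

# Toy 2-dimensional class-specific tables for a two-class task (made-up values).
related = {"water": [1.0, 0.0], "rising": [0.0, 1.0]}
unrelated = {"water": [0.0, 1.0], "rising": [1.0, 1.0]}
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]
unigram_filter = [[1.0, 1.0]]

feats = paired_features(["water", "rising"], [related, unrelated],
                        [[bigram_filter, unigram_filter],
                         [bigram_filter, unigram_filter]])
print(feats)  # [2.0, 1.0, 1.0, 2.0]
```

Keeping the branches separate lets each one see the snippet through a single class's embeddings; in a trained model, a softmax or similar layer over the concatenated features would produce the class prediction.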

Change history

18 November 2019
The article Learning class-specific word embeddings, written by Sicong Kuang and Brian D. Davison, was originally published electronically on the publisher’s Internet portal (currently SpringerLink) on 23 October 2019 with open access.

Notes

Example 3 in Table 1 is extracted from the disaster-focused Twitter corpus T6 [13], which we describe in Sect. 4.1.
https://www.cs.york.ac.uk/semeval-2013/.
These three tweets are extracted from SemEval 2013 training data.

References

Nematzadeh A, Meylan SC, Griffiths TL (2017) Evaluating vector-space models of word representation, or, the unreasonable effectiveness of counting words near other words. In: Proceedings of the 39th Annual Meeting of the Cognitive Science Society
Harris ZS (1954) Distributional structure. Word 10:146–162
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp 3111–3119
Liu Q, Ling Z-H, Jiang H, Hu Y (2016) Part-of-speech relevance weights for learning word embeddings. arXiv preprint arXiv:1603.07695
Sienčnik SK (2015) Adapting word2vec to named entity recognition. In: Proceedings of the 20th Nordic Conference of Computational Linguistics, Vilnius, Lithuania, 109, Linköping University Electronic Press, pp 239–243
Levy O, Goldberg Y (2014) Dependency-based word embeddings. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, vol 2, pp 302–308
Tang D, Wei F, Yang N, Zhou M, Liu T, Qin B (2014) Learning sentiment-specific word embedding for Twitter sentiment classification. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics
Bengio Y, Ducharme R, Vincent P, Jauvin C (2003) A neural probabilistic language model. J Mach Learn Res 3:1137–1155
Joulin A, Grave E, Bojanowski P, Mikolov T (2017) Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pp 427–431
Peters M, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp 2227–2237
Zheng X, Feng J, Chen Y, Peng H, Zhang W (2017) Learning context-specific word/character embeddings. In: AAAI Conference on Artificial Intelligence
Tian F, Dai H, Bian J, Gao B, Zhang R, Chen E, Liu T-Y (2014) A probabilistic model for learning multi-prototype word embeddings. In: The 25th International Conference on Computational Linguistics: Technical Papers, Proceedings of COLING 2014, pp 151–160
Olteanu A, Castillo C, Diaz F, Vieweg S (2014) CrisisLex: a lexicon for collecting and filtering microblogged communications in crises. In: Proceedings of the International AAAI Conference on Weblogs and Social Media
Huang EH, Socher R, Manning CD, Ng AY (2012) Improving word representations via global context and multiple word prototypes. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1, Association for Computational Linguistics, pp 873–882
Neelakantan A, Shankar J, Passos A, McCallum A (2015) Efficient non-parametric estimation of multiple embeddings per word in vector space. arXiv preprint arXiv:1504.06654
Yu M, Dredze M (2014) Improving lexical embeddings with semantic knowledge. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp 545–550
Faruqui M, Dodge J, Jauhar SK, Dyer C, Hovy EH, Smith NA (2014) Retrofitting word vectors to semantic lexicons. CoRR abs/1411.4166
Yu M, Gormley M, Dredze M (2014) Factor-based compositional embedding models. In: NIPS Workshop on Learning Semantics
Chen X, Liu Z, Sun M (2014) A unified model for word sense representation and disambiguation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1025–1035
Kuang S, Davison BD (2018) Class-specific word embedding through linear compositionality. In: Proceedings of the IEEE International Conference on Big Data and Smart Computing (BigComp), pp 390–397
Kim Y (2014) Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1746–1751. arXiv preprint arXiv:1408.5882
Landauer TK, Foltz PW, Laham D (1998) An introduction to latent semantic analysis. Discourse Process 25:259–284
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12:2493–2537
Ling W, Dyer C, Black A, Trancoso I (2015) Two/too simple adaptations of word2vec for syntax problems. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 1299–1304
Chen Y, Perozzi B, Al-Rfou R, Skiena S (2013) The expressive power of word embeddings. In: ICML 2013 Workshop on Deep Learning for Audio, Speech, and Language Processing
Trask A, Michalak P, Liu J (2015) sense2vec - a fast and accurate method for word sense disambiguation in neural word embeddings. arXiv preprint arXiv:1511.06388
Guo J, Che W, Wang H, Liu T (2014) Learning sense-specific word embeddings by exploiting bilingual resources. In: The 25th International Conference on Computational Linguistics: Technical Papers, Proceedings of COLING 2014, pp 497–507
Su J, Wu S, Zhang B, Wu C, Qin Y, Xiong D (2018) A neural generative autoencoder for bilingual word embeddings. Inf Sci 424:287–300
Pelevina M, Arefyev N, Biemann C, Panchenko A (2017) Making sense of word embeddings. arXiv preprint arXiv:1708.03390
Bollegala D, Yoshida Y, Kawarabayashi K (2018) Using k-way co-occurrences for learning word embeddings. In: AAAI 2018 Conference on Artificial Intelligence
Pennington J, Socher R, Manning CD (2014) GloVe: Global vectors for word representation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1532–1543
Scheepers T, Kanoulas E, Gavves E (2018) Improving word embedding compositionality using lexicographic definitions. In: Proceedings of the 2018 World Wide Web Conference, WWW ’18, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, pp 1083–1093
Bojanowski P, Grave E, Joulin A, Mikolov T (2016) Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606
Athiwaratkun B, Wilson AG, Anandkumar A (2018) Probabilistic FastText for multi-sense word embeddings. In: Conference of the Association for Computational Linguistics (ACL)
Reynolds D (2015) Gaussian mixture models. In: Encyclopedia of Biometrics, pp 827–832
Chen H, Wei B, Liu Y, Li Y, Yu J, Zhu W (2018) Bilinear joint learning of word and entity embeddings for entity linking. Neurocomputing 294:12–18
Mitchell J, Lapata M (2008) Vector-based models of semantic composition. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics, pp 236–244
Li Q, Shah S, Liu X, Nourbakhsh A (2017) Data sets: word embeddings learned from tweets and general data. arXiv preprint arXiv:1708.03994
Attardi G (2015) DeepNL: a deep learning NLP pipeline. In: Proceedings of NAACL-HLT, pp 109–115
Hartigan JA, Wong MA (1979) Algorithm AS 136: a k-means clustering algorithm. J R Stat Soc Ser C (Appl Stat) 28:100–108

Acknowledgements

This material is based in part upon work supported by the National Science Foundation under Grant No. CMMI-1541177.

Author information

Authors and Affiliations

Lehigh University, Bethlehem, PA, USA
Sicong Kuang & Brian D. Davison

Corresponding author

Correspondence to Sicong Kuang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original version of this article was revised: The article Learning class‑specific word embeddings, written by Sicong Kuang and Brian D. Davison, was originally published electronically on the publisher's internet portal (currently SpringerLink) on 23 October 2019 with open access. With the author(s)’ decision to step back from Open Choice, the copyright of the article changed on 18 November 2019 to © Springer Science+Business Media, LLC, part of Springer Nature 2019 and the article is forthwith distributed under the terms of copyright.

About this article

Cite this article

Kuang, S., Davison, B.D. Learning class-specific word embeddings. J Supercomput 76, 8265–8292 (2020). https://doi.org/10.1007/s11227-019-03024-z

Published: 23 October 2019
Issue Date: October 2020
DOI: https://doi.org/10.1007/s11227-019-03024-z