DOI: 10.1145/3063955.3063979

Learning word embeddings via context grouping

Published: 12 May 2017

Abstract

Recently, neural-network-based word embedding models have been shown to produce high-quality distributional representations that capture both semantic and syntactic information. In this paper, we propose a grouping-based context predictive model that considers the interactions among context words and generalizes the widely used CBOW and Skip-Gram models. In particular, the words within a context window are split into several groups by a grouping function: words in the same group are combined, while different groups are treated as independent. To determine the grouping function, we propose a relatedness hypothesis that describes the relationship among context words, and we propose several context grouping methods. Experimental results demonstrate that better representations can be learned with suitable context groups.
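The grouping idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`cbow_grouping`, `skipgram_grouping`, `halves_grouping`, `group_inputs`), the choice of averaging to "combine" words within a group, and the toy embeddings are all assumptions made for illustration. The sketch only shows how one grouping function can recover CBOW-style input (a single group) and Skip-Gram-style input (one group per word), with a hypothetical intermediate grouping in between.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Grouping = Callable[[Sequence[str]], List[List[str]]]


def cbow_grouping(context: Sequence[str]) -> List[List[str]]:
    """CBOW-style grouping: all context words form a single group."""
    return [list(context)]


def skipgram_grouping(context: Sequence[str]) -> List[List[str]]:
    """Skip-Gram-style grouping: every context word is its own group."""
    return [[word] for word in context]


def halves_grouping(context: Sequence[str]) -> List[List[str]]:
    """Hypothetical intermediate grouping (not from the paper):
    split the window into a first half and a second half."""
    mid = len(context) // 2
    return [list(context[:mid]), list(context[mid:])]


def average(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def group_inputs(
    context: Sequence[str],
    grouping: Grouping,
    embeddings: Dict[str, Vector],
) -> List[Vector]:
    """Return one combined vector per group.

    Words inside a group are combined (here: averaged, an assumption);
    each resulting vector would be used as an independent input when
    predicting the target word.
    """
    return [
        average([embeddings[word] for word in group])
        for group in grouping(context)
        if group  # skip empty groups, e.g. for a one-word context
    ]


# Toy 2-dimensional embeddings, invented for illustration only.
embeddings = {
    "the": [1.0, 0.0],
    "cat": [0.0, 1.0],
    "sat": [1.0, 1.0],
    "down": [2.0, 2.0],
}
context = ["the", "cat", "sat", "down"]

cbow_vectors = group_inputs(context, cbow_grouping, embeddings)
skipgram_vectors = group_inputs(context, skipgram_grouping, embeddings)
halves_vectors = group_inputs(context, halves_grouping, embeddings)

print(len(cbow_vectors), len(halves_vectors), len(skipgram_vectors))
```

With a four-word window, the three grouping functions yield one, two, and four input vectors respectively, which is the sense in which the number of groups interpolates between the two classical models in this sketch.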

Published In

ACM TURC '17: Proceedings of the ACM Turing 50th Celebration Conference - China
May 2017
371 pages
ISBN:9781450348737
DOI:10.1145/3063955

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. context grouping
  2. non-parametric clustering
  3. word embeddings

Qualifiers

  • Research-article

Conference

ACM TUR-C '17
