
Learning Sparse Overcomplete Word Vectors Without Intermediate Dense Representations

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10412)


Abstract

Dense word representation models have attracted a lot of interest for their promising performance in various natural language processing (NLP) tasks. However, dense word vectors are uninterpretable, inseparable, and expensive in both time and space. We propose a model that learns sparse word representations directly from plain text, in contrast to most existing methods, which learn sparse vectors from intermediate dense word embeddings. We also design an efficient algorithm based on noise-contrastive estimation (NCE) to train the model, together with a clustering-based adaptive updating scheme for the noise distribution that makes learning with NCE effective. Experimental results show that the resulting sparse word vectors are comparable to dense vectors on word analogy tasks and outperform dense word vectors on word similarity tasks. According to sparse-vector visualization and word-intruder identification experiments, the sparse word vectors are also much more interpretable.
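
To make the training recipe in the abstract concrete, the sketch below pairs an NCE-style binary objective over observed versus noise (word, context) pairs with a soft-thresholding (proximal L1) step that keeps the overcomplete word vectors sparse. It is only a minimal illustration under assumed choices (uniform noise distribution, plain SGD, illustrative sizes), not the paper's SpVec/SpVec-nce algorithm, and it omits the clustering-based adaptive noise-distribution update.

```python
# Minimal sketch, NOT the paper's exact algorithm: NCE-style updates plus a
# soft-thresholding step for sparsity. All sizes and hyperparameters below
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
V, D, K = 1000, 300, 5             # vocab size, overcomplete dimension, noise samples
W = rng.normal(0.0, 0.01, (V, D))  # word vectors (the ones kept sparse)
C = rng.normal(0.0, 0.01, (V, D))  # context vectors
noise_p = np.full(V, 1.0 / V)      # noise distribution over contexts (uniform here)
lr, lam = 0.05, 0.5                # learning rate; L1 strength (large so sparsity is visible)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nce_step(w_id, c_id):
    """One update for an observed (word, context) pair plus K noise contexts."""
    noise_ids = rng.choice(V, size=K, p=noise_p)
    for cid, label in [(c_id, 1.0)] + [(n, 0.0) for n in noise_ids]:
        score = W[w_id] @ C[cid] - np.log(K * noise_p[cid])  # NCE classification score
        g = sigmoid(score) - label                           # gradient of the logistic loss
        grad_w, grad_c = g * C[cid], g * W[w_id]
        W[w_id] -= lr * grad_w
        C[cid] -= lr * grad_c
    # Proximal (soft-thresholding) step: this is what produces sparsity in W.
    W[w_id] = np.sign(W[w_id]) * np.maximum(np.abs(W[w_id]) - lr * lam, 0.0)

# Toy usage on a few hand-made (word id, context id) pairs.
for w_id, c_id in [(3, 17), (3, 42), (99, 7)]:
    nce_step(w_id, c_id)
print("nonzeros in W[3]:", np.count_nonzero(W[3]))
```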

Notes

  1. Just as a word is composed of word atoms, we assume that a context is composed of context atoms. In this paper, we use surrounding words as contexts, and thus \(\mathcal {V} = \mathcal {C}\). The numbers of word and context atoms are also set to be equal, i.e., \(n_{b} = n_{c}\).

  2. SpVec-nce's learning algorithm can be derived similarly.

  3. We follow Python's indexing conventions, except that indices start at 1.

  4. We define the product of a matrix \({\varvec{A}} \in \mathbb {R}^{m \times n}\) and a 3-way tensor \({\varvec{B}} \in \mathbb {R}^{n\times p \times q}\) to be the 3-way tensor \({\varvec{C}}\) such that \({\varvec{C}}_{:, i, j} = {\varvec{AB}}_{:,i,j}\) (see the sketch after these notes).

  5. https://github.com/mfaruqui/sparse-coding.

  6. https://github.com/dav/word2vec.

  7. https://github.com/dav/word2vec/blob/master/data/questions-words.txt.

  8. http://research.microsoft.com/en-us/projects/rnn/.
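
The matrix-tensor product in note 4 is just slice-wise matrix multiplication. The short NumPy check below (with arbitrary example shapes, not values from the paper) confirms that contracting over the shared axis matches that definition.

```python
# Illustrative check of the product in note 4:
#   C[:, i, j] = A @ B[:, i, j]  for A in R^{m x n}, B in R^{n x p x q}.
import numpy as np

rng = np.random.default_rng(1)
m, n, p, q = 4, 3, 5, 2
A = rng.normal(size=(m, n))
B = rng.normal(size=(n, p, q))

C = np.einsum('mn,npq->mpq', A, B)  # contract A and B over the shared axis n
# Equivalently: C = np.tensordot(A, B, axes=([1], [0]))

# Verify the slice-wise definition from the note.
for i in range(p):
    for j in range(q):
        assert np.allclose(C[:, i, j], A @ B[:, i, j])
print(C.shape)  # (4, 5, 2)
```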


Acknowledgement

This research is supported by the National Basic Research Program of China (the 973 Program) under Grant No. 2015CB352201 and the National Natural Science Foundation of China under Grant Nos. 61421091, 61232015 and 61502014.

Author information

Correspondence to Ge Li or Zhi Jin.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Chen, Y., Li, G., Jin, Z. (2017). Learning Sparse Overcomplete Word Vectors Without Intermediate Dense Representations. In: Li, G., Ge, Y., Zhang, Z., Jin, Z., Blumenstein, M. (eds) Knowledge Science, Engineering and Management. KSEM 2017. Lecture Notes in Computer Science, vol 10412. Springer, Cham. https://doi.org/10.1007/978-3-319-63558-3_1

  • DOI: https://doi.org/10.1007/978-3-319-63558-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63557-6

  • Online ISBN: 978-3-319-63558-3

  • eBook Packages: Computer Science (R0)
