Abstract
Spoken and written language processing has shifted dramatically in recent years toward continuous-space representations of language built with neural networks and other distributional methods; in particular, word embeddings now appear in many applications. This paper examines the advantages of the continuous-space approach and the limitations of word embeddings, reviewing recent work that attempts to model more of the structure in language. In addition, we discuss how current models characterize the exceptions in language, and we identify opportunities for advances by integrating traditional and continuous approaches.
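As a concrete illustration of why word embeddings have become so widely used, the short Python sketch below shows the vector-offset analogy test popularized by Mikolov and colleagues: in a continuous space, king - man + woman lands nearest to queen under cosine similarity. This sketch is illustrative only and not from the paper; the four-dimensional vectors are hypothetical toy values standing in for embeddings learned from large corpora (e.g., word2vec or GloVe).

# Illustrative sketch only: toy 4-d vectors standing in for learned embeddings.
import numpy as np

emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "queen": np.array([0.9, 0.1, 0.8, 0.2]),
    "man":   np.array([0.1, 0.9, 0.1, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9, 0.1]),
}

def cosine(u, v):
    # Cosine similarity: angle-based closeness in the embedding space.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Vector-offset analogy: king - man + woman should land nearest to queen.
target = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w != "king"), key=lambda w: cosine(emb[w], target))
print(best)  # -> queen (with these toy vectors)

The same arithmetic carries over to real learned embeddings, which is one reason the continuous-space approach has spread across so many applications; the paper's point is that a single vector per word, however useful, cannot capture all of the structure in language.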
Acknowledgments
I thank my students Hao Cheng, Hao Fang, Ji He, Brian Hutchinson, Aaron Jaech, Yi Luan, and Vicky Zayats for helping me gain insights into continuous-space language methods through their many experiments and our paper discussions.
Copyright information
© 2016 Springer International Publishing AG
Cite this paper
Ostendorf, M. (2016). Continuous-Space Language Processing: Beyond Word Embeddings. In: Král, P., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918. Springer, Cham. https://doi.org/10.1007/978-3-319-45925-7_1
DOI: https://doi.org/10.1007/978-3-319-45925-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45924-0
Online ISBN: 978-3-319-45925-7