Learning Phrase Representations Based on Word and Character Embeddings

  • Conference paper
Neural Information Processing (ICONIP 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9950)

Abstract

Most phrase embedding methods treat a phrase as a basic term and learn its embedding from the phrase's external contexts, ignoring the internal structure formed by its words and characters. In some languages, such as Chinese, a phrase is usually composed of several words or characters and contains rich internal information, so the semantic meaning of a phrase is also related to the meanings of its component words or characters. Taking Chinese as an example, we therefore propose a joint word and character embedding model for learning phrase representations. To disambiguate words and characters and to address the issue of non-compositional phrases, we present multiple-prototype word and character embeddings and an effective phrase selection method. We evaluate the proposed model on phrase similarity computation and analogical reasoning. The empirical results show that our model outperforms baseline methods that ignore internal word and character information.
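
As a rough illustration of the idea, the sketch below combines a phrase's own context-trained vector with the mean of its component vectors and compares two phrases by cosine similarity. The abstract does not give the model's actual formula, so the equal-weight averaging, the function names (`mean_vector`, `compose_phrase`, `cosine`), and the toy vectors are all assumptions made for illustration. The multiple-prototype embeddings and the phrase selection method are not modeled here.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def compose_phrase(phrase_vec, part_vecs):
    """Average a phrase's own vector with the mean of its component vectors.

    Assumption: simple additive composition with equal weights. This is a
    hypothetical stand-in, not the formula used in the paper.
    """
    parts_mean = mean_vector(part_vecs)
    return [0.5 * (p + q) for p, q in zip(phrase_vec, parts_mean)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy, made-up 2-dimensional vectors standing in for learned embeddings.
phrase_a = compose_phrase([1.0, 0.0], [[0.0, 1.0], [0.0, 3.0]])
phrase_b = compose_phrase([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]])
similarity = cosine(phrase_a, phrase_b)
```

With these toy inputs, the two phrases share the same external vector but differ in their components, so the composed vectors differ and their similarity drops below 1. A purely context-based embedding would not capture this distinction.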

Notes

  1. https://dumps.wikimedia.org/zhwiki/latest/zhwiki-latest-pages-articles.xml.bz2

  2. https://github.com/BYVoid/OpenCC

Acknowledgements

We thank the reviewers for their valuable comments and suggestions. This work is supported by grants: State Key Program of National Natural Science Foundation of China (61133012), National Natural Science Foundation of China (61373108 and 61170148), Humanities and Social Science Foundation of Ministry of Education of China (16YJCZH004), China Postdoctoral Science Foundation (2013M540593, 2014T70722).

Author information

Corresponding author

Correspondence to Donghong Ji.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Huang, J., Ji, D., Yao, S., Huang, W., Chen, B. (2016). Learning Phrase Representations Based on Word and Character Embeddings. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9950. Springer, Cham. https://doi.org/10.1007/978-3-319-46681-1_65

  • DOI: https://doi.org/10.1007/978-3-319-46681-1_65

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46680-4

  • Online ISBN: 978-3-319-46681-1

  • eBook Packages: Computer Science, Computer Science (R0)
