Abstract
Word embeddings capture semantic similarity between words in a vocabulary and have proven beneficial to a variety of natural language processing tasks, including language modeling, part-of-speech tagging, and machine translation. Existing embedding methods derive word vectors from the co-occurrence statistics of target-context word pairs, and most treat all context words of a target equally, although not all contexts are created equal. Some recent work learns non-uniform weights over the contexts used to predict a target, but none of it takes the semantic relation types of target-context pairs into account. This paper observes that co-hyponyms usually share similar contexts and can substitute for one another. Building on this observation, it proposes a simple but effective method for improving word embeddings: the method automatically identifies candidate co-hyponyms within the context window and directly optimizes their embeddings to be close. Compared with three state-of-the-art neural embedding models, the proposed model performs better on several datasets in different languages, on both human similarity judgement and language modeling tasks.
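The abstract only sketches the approach at a high level. The following is a minimal illustrative sketch of the idea, not the paper's actual method: it trains skip-gram with negative sampling and adds a term that pulls the embeddings of detected co-hyponym pairs together. The coordination pattern ("X and Y") used as the detection heuristic, the squared-distance attraction penalty, and all hyperparameters (`lam`, `window`, `neg`, learning rate) are assumptions made for this sketch.

```python
# Sketch: skip-gram with negative sampling, plus an extra objective that
# pulls the vectors of detected co-hyponym pairs together. The "X and Y"
# detection heuristic and the squared-distance penalty are illustrative
# assumptions, not the paper's actual method.
import numpy as np

rng = np.random.default_rng(0)

corpus = ("cats and dogs are pets . apples and oranges are fruit . "
          "cats chase mice . dogs chase cats .").split()
vocab = sorted(set(corpus))
w2i = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 16

W_in = rng.normal(scale=0.1, size=(V, D))   # target (input) embeddings
W_out = rng.normal(scale=0.1, size=(V, D))  # context (output) embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cohyponym_pairs(tokens):
    """Heuristic stand-in for co-hyponym detection: treat the two words
    flanking a coordinating 'and' as candidate co-hyponyms."""
    for i, tok in enumerate(tokens[1:-1], start=1):
        if tok == "and":
            yield tokens[i - 1], tokens[i + 1]

lr, window, neg, lam = 0.05, 2, 3, 0.1  # lam weights the co-hyponym term

for epoch in range(50):
    # 1) Standard skip-gram with negative sampling over (target, context).
    for i, target in enumerate(corpus):
        t = w2i[target]
        lo, hi = max(0, i - window), min(len(corpus), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            c = w2i[corpus[j]]
            # One positive sample plus `neg` random negatives (a negative
            # may occasionally collide with the true context; fine here).
            samples = [(c, 1.0)] + [(int(rng.integers(V)), 0.0)
                                    for _ in range(neg)]
            for s, label in samples:
                g = sigmoid(W_in[t] @ W_out[s]) - label
                grad_in = g * W_out[s]          # save before updating W_out
                W_out[s] -= lr * g * W_in[t]
                W_in[t] -= lr * grad_in
    # 2) Extra objective: pull detected co-hyponyms' embeddings together,
    #    i.e. descend on lam * ||v_a - v_b||^2 for each detected pair.
    for a, b in cohyponym_pairs(corpus):
        ia, ib = w2i[a], w2i[b]
        diff = W_in[ia] - W_in[ib]
        W_in[ia] -= lr * lam * diff
        W_in[ib] += lr * lam * diff

def cos(a, b):
    va, vb = W_in[w2i[a]], W_in[w2i[b]]
    return va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))

# Detected co-hyponyms (cats/dogs) should end up closer than unrelated
# pairs (cats/apples) under the extra attraction term.
print(f"cats~dogs    {cos('cats', 'dogs'):.2f}")
print(f"cats~apples  {cos('cats', 'apples'):.2f}")
```

In this toy setup, the attraction term directly shrinks the distance between co-hyponym embeddings on every pass, which is the "optimizes the embeddings of co-hyponyms to be close directly" idea from the abstract; the actual paper's detection procedure and objective may differ.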
Acknowledgements
This research is supported by the National Natural Science Foundation of China (No. 61772289), the Natural Science Foundation of Tianjin (No. 16JCQNJC00500), and the Fundamental Research Funds for the Central Universities.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Cai, X., Luo, Y., Zhang, Y., Yuan, X. (2018). Improving Word Embeddings by Emphasizing Co-hyponyms. In: Meng, X., Li, R., Wang, K., Niu, B., Wang, X., Zhao, G. (eds.) Web Information Systems and Applications. WISA 2018. Lecture Notes in Computer Science, vol. 11242. Springer, Cham. https://doi.org/10.1007/978-3-030-02934-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02933-3
Online ISBN: 978-3-030-02934-0