ABSTRACT
Representing words as embeddings in a continuous vector space has proven successful in improving performance on many natural language processing (NLP) tasks. Beyond traditional methods that learn embeddings from large text corpora, ensemble methods have been proposed to combine the merits of pre-trained word embeddings with external semantic sources. In this paper, we propose a knowledge-enhanced ensemble method that combines knowledge graphs with pre-trained word embedding models. Specifically, we interpret a relation in a knowledge graph as a linear translation from one word to another. We also propose a novel weighting scheme to further distinguish edges in the knowledge graph that share the same relation type. Extensive experiments demonstrate that our proposed method outperforms the state of the art by up to 20% on the word analogy task and by up to 16% on the word similarity task.
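To make the translation interpretation concrete, the sketch below illustrates the general idea under stated assumptions: a knowledge-graph relation r linking head word h to tail word t is treated as a linear translation in embedding space, h + r ≈ t, and each edge receives a weight so that edges with the same relation type can still be distinguished. This is a minimal illustration, not the authors' released code; the names (embed_dim, score_triple, edge_weight) and the cosine-based weighting are illustrative assumptions rather than the paper's exact scheme.

```python
# Minimal sketch (an assumption, not the paper's implementation) of
# translation-based scoring over pre-trained word embeddings.
import numpy as np

embed_dim = 100
rng = np.random.default_rng(0)

# Hypothetical pre-trained word vectors and learned relation vectors.
word_vecs = {w: rng.normal(size=embed_dim)
             for w in ["king", "queen", "man", "woman"]}
rel_vecs = {"female_counterpart_of": rng.normal(size=embed_dim)}

def score_triple(head: str, relation: str, tail: str) -> float:
    """Translation-based plausibility of (head, relation, tail):
    a smaller distance ||h + r - t|| means a more plausible edge."""
    diff = word_vecs[head] + rel_vecs[relation] - word_vecs[tail]
    return -float(np.linalg.norm(diff))

def edge_weight(head: str, tail: str) -> float:
    """One plausible per-edge weight (illustrative assumption): cosine
    similarity of the endpoint embeddings, so edges sharing a relation
    type can still carry different strengths."""
    h, t = word_vecs[head], word_vecs[tail]
    return float(np.dot(h, t) / (np.linalg.norm(h) * np.linalg.norm(t)))

if __name__ == "__main__":
    print(score_triple("queen", "female_counterpart_of", "king"))
    print(edge_weight("queen", "king"))
```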