skip to main content
10.1145/3534678.3539188acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

TaxoTrans: Taxonomy-Guided Entity Translation

Published: 14 August 2022 Publication History

Abstract

Taxonomies describe the definitions of entities, entities' attributes and the relations among the entities, and thus play an important role in building a knowledge graph. In this paper, we tackle the task of taxonomy entity translation, which is to translate the names of taxonomy entities in a source language to a target language. The translations then can be utilized to build a knowledge graph in the target language. Despite its importance, taxonomy entity translation remains a hard problem for AI models due to two major challenges. One challenge is understanding the semantic context in very short entity names. Another challenge is having deep understanding for the domain where the knowledge graph is built.
We present TaxoTrans, a novel method for taxonomy entity translation that can capture the context in entity names and the domain knowledge in taxonomy. To achieve this, TaxoTrans creates a heterogeneous graph to connect entities, and formulates the entity name translation problem as link prediction in the heterogeneous graph: given a pair of entity names across two languages, TaxoTrans applies a graph neural network to determine whether they form a translation pair or not. Because of this graph, TaxoTrans can capture both the semantic context and the domain knowledge. Our offline experiments on LinkedIn's skill and title taxonomies show that by modeling semantic information and domain knowledge in the heterogeneous graph, TaxoTrans outperforms the state-of-the-art translation methods by ∼10%. Human annotation and A/B test results further demonstrate that the accurately translated entities significantly improves user engagements and advertising revenue at LinkedIn.

References

[1]
[n.d.]. Google Knowledge Graph. https://en.wikipedia.org/wiki/Google_ Knowledge_Graph
[2]
[n.d.]. LinkedIn Economic Graph. https://economicgraph.linkedin.com/
[3]
Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2016. Learning principled bilingual mappings of word embeddings while preserving monolingual invariance. In EMNLP. 2289--2294.
[4]
Mikel Artetxe and Holger Schwenk. 2019. Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond. TACL 7 (2019), 597--610.
[5]
Dzmitry Bahdanau, Kyung Hyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.
[6]
Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. TACL 5 (2017), 135--146.
[7]
Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Neural Information Processing Systems (NIPS). 1--9.
[8]
Denny Britz, Anna Goldie, Minh-Thang Luong, and Quoc Le. 2017. Massive Exploration of Neural Machine Translation Architectures. In EMNLP. 1442--1451.
[9]
Yixin Cao, Zhiyuan Liu, Chengjiang Li, Juanzi Li, and Tat-Seng Chua. 2019. Multi-Channel Graph Neural Network for Entity Alignment. In ACL. 1452--1461.
[10]
Muhao Chen, Yingtao Tian, Mohan Yang, and Carlo Zaniolo. 2017. Multilingual knowledge graph embeddings for cross-lingual knowledge alignment. In IJCAI. 1511--1517.
[11]
Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Mike Schuster, Noam Shazeer, Niki Parmar, et al. 2018. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation. In ACL. 76--86.
[12]
Xilun Chen and Claire Cardie. 2018. Unsupervised Multilingual Word Embeddings. In EMNLP.
[13]
Kyunghyun Cho, Bart van Merrienboer, Çaglar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. In EMNLP.
[14]
Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2018. Word translation without parallel data. In ICLR.
[15]
Michael AA Cox and Trevor F Cox. 2008. Multidimensional scaling. In Handbook of data visualization. Springer, 315--347.
[16]
Long Duong, Hiroshi Kanayama, Tengfei Ma, Steven Bird, and Trevor Cohn. 2016. Learning Crosslingual Word Embeddings without Bilingual Corpora. In EMNLP. 1285--1295.
[17]
Manaal Faruqui and Chris Dyer. 2014. Improving vector space word representations using multilingual correlation. In EACL. 462--471.
[18]
Yuqing Gao, Jisheng Liang, Benjamin Han, Mohamed Yakout, and Ahmed Mohamed. 2018. Building a large-scale, accurate and fresh knowledge graph. KDD2018, Tutorial 39 (2018), 1939--1374.
[19]
Stephan Gouws, Yoshua Bengio, and Greg Corrado. 2015. Bilbowa: Fast bilingual distributed representations without word alignments. In ICML. PMLR, 748--756.
[20]
Edouard Grave, Armand Joulin, and Quentin Berthet. 2019. Unsupervised alignment of embeddings with wasserstein procrustes. In International Conference on Artificial Intelligence and Statistics. PMLR, 1880--1890.
[21]
Yanchao Hao, Yuanzhe Zhang, Shizhu He, Kang Liu, and Jun Zhao. 2016. A joint embedding method for entity alignment of knowledge bases. In China Conference on Knowledge Graph and Semantic Computing. Springer, 3--14.
[22]
Geert Heyman, Ivan Vuli?, and Marie Francine Moens. 2017. Bilingual lexicon induction by learning to combine word-level and character-level representations. In EACL. 1085--1095.
[23]
Yedid Hoshen and Lior Wolf. 2018. Non-Adversarial Unsupervised Word Translation. In EMNLP.
[24]
Jin Huang, Zhaochun Ren, Wayne Xin Zhao, Gaole He, Ji-Rong Wen, and Daxiang Dong. 2019. Taxonomy-aware multi-hop reasoning networks for sequential recommendation. In WSDM. 573--581.
[25]
Armand Joulin, Edouard Grave, and Piotr Bojanowski Tomas Mikolov. 2017. Bag of Tricks for Efficient Text Classification. EACL (2017), 427.
[26]
Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent continuous translation models. In EMNLP. 1700--1709.
[27]
Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In ICLR.
[28]
Chengjiang Li, Yixin Cao, Lei Hou, Jiaxin Shi, Juanzi Li, and Tat-Seng Chua. 2019. Semi-supervised entity alignment via joint knowledge embedding model and cross-graph model. In EMNLP-IJCNLP. 2723--2732.
[29]
Shan Li, Baoxu Shi, Jaewon Yang, Ji Yan, Shuai Wang, Fei Chen, and Qi He. 2020. Deep job understanding at linkedin. In SIGIR. 2145--2148.
[30]
Robert Litschko, Goran Glava, Simone Paolo Ponzetto, and Ivan Vuli?. 2018. Unsupervised cross-lingual information retrieval using monolingual data only. In SIGIR. 1253--1256.
[31]
Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective Approaches to Attention-based Neural Machine Translation. In EMNLP. 1412-- 1421.
[32]
Yuning Mao, Tong Zhao, Andrey Kan, Chenwei Zhang, Xin Luna Dong, Christos Faloutsos, and Jiawei Han. 2020. Octet: Online Catalog Taxonomy Enrichment with Self-Supervision. In KDD. 2247--2257.
[33]
Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Chao Zhang, and Jiawei Han. 2020. Hierarchical topic mining via joint spherical tree and text embedding. In KDD. 1908--1917.
[34]
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS. 3111--3119.
[35]
Shichao Pei, Lu Yu, Guoxian Yu, and Xiangliang Zhang. 2020. Rea: Robust cross-lingual entity alignment between knowledge graphs. In KDD. 2175--2184.
[36]
Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European semantic web conference. Springer, 593--607.
[37]
Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. 2018. Self-Attention with Relative Position Representations. In NAACL-HLT. 464--468.
[38]
Samuel L Smith, David HP Turban, Steven Hamblin, and Nils Y Hammerla. 2017. Offline bilingual word vectors, orthogonal transformations and the inverted softmax. In ICLR.
[39]
Zequn Sun, Wei Hu, and Chengkai Li. 2017. Cross-lingual entity alignment via joint attribute-preserving embedding. In International Semantic Web Conference. Springer, 628--644.
[40]
Zequn Sun, Wei Hu, Qingheng Zhang, and Yuzhong Qu. 2018. Bootstrapping Entity Alignment with Knowledge Graph Embedding. In IJCAI, Vol. 18. 4396-- 4402.
[41]
Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to Sequence Learning with Neural Networks. Advances in Neural Information Processing Systems 27 (2014), 3104--3112.
[42]
Gongbo Tang, Mathias Müller, Annette Rios, and Rico Sennrich. 2018. Why SelfAttention? A Targeted Evaluation of Neural Machine Translation Architectures. In EMNLP.
[43]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In NIPS.
[44]
Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. In ICLR.
[45]
Marie-Francine Moens. 2015. Monolingual and cross-lingual information retrieval models based on (bilingual) word embeddings. In SIGIR. 363--372.
[46]
Zhichun Wang, Qingsong Lv, Xiaohan Lan, and Yu Zhang. 2018. Cross-lingual knowledge graph alignment via graph convolutional networks. In EMNLP. 349-- 357.
[47]
Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016).
[48]
Chao Xing, Dong Wang, Chao Liu, and Yiye Lin. 2015. Normalized word embedding and orthogonal transform for bilingual word translation. In NAACL-HLT. 1006--1011.
[49]
Kun Xu, Liwei Wang, Mo Yu, Yansong Feng, Yan Song, Zhiguo Wang, and Dong Yu. 2019. Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network. In ACL. 3156--3161.
[50]
Xiao Yan, Ada Ma, Jaewon Yang, Lin Zhu, How Jing, Jacob Bollinger, and Qi He. 2021. Contextual Skill Proficiency via Multi-task Learning at LinkedIn. In CIKM. 4273--4282.
[51]
Xiao Yan, Jaewon Yang, Mikhail Obukhov, Lin Zhu, Joey Bai, Shiqi Wu, and Qi He. 2019. Social skill validation at LinkedIn. In KDD. 2943--2951.
[52]
Hsiu-Wei Yang, Yanyan Zou, Peng Shi, Wei Lu, Jimmy Lin, and SUN Xu. 2019. Aligning Cross-Lingual Entities with Multi-Aspect Information. In EMNLPIJCNLP. 4422--4432.
[53]
Yuchen Zhang, Amr Ahmed, Vanja Josifovski, and Alexander Smola. 2014. Taxonomy discovery for personalized recommendation. In WSDM. 243--252.
[54]
Hao Zhu, Ruobing Xie, Zhiyuan Liu, and Maosong Sun. 2017. Iterative Entity Alignment via Joint Knowledge Embeddings. In IJCAI, Vol. 17. 4258--4264.

Cited By

View all
  • (2024)-generAItor: Tree-in-the-loop Text Generation for Language Model Explainability and AdaptationACM Transactions on Interactive Intelligent Systems10.1145/365202814:2(1-32)Online publication date: 5-Jun-2024

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2022
5033 pages
ISBN:9781450393850
DOI:10.1145/3534678
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 August 2022

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. entity translation
  2. graph neural network
  3. knowledge graph
  4. taxonomy

Qualifiers

  • Research-article

Conference

KDD '22
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Upcoming Conference

KDD '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)22
  • Downloads (Last 6 weeks)1
Reflects downloads up to 05 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2024)-generAItor: Tree-in-the-loop Text Generation for Language Model Explainability and AdaptationACM Transactions on Interactive Intelligent Systems10.1145/365202814:2(1-32)Online publication date: 5-Jun-2024

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media