Abstract
Web-based mining is a promising approach for entity translation. Traditional web-based mining methods construct spotting queries in a heuristic, one-step paradigm, which cannot handle the diversity of entity names and cannot leverage the information in the returned bilingual pages to refine spotting queries iteratively. To address these drawbacks, this paper proposes a reinforcement learning-based method that models web-based entity translation mining as a Markov Decision Process (MDP). Specifically, we regard query construction as a multi-turn, state-to-action mapping procedure, and learn a clue selection agent based on a Dueling Deep Q-network, which adaptively selects spotting clues based on both short-term and long-term benefits. Experiments verify the effectiveness of our approach.
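As a rough illustration of the clue selection step, the sketch below shows the standard dueling aggregation, Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'), followed by an epsilon-greedy choice among candidate clues. It is a minimal sketch and not the paper's implementation: the clue names, the numeric values, and the function names `dueling_q_values` and `select_clue` are hypothetical, and in a real agent the state value and advantages would come from a trained network.

```python
import random


def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def select_clue(q_values, clues, epsilon=0.0, rng=random):
    """Epsilon-greedy selection: explore with probability epsilon, else take the argmax."""
    if rng.random() < epsilon:
        return rng.choice(clues)
    best = max(range(len(clues)), key=lambda i: q_values[i])
    return clues[best]


# Hypothetical candidate clues and made-up network outputs for one state.
clues = ["clue_a", "clue_b", "clue_c"]
q = dueling_q_values(state_value=1.0, advantages=[0.5, -0.5, 0.0])
chosen = select_clue(q, clues)  # epsilon=0.0, so this is the greedy choice
```

Subtracting the mean advantage keeps the value and advantage streams identifiable, which is the motivation for the dueling architecture; with `epsilon=0.0` the selection is purely greedy.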
Notes
- 1. BIO tags are widely used in sequence labeling tasks, where B, I, and O mark the beginning, inside, and outside tokens of chunks of the target types.
- 2. The translation pairs in the LDC datasets can be easily fetched from a specific website (http://www.ichacha.com), so the search results stay almost static even when the queries vary. We therefore construct a new dataset. The person, company, and film names are collected from Wikidata (https://www.wikidata.org), Forbes (https://www.forbes.com), and IMDB (http://www.imdb.com), respectively.
- 3. The dataset and source code are available at https://github.com/lingyongyan/entity_translation.
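To make the BIO scheme in note 1 concrete, the following minimal sketch decodes a BIO-tagged token sequence into typed chunks. The tag types (`PER`, `LOC`), the example sentence, and the function name `bio_to_chunks` are assumptions chosen for illustration and are not taken from the paper.

```python
def bio_to_chunks(tokens, tags):
    """Return (type, text) chunks from parallel lists of tokens and BIO tags."""
    chunks = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new chunk, closing any open one first.
            if current_tokens:
                chunks.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # An I- tag of the same type extends the open chunk.
            current_tokens.append(token)
        else:
            # "O" (or an I- tag with no matching open chunk) closes the open chunk.
            if current_tokens:
                chunks.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        chunks.append((current_type, " ".join(current_tokens)))
    return chunks


# Hypothetical example sentence and tag set.
tokens = ["Barack", "Obama", "visited", "Beijing"]
tags = ["B-PER", "I-PER", "O", "B-LOC"]
chunks = bio_to_chunks(tokens, tags)
```

In this sketch an I- tag that does not continue an open chunk of the same type is treated as outside; other decoders instead promote such a tag to the start of a new chunk.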
Acknowledgements
This research is supported by the National Key Research and Development Program of China under Grant No. 2017YFB1002104, the National Natural Science Foundation of China under Grant No. U1936207, the Beijing Academy of Artificial Intelligence (BAAI2019QN0502), and in part by the Youth Innovation Promotion Association CAS (2018141).
© 2021 Springer Nature Singapore Pte Ltd.
Cite this paper
Yan, L., Han, X., Sun, L. (2021). Reinforcement Learning for Clue Selection in Web-Based Entity Translation Mining. In: Chen, H., Liu, K., Sun, Y., Wang, S., Hou, L. (eds) Knowledge Graph and Semantic Computing: Knowledge Graph and Cognitive Intelligence. CCKS 2020. Communications in Computer and Information Science, vol 1356. Springer, Singapore. https://doi.org/10.1007/978-981-16-1964-9_6
DOI: https://doi.org/10.1007/978-981-16-1964-9_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1963-2
Online ISBN: 978-981-16-1964-9
eBook Packages: Computer Science, Computer Science (R0)