Reinforcement Learning for Clue Selection in Web-Based Entity Translation Mining

  • Conference paper
Knowledge Graph and Semantic Computing: Knowledge Graph and Cognitive Intelligence (CCKS 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1356)

Abstract

Web-based mining is a promising approach for entity translation. Traditional web-based mining methods construct spotting queries in a heuristic, one-step paradigm, which cannot handle the diversity of entity names and cannot use the information in the returned bilingual pages to refine spotting queries iteratively. To address these drawbacks, this paper proposes a reinforcement learning-based method that models web-based entity translation mining as a Markov Decision Process (MDP). Specifically, we regard query construction as a multi-turn, state-to-action mapping procedure, and learn a Dueling Deep Q-network based clue selection agent that adaptively selects spotting clues based on both short-term and long-term benefits. Experiments verified the effectiveness of our approach.
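As a rough illustration of the ideas named in the abstract, the sketch below shows the standard dueling aggregation of a state value and per-action advantages into Q-values, a greedy (or epsilon-greedy) choice among candidate clues, and a one-step Q-learning target that mixes the immediate reward with the discounted future value. It is not the authors' implementation, which this page does not show; the function names, the three-clue toy state, the reward, and the discount factor are all assumptions made for the example.

```python
import random


def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) with the mean-subtracted dueling form."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def select_clue(q_values, epsilon, rng):
    """Epsilon-greedy choice over clue indices (epsilon=0.0 is purely greedy)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q_values, gamma):
    """One-step Q-learning target: short-term reward plus discounted long-term value."""
    return reward + gamma * max(next_q_values)


# Hypothetical numbers: one state with three candidate clues.
q = dueling_q_values(state_value=1.0, advantages=[0.5, -0.5, 0.0])
best = select_clue(q, epsilon=0.0, rng=random.Random(0))
target = td_target(reward=0.2, next_q_values=q, gamma=0.9)
```

In a real agent the state value and advantages would come from two heads of a neural network over the current query state, rather than from hand-written numbers.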

Notes

  1. BIO tags are widely used in sequence labeling tasks, where B, I, and O mark tokens at the beginning of, inside, and outside a chunk of a target type.

  2. The translation pairs in the LDC datasets can be easily fetched from a specific website (http://www.ichacha.com), so the search results stay almost unchanged across different queries. We therefore construct a new dataset. The person, company, and film names are collected from Wikidata (https://www.wikidata.org), Forbes (https://www.forbes.com), and IMDB (http://www.imdb.com), respectively.

  3. Dataset and source code are available at https://github.com/lingyongyan/entity_translation.
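The BIO scheme described in note 1 can be illustrated with a small decoder that turns a tagged token sequence back into typed chunks. This is a generic sketch, not code from the paper's repository; the `PER` and `ORG` type names and the example sentence are assumptions chosen for illustration.

```python
def bio_to_spans(tokens, tags):
    """Collect (type, text) chunks from parallel token and BIO-tag lists."""
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new chunk, closing any open one first.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # An I- tag extends the open chunk of the same type.
            current_tokens.append(token)
        else:
            # O tags (and stray I- tags with no matching open chunk) close the chunk.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


tokens = ["Steve", "Jobs", "founded", "Apple"]
tags = ["B-PER", "I-PER", "O", "B-ORG"]
spans = bio_to_spans(tokens, tags)
```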

Acknowledgements

This research work is supported by the National Key Research and Development Program of China under Grant No. 2017YFB1002104, the National Natural Science Foundation of China under Grant No. U1936207, the Beijing Academy of Artificial Intelligence (BAAI2019QN0502), and in part by the Youth Innovation Promotion Association CAS (2018141).

Author information

Corresponding author

Correspondence to Lingyong Yan.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Yan, L., Han, X., Sun, L. (2021). Reinforcement Learning for Clue Selection in Web-Based Entity Translation Mining. In: Chen, H., Liu, K., Sun, Y., Wang, S., Hou, L. (eds) Knowledge Graph and Semantic Computing: Knowledge Graph and Cognitive Intelligence. CCKS 2020. Communications in Computer and Information Science, vol 1356. Springer, Singapore. https://doi.org/10.1007/978-981-16-1964-9_6

  • DOI: https://doi.org/10.1007/978-981-16-1964-9_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-1963-2

  • Online ISBN: 978-981-16-1964-9

  • eBook Packages: Computer Science (R0)
