Identifying Word Translations in Scientific Literature Based on Labeled Bilingual Topic Model and Co-occurrence Features

Tian, Mingjie; Zhao, Yahui; Cui, Rongyi

doi:10.1007/978-3-030-01716-3_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11221)

Abstract

To exploit the increasingly rich multilingual information resources and multi-label data in scientific literature, and to mine the relevance and correlation between languages, this paper proposes a labeled bilingual topic model and a similarity metric based on co-occurrence features, both of which can be applied to the task of identifying word translations. First, assuming that the keywords of a scientific article are relevant to the abstract of the same article, the keywords are extracted and treated as labels; each label is assigned to a topic, so that the "latent" topics are instantiated. Second, the abstracts are used to train the labeled bilingual topic model, which yields a representation of each word over the topic distribution. Finally, each word is matched with the most similar word in the other language using the similarity metric proposed in this paper. The experimental results show that the labeled bilingual topic model achieves better precision than a bilingual model based on "latent" topics, and that co-occurrence features strengthen the association between bilingual word pairs, which improves identification performance.
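
The final matching step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the similarity metric, so the sketch assumes cosine similarity between P(topic | word) vectors (with a uniform topic prior) plus a weighted co-occurrence bonus. The function names, the `weight` parameter, the English-Chinese toy vocabulary, and every probability below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def topic_vector(word, topic_word_probs):
    """Represent a word as P(topic | word), assuming a uniform topic prior."""
    scores = [probs.get(word, 0.0) for probs in topic_word_probs]
    total = sum(scores)
    return [s / total for s in scores] if total else scores


def best_translation(src_word, src_topics, tgt_topics, cooc, weight=0.5):
    """Pick the target word with the highest combined score.

    Score = topic-space cosine similarity + weight * co-occurrence score.
    This combination is an assumption made for illustration only.
    """
    src_vec = topic_vector(src_word, src_topics)
    candidates = sorted({w for probs in tgt_topics for w in probs})

    def score(tgt_word):
        sim = cosine(src_vec, topic_vector(tgt_word, tgt_topics))
        return sim + weight * cooc.get((src_word, tgt_word), 0.0)

    return max(candidates, key=score)


# Toy P(word | topic) tables; index k in both lists is the same labeled
# topic (keyword), which is what makes the two vector spaces comparable.
src_topics = [
    {"model": 0.5, "corpus": 0.3, "query": 0.2},
    {"model": 0.2, "corpus": 0.2, "query": 0.6},
]
tgt_topics = [
    {"模型": 0.4, "方法": 0.3, "查询": 0.1, "语料": 0.2},
    {"模型": 0.2, "方法": 0.1, "查询": 0.5, "语料": 0.2},
]

# Hypothetical normalized co-occurrence scores for (source, target) pairs.
cooc = {("model", "模型"): 0.8}

without_cooc = best_translation("model", src_topics, tgt_topics, {})
with_cooc = best_translation("model", src_topics, tgt_topics, cooc)
query_match = best_translation("query", src_topics, tgt_topics, cooc)
```

In this toy data, "方法" lies marginally closer to "model" than "模型" does in topic space, so topic similarity alone picks "方法"; adding the co-occurrence term moves the match to "模型". That is the kind of effect the abstract attributes to co-occurrence features, shown here only under the assumed scoring rule.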

Acknowledgement

This research was financially supported by the State Language Commission of China under Grant No. YB135-76.

Author information

Authors and Affiliations

Intelligent Information Processing Lab., Department of Computer Science and Technology, Yanbian University, Yanji, 133002, China
Mingjie Tian, Yahui Zhao & Rongyi Cui

Corresponding author

Correspondence to Rongyi Cui.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Maosong Sun
Harbin Institute of Technology, Harbin, China
Ting Liu
Beijing University of Posts and Telecommunications, Beijing, China
Xiaojie Wang
Tsinghua University, Beijing, China
Zhiyuan Liu
Tsinghua University, Beijing, China
Yang Liu

About this paper

Cite this paper

Tian, M., Zhao, Y., Cui, R. (2018). Identifying Word Translations in Scientific Literature Based on Labeled Bilingual Topic Model and Co-occurrence Features. In: Sun, M., Liu, T., Wang, X., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL 2018, NLP-NABD 2018. Lecture Notes in Computer Science, vol 11221. Springer, Cham. https://doi.org/10.1007/978-3-030-01716-3_7

DOI: https://doi.org/10.1007/978-3-030-01716-3_7
Published: 07 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01715-6
Online ISBN: 978-3-030-01716-3
eBook Packages: Computer Science, Computer Science (R0)
