Abstract
Cross-lingual information retrieval in the legal domain is a topical problem, driven by the need to study international best practices in order to improve legislation. One possible solution is the retrieval of thematically similar documents. This, however, requires bridging the gap between the language of the query document and the language of the collection. The paper describes different approaches to this problem, from classical mediator-based methods to modern methods of distributional semantics. As a test collection, we used the UN Digital Library. The combination of the extended translation model and the BM25 ranking function demonstrates the best results.
Acknowledgements
This study was funded by RFBR according to the research projects No. 18-29-16172 and No. 18-29-16022.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhebel, V., Zubarev, D., Sochenkov, I. (2020). Different Approaches in Cross-Language Similar Documents Retrieval in the Legal Domain. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_65
DOI: https://doi.org/10.1007/978-3-030-60276-5_65
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60275-8
Online ISBN: 978-3-030-60276-5
eBook Packages: Computer Science, Computer Science (R0)