Different Approaches in Cross-Language Similar Documents Retrieval in the Legal Domain

  • Conference paper
Speech and Computer (SPECOM 2020)

Abstract

Cross-lingual information retrieval in the legal domain is a topical problem, because improving legislation requires studying the best international practices. One possible solution is the retrieval of thematically similar documents. This, however, requires bridging the gap between languages. The paper describes different approaches to this problem, from classical mediator-based methods to modern methods of distributional semantics. As a test collection, we used the UN Digital Library. The combination of the extended translation model and the BM25 ranking function demonstrates the best results.
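To make the best-performing combination more concrete, the following is a minimal sketch of BM25 scoring in which a query term's frequency is extended with weighted counts of its translations. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `translations` dictionary, the toy weights, and the parameter defaults `k1=1.2` and `b=0.75` are all hypothetical, and document frequency is simply taken as the number of documents with a non-zero extended term frequency.

```python
import math
from collections import Counter


def extended_tf(term, doc_counts, translations):
    """Term frequency of `term` plus weighted counts of its translations.

    `translations` maps a query-language term to {document-language term:
    weight}. The weights are hypothetical translation probabilities.
    """
    tf = float(doc_counts.get(term, 0))
    for related, weight in translations.get(term, {}).items():
        tf += weight * doc_counts.get(related, 0)
    return tf


def bm25_scores(query, docs, translations, k1=1.2, b=0.75):
    """Score tokenized `docs` against a tokenized `query` with BM25,
    using the extended term frequency in place of the raw count."""
    doc_counts = [Counter(doc) for doc in docs]
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    scores = [0.0] * n_docs
    for term in query:
        tfs = [extended_tf(term, counts, translations) for counts in doc_counts]
        # Simplifying assumption: a document "contains" the term if its
        # extended term frequency is non-zero.
        df = sum(1 for tf in tfs if tf > 0)
        if df == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for i, tf in enumerate(tfs):
            length_norm = k1 * (1 - b + b * len(docs[i]) / avg_len)
            scores[i] += idf * tf * (k1 + 1) / (tf + length_norm)
    return scores


# Toy example: an English query against Russian documents.
translations = {"law": {"закон": 0.8, "право": 0.5}}
docs = [["закон", "право"], ["погода", "сегодня"]]
scores = bm25_scores(["law"], docs, translations)
```

In this toy setup the first document receives a positive score through the translated terms, while the second, which shares no translation with the query, scores zero. In practice the translation weights would come from a trained translation model or from cross-lingual embeddings rather than a hand-written dictionary.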

Acknowledgements

This study was funded by RFBR according to the research projects No. 18-29-16172 and No. 18-29-16022.

Author information

Corresponding author

Correspondence to Vladimir Zhebel.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhebel, V., Zubarev, D., Sochenkov, I. (2020). Different Approaches in Cross-Language Similar Documents Retrieval in the Legal Domain. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_65

  • DOI: https://doi.org/10.1007/978-3-030-60276-5_65

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60275-8

  • Online ISBN: 978-3-030-60276-5

  • eBook Packages: Computer Science, Computer Science (R0)
