QRselect: A User-Driven System for Collecting Translation Document Pairs from the Web

  • Conference paper
Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers (ICADL 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4822)

Abstract

In this paper we introduce a system that collects English-Japanese translation document pairs from the Web that are relevant to subject keywords specified by the user. The system, QRselect, is specifically designed to meet the needs of online volunteer translators who, in the process of translation, want to refer to a small and specific set of translation document pairs which are relevant to what they are translating. A system which collects relevant existing translated documents and makes them available for reference in the translation process will therefore greatly help these translators. Against this backdrop, we developed a prototype translated document collection system and evaluated its performance. We also examined the users’ role in improving the system.
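The abstract does not say how QRselect decides which translation pairs are relevant to the user's subject keywords. Purely as an illustration of the general idea, and not as the authors' method, the sketch below ranks already-collected English-Japanese candidate pairs by the fraction of user keywords found in either side of the pair, so keywords may be given in English or Japanese. The `DocPair` structure, the helper names, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DocPair:
    """Hypothetical container for one candidate English-Japanese document pair."""
    url_en: str
    url_ja: str
    text_en: str
    text_ja: str


def keyword_coverage(text: str, keywords: list[str]) -> float:
    """Return the fraction of keywords that occur in text (case-insensitive substring match)."""
    if not keywords:
        return 0.0
    lowered = text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in lowered)
    return hits / len(keywords)


def select_relevant_pairs(
    pairs: list[DocPair], keywords: list[str], threshold: float = 0.5
) -> list[tuple[float, DocPair]]:
    """Score each pair on both language sides, keep those at or above threshold,
    and return them sorted by descending score (hypothetical relevance filter)."""
    scored = [
        (keyword_coverage(p.text_en + "\n" + p.text_ja, keywords), p) for p in pairs
    ]
    kept = [(score, p) for score, p in scored if score >= threshold]
    kept.sort(key=lambda sp: sp[0], reverse=True)
    return kept


if __name__ == "__main__":
    candidates = [
        DocPair("a_en", "a_ja", "Climate change and carbon emissions.", "気候変動と炭素排出。"),
        DocPair("b_en", "b_ja", "Football results.", "サッカーの結果。"),
    ]
    for score, pair in select_relevant_pairs(candidates, ["climate", "炭素"]):
        print(score, pair.url_en)
```

Simple substring coverage keeps the sketch self-contained; a real system would more likely use tokenization (especially for Japanese, which has no spaces) and weighted term statistics.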

Editor information

Dion Hoe-Lian Goh, Tru Hoang Cao, Ingeborg Torvik Sølvberg, Edie Rasmussen

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kageura, K., Abekawa, T., Sekine, S. (2007). QRselect: A User-Driven System for Collecting Translation Document Pairs from the Web. In: Goh, D.HL., Cao, T.H., Sølvberg, I.T., Rasmussen, E. (eds) Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers. ICADL 2007. Lecture Notes in Computer Science, vol 4822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77094-7_21

  • DOI: https://doi.org/10.1007/978-3-540-77094-7_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77093-0

  • Online ISBN: 978-3-540-77094-7

  • eBook Packages: Computer Science, Computer Science (R0)
