QRselect: A User-Driven System for Collecting Translation Document Pairs from the Web

Kageura, Kyo; Abekawa, Takeshi; Sekine, Satoshi

doi:10.1007/978-3-540-77094-7_21

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4822)

Included in the following conference series:

International Conference on Asian Digital Libraries

Abstract

In this paper we introduce a system that collects English-Japanese translation document pairs from the Web that are relevant to subject keywords specified by the user. The system, QRselect, is specifically designed to meet the needs of online volunteer translators who, in the process of translation, want to refer to a small and specific set of translation document pairs which are relevant to what they are translating. A system which collects relevant existing translated documents and makes them available for reference in the translation process will therefore greatly help these translators. Against this backdrop, we developed a prototype translated document collection system and evaluated its performance. We also examined the users’ role in improving the system.

Author information

Authors and Affiliations

Graduate School of Education, University of Tokyo, 7–3–1 Hongo, Bunkyo-ku, Tokyo 113–0033, Japan
Kyo Kageura & Takeshi Abekawa
Computer Science Department, New York University, 715 Broadway, New York, NY 10003, USA
Satoshi Sekine

Editor information

Dion Hoe-Lian Goh, Tru Hoang Cao, Ingeborg Torvik Sølvberg, Edie Rasmussen

Cite this paper

Kageura, K., Abekawa, T., Sekine, S. (2007). QRselect: A User-Driven System for Collecting Translation Document Pairs from the Web. In: Goh, D.HL., Cao, T.H., Sølvberg, I.T., Rasmussen, E. (eds) Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers. ICADL 2007. Lecture Notes in Computer Science, vol 4822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77094-7_21

DOI: https://doi.org/10.1007/978-3-540-77094-7_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77093-0
Online ISBN: 978-3-540-77094-7
eBook Packages: Computer Science, Computer Science (R0)
