DOI: 10.1145/1667780.1667803
research-article

QRpotato: a system that exhaustively collects bilingual technical term pairs from the web

Published: 03 December 2009

ABSTRACT

This paper reports on QRpotato, a system that exhaustively collects bilingual technical term pairs from the Web. The system takes bilingual (Japanese-English) term pairs from an existing terminological dictionary as seed pairs, searches for Web pages using the seed pairs, and extracts bilingual term pair candidates from the retrieved pages, using relational patterns identified between seed term pairs. Using about 210,000 seed term pairs, we have collected about 2.2 million distinct term pair candidates. A manual evaluation of part of the candidates shows the effectiveness of the method.
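The seed-and-pattern idea in the abstract can be illustrated with a minimal sketch. It is not the QRpotato implementation: the abstract does not specify how patterns are defined, so the sketch assumes that a "relational pattern" is simply the short string separating the Japanese and English terms of a seed pair (for example a full-width opening parenthesis). The function names `learn_patterns` and `extract_candidates`, the character classes `JA` and `EN`, and the `max_gap` parameter are all hypothetical, and already-retrieved page text stands in for the Web search step.

```python
import re
from collections import Counter

# Assumed character classes (not from the paper): a run of hiragana, katakana
# or kanji for the Japanese side, and an ASCII word or phrase for the English side.
JA = r"[\u3040-\u30ff\u4e00-\u9fff]+"
EN = r"[A-Za-z][A-Za-z \-]*[A-Za-z]"


def learn_patterns(pages, seeds, max_gap=5):
    """Count the short strings that separate the two terms of a seed pair."""
    patterns = Counter()
    for text in pages:
        for ja, en in seeds:
            # Seed Japanese term, then 1..max_gap characters, then seed English term.
            regex = re.escape(ja) + r"(.{1,%d}?)" % max_gap + re.escape(en)
            for match in re.finditer(regex, text):
                patterns[match.group(1)] += 1
    return patterns


def extract_candidates(pages, patterns, seeds):
    """Apply each learned separator to find term pairs that are not seeds."""
    known = set(seeds)
    found = set()
    for text in pages:
        for separator in patterns:
            regex = re.compile("(" + JA + ")" + re.escape(separator) + "(" + EN + ")")
            for match in regex.finditer(text):
                pair = (match.group(1), match.group(2))
                if pair not in known:
                    found.add(pair)
    return found
```

Given a page containing a seed such as 機械翻訳（machine translation）, `learn_patterns` records the separator between the two terms, and `extract_candidates` then proposes other Japanese-English pairs on the page that are joined by the same separator. The Japanese character class is deliberately crude and will over-capture when a term is preceded by other Japanese text without punctuation; a realistic system would need proper term boundary detection and candidate filtering.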


        • Published in

          IUCS '09: Proceedings of the 3rd International Universal Communication Symposium
          December 2009
          404 pages
          ISBN: 9781605586410
          DOI: 10.1145/1667780
          General Chair: Kazumasa Enami

          Copyright © 2009 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States
