ABSTRACT
This paper reports on QRpotato, a system that exhaustively collects bilingual technical term pairs from the Web. The system takes bilingual (Japanese-English) term pairs from existing terminological dictionaries as seed pairs, searches for Web pages using the seed pairs, and extracts bilingual term pair candidates from the retrieved pages using relational patterns identified between the seed term pairs. Using about 210,000 seed term pairs, we collected about 2.2 million distinct term pair candidates. A manual evaluation of part of the candidates shows the effectiveness of the method.
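The seed-driven extraction step can be illustrated with a minimal sketch. It is not the QRpotato implementation: the abstract does not say which relational patterns the system identifies, so the sketch assumes a pattern is simply the short string found between a seed's Japanese and English terms, plus the single character that follows (for example a pair of parentheses). The function names, the character classes for Japanese and English terms, and the sample text standing in for a retrieved Web page are all hypothetical.

```python
import re

# Assumed term shapes (not from the paper): a Japanese term is a run of
# katakana/kanji, an English term is letters with inner spaces or hyphens.
JA_TERM = r"[\u30a0-\u30ff\u4e00-\u9fff]+"
EN_TERM = r"[A-Za-z][A-Za-z \-]*[A-Za-z]"


def learn_patterns(text, seeds, max_gap=5):
    """Return (infix, suffix) strings observed around seed pairs in the text."""
    patterns = set()
    for ja, en in seeds:
        # Japanese term, a short gap, the English term, then one trailing char.
        regex = re.escape(ja) + "(.{1,%d}?)" % max_gap + re.escape(en) + "(.?)"
        for m in re.finditer(regex, text):
            patterns.add((m.group(1), m.group(2)))
    return patterns


def extract_candidates(text, patterns, seeds):
    """Apply the learned patterns to find term pairs that are not seeds."""
    known = set(seeds)
    candidates = set()
    for infix, suffix in patterns:
        regex = f"({JA_TERM}){re.escape(infix)}({EN_TERM}){re.escape(suffix)}"
        for m in re.finditer(regex, text):
            pair = (m.group(1), m.group(2))
            if pair not in known:
                candidates.add(pair)
    return candidates


# Hypothetical page text standing in for a page retrieved with a seed pair.
page = (
    "形態素解析(morphological analysis)は基本技術である。"
    "この手法は機械翻訳(machine translation)や"
    "情報検索(information retrieval)に使われる。"
)
seeds = [("形態素解析", "morphological analysis")]

patterns = learn_patterns(page, seeds)
candidates = extract_candidates(page, patterns, seeds)
for ja, en in sorted(candidates):
    print(ja, en)
```

In this toy setting one seed pair yields one parenthetical pattern, which in turn picks up the other two pairs on the page. A real system would also need Web search, page retrieval, and filtering of noisy candidates, none of which is shown here.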