
QRpotato: a system that exhaustively collects bilingual technical term pairs from the web

Authors:
Takeshi Abekawa, National Institute of Informatics, Chiyoda-ku, Tokyo, Japan
Kyo Kageura, University of Tokyo, Bunkyo-ku, Tokyo, Japan

IUCS '09: Proceedings of the 3rd International Universal Communication Symposium, December 2009, Pages 115–119
https://doi.org/10.1145/1667780.1667803

Published: 3 December 2009

ABSTRACT

This paper reports on QRpotato, a system that exhaustively collects bilingual technical term pairs from the Web. The system takes bilingual (Japanese-English) term pairs from an existing terminological dictionary as seed pairs, searches for Web pages using the seed pairs, and extracts bilingual term pair candidates from the retrieved pages, using relational patterns identified between seed term pairs. Using about 210,000 seed term pairs, we collected about 2.2 million distinct term pair candidates. A manual evaluation of a portion of the candidates shows the effectiveness of the method.
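The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how patterns are represented, so this sketch assumes a "relational pattern" is simply the short separator string between a seed Japanese term and its English translation, plus an optional closing character. The seed pairs, the sample page text, the function names, and the heuristic that a Japanese term is a run of kanji and katakana are all hypothetical. The Web search step is omitted.

```python
import re
from collections import Counter

# Hypothetical seed pairs and page text, for illustration only.
SEED_PAIRS = [
    ("機械翻訳", "machine translation"),
    ("形態素解析", "morphological analysis"),
]
PAGE = (
    "機械翻訳（machine translation）の一種である"
    "統計的機械翻訳（statistical machine translation）と、"
    "形態素解析（morphological analysis）や"
    "構文解析（syntactic parsing）について述べる。"
)

# Assumption: a Japanese term is a run of katakana and kanji (no hiragana).
JA_RUN = r"([\u30a0-\u30ff\u4e00-\u9fff]+)"
# Assumption: an English term is letters, spaces, and hyphens.
EN_RUN = r"([A-Za-z][A-Za-z \-]*[A-Za-z])"


def learn_patterns(text, seed_pairs, max_gap=5):
    """Count (separator, closer) strings found around seed pairs in text."""
    patterns = Counter()
    for ja, en in seed_pairs:
        regex = (
            re.escape(ja)
            + r"(.{1,%d}?)" % max_gap   # short separator between the terms
            + re.escape(en)
            + r"([^\w\s]?)"             # optional closing punctuation
        )
        for m in re.finditer(regex, text):
            patterns[(m.group(1), m.group(2))] += 1
    return patterns


def extract_candidates(text, patterns, seed_pairs, min_count=1):
    """Apply learned patterns to find term pairs that are not seeds."""
    seeds = set(seed_pairs)
    candidates = set()
    for (sep, closer), count in patterns.items():
        if count < min_count:
            continue
        regex = JA_RUN + re.escape(sep) + EN_RUN + re.escape(closer)
        for m in re.finditer(regex, text):
            pair = (m.group(1), m.group(2))
            if pair not in seeds:
                candidates.add(pair)
    return candidates


patterns = learn_patterns(PAGE, SEED_PAIRS)
candidates = extract_candidates(PAGE, patterns, SEED_PAIRS)
for ja, en in sorted(candidates):
    print(ja, "->", en)
```

In this toy example the only pattern learned from the seeds is the full-width parenthetical, and applying it to the same text yields two pairs that were not among the seeds. A real system would need to filter noisy patterns and handle term boundaries far more carefully than this character-class heuristic does.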


Index Terms

Information systems


Published in
IUCS '09: Proceedings of the 3rd International Universal Communication Symposium, December 2009, 404 pages
ISBN: 9781605586410
DOI: 10.1145/1667780
General Chair: Kazumasa Enami, National Institute of Information and Communications Technology (NICT), Japan
Copyright © 2009 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
automatic term extraction
bilingual term pairs
bilingual terminology
web