Automatic Dictionary Expansion Using Non-parallel Corpora

Rapp, Reinhard; Zock, Michael

doi:10.1007/978-3-642-01044-6_29

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Automatically generating bilingual dictionaries from parallel, manually translated texts is a well-established technique that works well in practice. However, parallel texts are a scarce resource. It is therefore desirable to be able to generate dictionaries from pairs of comparable monolingual corpora as well. For most languages, such corpora are much easier to acquire, and often in considerably larger quantities. In this paper we present the implementation of an algorithm that exploits such corpora successfully. Based on the assumption that the co-occurrence patterns of words in different languages are related, it expands a small base lexicon. For improved performance, it also implements a novel interlingua approach: if corpora of more than two languages are available, the translations from one language to another can be determined not only directly, but also indirectly via a pivot language.
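
The idea of expanding a base lexicon through related co-occurrence patterns can be illustrated with a minimal sketch. It is not the implementation presented in the paper: the window size, the raw co-occurrence counts, the cosine measure, the toy corpora, and all function names (`cooccurrence_vectors`, `cosine`, `translate`) are assumptions chosen for illustration only.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Count, for every word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def translate(word, src_vectors, tgt_vectors, base_lexicon):
    """Rank target words by how closely their contexts match the contexts
    of `word`, after mapping those contexts through the base lexicon."""
    mapped = Counter()
    for context_word, count in src_vectors.get(word, Counter()).items():
        if context_word in base_lexicon:  # context words outside the lexicon are ignored
            mapped[base_lexicon[context_word]] += count
    scores = {cand: cosine(mapped, vec) for cand, vec in tgt_vectors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy comparable (not parallel) corpora, invented for this example.
english = [
    ["the", "teacher", "reads", "a", "book"],
    ["the", "teacher", "writes", "a", "book"],
    ["the", "dog", "eats", "a", "bone"],
]
german = [
    ["der", "lehrer", "liest", "ein", "buch"],
    ["der", "lehrer", "schreibt", "ein", "buch"],
    ["der", "hund", "frisst", "ein", "knochen"],
]
# Small base lexicon; "teacher" is deliberately missing.
base_lexicon = {
    "the": "der", "a": "ein", "reads": "liest", "writes": "schreibt",
    "eats": "frisst", "book": "buch", "bone": "knochen", "dog": "hund",
}

ranked = translate(
    "teacher",
    cooccurrence_vectors(english),
    cooccurrence_vectors(german),
    base_lexicon,
)
print(ranked[0][0])  # prints: lehrer
```

In this toy setting the top-ranked candidate for "teacher" is the target word whose neighbours best match the translated neighbours of the source word. Under the same assumptions, the indirect route via a pivot language would chain two such lookups through a third corpus; how direct and indirect results are combined is not stated in the abstract and is left open here.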

Acknowledgements

Part of this research was supported by a Marie Curie Intra-European Fellowship within the 6th European Community Framework Programme. We thank Olivier Ferret for valuable comments concerning French language resources.

Author information

Authors and Affiliations

Universitat Rovira i Virgili, GRLMC, Tarragona, Spain
Reinhard Rapp

Corresponding author

Correspondence to Reinhard Rapp.

Editor information

Editors and Affiliations

Universität der Bundeswehr, Fak. Wirtschafts-/Sozialwissenschaften, Helmut-Schmidt-Universität, Holstenhofweg 85, Hamburg, 22043, Germany
Andreas Fink
Dept. Mathematical Sciences, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom
Berthold Lausen
Universität der Bundeswehr, Fak. Wirtschafts-/Sozialwissenschaften, Helmut-Schmidt-Universität, Holstenhofweg 85, Hamburg, 22043, Germany
Wilfried Seidel
FB 12 Mathematik und Informatik, Datenbionik AG, Universität Marburg, Hans-Meerwein-Straße, Marburg, 35032, Germany
Alfred Ultsch

About this paper

Cite this paper

Rapp, R., Zock, M. (2009). Automatic Dictionary Expansion Using Non-parallel Corpora. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_29

DOI: https://doi.org/10.1007/978-3-642-01044-6_29
Published: 31 July 2009
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01043-9
Online ISBN: 978-3-642-01044-6
eBook Packages: Mathematics and Statistics (R0)
