Mining Bilingual Lexical Equivalences Out of Parallel Corpora

Piperidis, Stelios; Harlas, Ioannis

doi:10.1007/11752912_32

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3955)

Included in the following conference series:

Hellenic Conference on Artificial Intelligence

Abstract

The role and importance of methods for lexical knowledge elicitation in multilingual information processing, including machine translation, computer-aided translation and cross-lingual information retrieval, is indisputable. The usefulness of such methods becomes even more apparent for language pairs where no appropriate digital language resources exist. This paper presents encouraging experimental results in automatically eliciting bilingual lexica from Greek-Turkish parallel corpora, consisting of international organizations’ documents available in English, Greek and Turkish, in an attempt to aid multilingual document processing involving these languages.

Author information

Authors and Affiliations

Institute for Language and Speech Processing, Artemidos 6 & Epidavrou, 15125, Marousi, Greece
Stelios Piperidis
National Technical University of Athens, Greece
Stelios Piperidis
Athens University of Economics & Business, Greece
Ioannis Harlas

Editor information

Editors and Affiliations

Computer Science Department of University of Crete, Greece
Grigoris Antoniou
Institute of Computer Science, Foundation for Research & Technology – Hellas (FORTH), Vassilika Vouton, P.O. Box 1385, 71110, Heraklion, Greece
George Potamias
Institute of Informatics and Telecommunications, NCSR "Demokritos", 15310 A., Paraskevi Attikis, Greece
Costas Spyropoulos
Institute of Computer Science, FO.R.T.H., Vassilika Vouton, P.O. Box 1385, GR 71110, Heraklion, Greece
Dimitris Plexousakis

About this paper

Cite this paper

Piperidis, S., Harlas, I. (2006). Mining Bilingual Lexical Equivalences Out of Parallel Corpora. In: Antoniou, G., Potamias, G., Spyropoulos, C., Plexousakis, D. (eds) Advances in Artificial Intelligence. SETN 2006. Lecture Notes in Computer Science, vol 3955. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11752912_32

DOI: https://doi.org/10.1007/11752912_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34117-8
Online ISBN: 978-3-540-34118-5
eBook Packages: Computer Science, Computer Science (R0)