Compositionality and lexical alignment of multi-word terms

Morin, Emmanuel; Daille, Béatrice

doi:10.1007/s10579-009-9098-8

Published: 06 August 2009

Language Resources and Evaluation, Volume 44, pages 79–95 (2010)

Abstract

The automatic compilation of bilingual lists of terms from specialized comparable corpora using lexical alignment has been successful for single-word terms (SWTs), but remains disappointing for multi-word terms (MWTs). The main reported problems are the low frequency of MWTs and the variability of their syntactic structures in the source and target languages. This paper defines a general framework dedicated to the lexical alignment of MWTs from comparable corpora that includes a compositional translation process and the standard lexical context analysis. Because the compositional method, which is based on the translation of lexical items, is restrictive, we introduce an extended compositional method that bridges the gap between MWTs of different syntactic structures through morphological links. We experimented with the two compositional methods on the French–Japanese alignment task. The results show a significant improvement for the translation of MWTs and advocate further morphological analysis in lexical alignment.
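As an illustration of what a compositional translation process can look like, the sketch below translates each component of a source MWT with a bilingual dictionary, combines the component translations in every order, and keeps only the combinations attested in a list of target-language terms. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `compositional_candidates`, the toy dictionary, and the toy term list are all hypothetical and are not taken from the article. It also assumes that function words have already been removed from the source term, and it does not model the extended method based on morphological links.

```python
from itertools import permutations, product


def compositional_candidates(mwt, dictionary, target_terms, sep=""):
    """Return target-language candidates for a multi-word term (MWT).

    Hypothetical helper: each word of `mwt` is translated with
    `dictionary`, the translations are combined in every order, and only
    combinations found in `target_terms` are kept.
    """
    words = mwt.split()
    # Look up every component; give up if one has no dictionary entry.
    options = [dictionary.get(word, []) for word in words]
    if not all(options):
        return []
    candidates = set()
    for combo in product(*options):
        # Try every ordering, since word order can differ across languages.
        for ordering in permutations(combo):
            candidates.add(sep.join(ordering))
    # Keep only candidates attested among the target-language terms.
    return sorted(candidates & set(target_terms))


# Toy data invented for this example (not from the article).
toy_dictionary = {
    "fatigue": ["疲労", "疲れ"],
    "chronique": ["慢性"],
}
toy_target_terms = {"慢性疲労", "疲労感"}

print(compositional_candidates("fatigue chronique", toy_dictionary, toy_target_terms))
```

The filtering step is what makes this approach restrictive: a source term whose components are missing from the dictionary, or whose translation has a different structure, yields no candidate at all.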

Notes

http://www.kanji.free.fr/.
http://www.quebec-japon.com/lexique/index.php?a=index&d=25.
http://www.dico.fj.free.fr/index.php.
http://www.quebec-japon.com/lexique/index.php?a=index&d=3.
http://www.sciences.univ-nantes.fr/info/perso/permanents/daille/ and release for Mandriva Linux.
http://www.cl.cs.okayama-u.ac.jp/rsc/jacabit/.
The symbols for part-of-speech tags are Adj (Adjective), N (Noun), Pref (Prefix), Prep (Preposition), and Suff (Suffix).
http://www.atilf.fr/winbrill/.
http://www.univ-nancy2.fr/pers/namer/.
http://www.chasen-legacy.sourceforge.jp/.
Precision is the number of correct Japanese translations divided by the total number of Japanese translations.


Acknowledgement

This work was supported by the French National Research Agency grant ANR-08-CORD-013.

Author information

Authors and Affiliations

Université de Nantes, LINA-UMR CNRS 6241, 2 chemin de la Houssinière, BP 92208, 44322, Nantes Cedex 3, France
Emmanuel Morin & Béatrice Daille

Corresponding author

Correspondence to Béatrice Daille.

About this article

Cite this article

Morin, E., Daille, B. Compositionality and lexical alignment of multi-word terms. Lang Resources & Evaluation 44, 79–95 (2010). https://doi.org/10.1007/s10579-009-9098-8

Received: 26 November 2007
Accepted: 14 July 2009
Published: 06 August 2009
Issue Date: April 2010
DOI: https://doi.org/10.1007/s10579-009-9098-8
