Abstract
Recent work relies on comparable corpora to extract efficient bilingual lexicons. Most approaches in the literature for bilingual lexicon extraction are based on context vectors (CV). These approaches suffer from noisy vectors that affect their accuracy. This paper presents new approaches that rely on advanced text mining methods to extract association rules between terms (AR) and extend them to contextual meta-rules (MR). In this respect, we propose to extract bilingual lexicons by deploying standard context vectors, association rules and contextual meta-rules. These approaches exploit correlations between co-occurrence patterns across languages. An experimental validation conducted on specialized comparable corpora shows a significant improvement of the MR-based bilingual lexicon over the standard approach.
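The standard context-vector (CV) approach that serves as the baseline can be illustrated with a minimal sketch. It builds a co-occurrence vector for each word within a fixed window, maps the source vector into the target language through a seed bilingual dictionary, and ranks target words by cosine similarity. The function names, the window size and the toy data are hypothetical. Raw co-occurrence counts are used for simplicity, whereas CV implementations usually weight them with an association measure such as mutual information or log-likelihood. This is not the authors' AR or MR method.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Count co-occurrences of each word with its neighbours within +/- window tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def translate_vector(vec, seed_dict):
    """Map a source context vector into the target language using a seed dictionary.
    Context words that have no dictionary entry are simply dropped."""
    out = Counter()
    for word, count in vec.items():
        for translation in seed_dict.get(word, []):
            out[translation] += count
    return out

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_candidates(source_word, src_vecs, tgt_vecs, seed_dict, top_k=5):
    """Rank target words by similarity to the translated context vector of source_word."""
    translated = translate_vector(src_vecs.get(source_word, Counter()), seed_dict)
    ranked = sorted(tgt_vecs, key=lambda w: (-cosine(translated, tgt_vecs[w]), w))
    return ranked[:top_k]

# Toy English/French comparable "corpora" (hypothetical data).
src = [["patient", "receives", "insulin", "dose"], ["insulin", "dose", "lowers", "glucose"]]
tgt = [["patient", "reçoit", "dose", "insuline"], ["dose", "insuline", "réduit", "glucose"]]
seed = {"patient": ["patient"], "receives": ["reçoit"], "dose": ["dose"],
        "lowers": ["réduit"], "glucose": ["glucose"]}

candidates = rank_candidates("insulin", context_vectors(src), context_vectors(tgt), seed)
print(candidates[0])  # prints "insuline"
```

Noise enters this pipeline in two places. Context words missing from the seed dictionary are dropped, and frequent but uninformative neighbours inflate the counts. These are the noisy vectors that the AR and MR approaches are meant to address.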
Notes
- 1.
Transliterations are words adapted from a source language to a target language, based on their pronunciation.
- 2.
By analogy with the itemset terminology used in data mining to denote a set of items.
- 3.
maxsupp is a user-defined threshold: the termset must occur at most maxsupp times.
- 4.
- 5.
- 6.
- 7.
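The termset and maxsupp notions in the notes above can be illustrated with a minimal sketch of association-rule mining between terms. The sketch rests on several assumptions. Each document is treated as a transaction, that is, a set of terms. Support is the number of documents that contain a termset. A termset is kept when its support lies between a minimum threshold minsupp and the maximum threshold maxsupp. A rule X → Y is kept when its confidence supp(X ∪ Y) / supp(X) reaches minconf. The names minsupp and minconf, the restriction to two-term rules, and the brute-force enumeration (instead of an efficient miner such as Apriori) are simplifications chosen for illustration. They are not the paper's exact procedure.

```python
from itertools import combinations

def support(termset, docs):
    """Number of documents (transactions) that contain every term of the termset."""
    return sum(1 for d in docs if termset <= d)

def mine_pair_rules(docs, minsupp, maxsupp, minconf):
    """Brute-force mining of rules x -> y from two-term termsets.

    A termset {x, y} is kept only if minsupp <= support <= maxsupp; maxsupp
    discards overly frequent, uninformative co-occurrences. Each kept termset
    yields up to two rules, filtered by confidence = supp({x, y}) / supp({x}).
    """
    vocab = sorted(set().union(*docs))
    rules = []
    for a, b in combinations(vocab, 2):
        s = support({a, b}, docs)
        if not (minsupp <= s <= maxsupp):
            continue
        for x, y in ((a, b), (b, a)):
            conf = s / support({x}, docs)  # supp({x}) >= 1 because x occurs in some document
            if conf >= minconf:
                rules.append((x, y, s, conf))
    return rules

# Hypothetical toy documents, each reduced to its set of terms.
docs = [{"insulin", "glucose", "dose"}, {"insulin", "glucose"},
        {"insulin", "patient"}, {"glucose", "diet"}]

for x, y, s, conf in mine_pair_rules(docs, minsupp=2, maxsupp=3, minconf=0.6):
    print(f"{x} -> {y}  support={s}  confidence={conf:.2f}")
```

Lowering maxsupp to 1 removes the frequent pair {glucose, insulin} entirely. The rules that remain involve rarer terms, for example dose → insulin.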
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Belhaj Rhouma, S., Latiri, C., Berrut, C. (2023). Advanced Text Mining Methods for Bilingual Lexicon Extraction from Specialized Comparable Corpora. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2018. Lecture Notes in Computer Science, vol 13397. Springer, Cham. https://doi.org/10.1007/978-3-031-23804-8_31
DOI: https://doi.org/10.1007/978-3-031-23804-8_31
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23803-1
Online ISBN: 978-3-031-23804-8
eBook Packages: Computer Science, Computer Science (R0)