Abstract
Numerous NLP applications rely on access to multilingual, diversified, context-sensitive, and broadly shared lexical-semantic information. Standard lexical resources tend to first encode monolithic, language-bounded senses, which are only later translated and linked across repositories and languages. In this paper, we propose a novel approach for representing lexical-semantic knowledge in multiple languages, shared among them from the outset, based on the idea of the k-Multilingual Concept (\(MC^k\)). \(MC^k\)s consist of multilingual alignments of semantically equivalent words in k different languages, which are generated through a defined linguistic context and linked via empirically determined semantic relations without the use of any sense disambiguation process. The \(MC^k\) model makes it possible to uncover novel layers of lexical knowledge in the form of multifaceted conceptual links between naturally disambiguated sets of words. We first present the conceptualization of \(MC^k\)s, along with the word alignment methodology that generates them. Second, we describe a large-scale automatic acquisition of \(MC^k\)s in English, Italian and German based on corpora. Finally, we introduce MultiAligNet, an original lexical resource built using the data gathered from the extraction task. Both qualitative and quantitative assessments of the generated knowledge demonstrate the quality and the novelty of the proposed model.
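As a rough illustration of the structure described above, an \(MC^k\) can be pictured as one word per language, with labeled links between such alignments. The following is only a minimal sketch under stated assumptions: the class name `MultilingualConcept`, the `links` list, and the relation label `has_part` are hypothetical and are not taken from the paper or from MultiAligNet.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MultilingualConcept:
    """Hypothetical container for one MC^k: aligned words, one per language."""

    words: Tuple[Tuple[str, str], ...]  # (language code, word) pairs

    def __post_init__(self) -> None:
        # An alignment holds exactly one word for each language.
        langs = [lang for lang, _ in self.words]
        if len(langs) != len(set(langs)):
            raise ValueError("each language may appear only once")

    @property
    def k(self) -> int:
        """Number of aligned languages."""
        return len(self.words)

    def word(self, lang: str) -> str:
        """Return the aligned word for the given language code."""
        return dict(self.words)[lang]


# Illustrative data only; the relation label below is an assumption.
bicycle = MultilingualConcept(
    (("en", "bicycle"), ("it", "bicicletta"), ("de", "Fahrrad"))
)
wheel = MultilingualConcept((("en", "wheel"), ("it", "ruota"), ("de", "Rad")))

# Links between alignments as (source, relation label, target) triples.
links: List[Tuple[MultilingualConcept, str, MultilingualConcept]] = [
    (bicycle, "has_part", wheel),
]
```

In this sketch, `bicycle.k` is 3, and each link connects whole alignments rather than individual word senses, which is the intuition behind linking words without an explicit disambiguation step.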
Notes
- 1.
- 2.
Therefore, they are formally included in dictionaries, since lexicographers consider them part of the lexicon.
- 3.
Yet synonymy, as a rule, is not complete equivalence - as we are reminded by [22].
- 4.
The same would apply to Italian and German synonyms for the concept bicycle.
- 5.
BabelNet high-quality lexicalizations are those word forms that are not marked as resulting from an automatic translation.
- 6.
TJSI versions used: English (60+ billion words), Italian (8.4+ billion words), German (6.9+ billion words).
- 7.
- 8.
- 9.
- 10.
For EN: iWebCorpus (https://www.english-corpora.org/iweb) and The Oxford Dictionary (https://www.oxfordlearnersdictionaries.com/wordlists/oxford3000-5000); for IT: NvdB (https://www.dropbox.com/s/mkcyo53m15ktbnp/nuovovocabolariodibase.pdf); for DE: [45].
- 11.
- 12.
The annotator who performed the evaluation is, however, a native Italian speaker with at least C1-level proficiency in both English and German. The evaluation can therefore be considered reliable.
References
Apidianaki, M.: LIMSI: cross-lingual word sense disambiguation using translation sense clustering. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pp. 178–182. Association for Computational Linguistics, Atlanta (2013). https://aclanthology.org/S13-2032
Baisa, V., et al.: European Union language resources in Sketch Engine. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), pp. 2799–2803 (2016)
Barba, E., Procopio, L., Navigli, R.: ConSeC: word sense disambiguation as continuous sense comprehension. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 1492–1503 (2021)
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606 (2016)
Bond, F., Foster, R.: Linking and extending an open multilingual wordnet. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1352–1362 (2013)
Bond, F., Foster, R.: Linking and extending an open multilingual Wordnet. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1352–1362. Association for Computational Linguistics, Sofia (2013). https://aclanthology.org/P13-1133
Bond, F., Vossen, P., McCrae, J., Fellbaum, C.: CILI: the collaborative interlingual index. In: Proceedings of the 8th Global WordNet Conference (GWC), pp. 50–57. Global Wordnet Association, Bucharest (2016)
Brown, P.F., Della Pietra, S.A., Della Pietra, V.J., Mercer, R.L.: Word-sense disambiguation using statistical methods. In: 29th Annual Meeting of the Association for Computational Linguistics, pp. 264–270. Association for Computational Linguistics, Berkeley (1991)
Camacho-Collados, J., Pilehvar, M.T., Navigli, R.: NASARI: integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artif. Intell. 240, 36–64 (2016)
Chan, Y.S., Ng, H.T.: Scaling up word sense disambiguation via parallel texts. In: Proceedings of the 20th National Conference on Artificial Intelligence (AAAI’05) - Volume 3, pp. 1037–1042. AAAI Press (2005)
Devereux, B.J., Tyler, L.K., Geertzen, J., Randall, B.: The CSLB concept property norms. Behav. Res. Methods 46(4), 1119–1127 (2014)
Diab, M.T., Resnik, P.: Word Sense Disambiguation within a Multilingual Framework. Ph.D. thesis, USA, AAI3115805 (2003)
Edmonds, P., Kilgarriff, A.: Introduction to the special issue on evaluating word sense disambiguation systems. Nat. Lang. Eng. 8(4), 279–291 (2002)
Gabrilovich, E., Markovitch, S., et al.: Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In: IJCAI, vol. 7, pp. 1606–1611 (2007)
Gale, W.A., Church, K.W., Yarowsky, D.: Work on statistical methods for word sense disambiguation. In: Working Notes of the AAAI Fall Symposium on Probabilistic Approaches to Natural Language, vol. 54, p. 60 (1992)
Grasso, F., Di Caro, L.: A methodology for large-scale, disambiguated and unbiased lexical knowledge acquisition based on multilingual word alignment. In: Fersini, E., Passarotti, M., Patti, V. (eds.) Proceedings of the Eighth Italian Conference on Computational Linguistics, CLiC-it 2021, Milan, Italy, 26–28 January 2022. CEUR Workshop Proceedings, vol. 3033. CEUR-WS.org (2021)
Harris, Z.S.: Distributional structure. Word 10(2–3), 146–162 (1954)
Hassan, S.H., Mihalcea, R.: Semantic relatedness using salient semantic analysis. In: Twenty-Fifth AAAI Conference on Artificial Intelligence (2011)
Huang, E.H., Socher, R., Manning, C.D., Ng, A.Y.: Improving word representations via global context and multiple word prototypes. In: Proceedings of ACL, pp. 873–882 (2012)
Iacobacci, I., Pilehvar, M.T., Navigli, R.: SensEmbed: learning sense embeddings for word and relational similarity. In: Proceedings of ACL, pp. 95–105 (2015)
Ion, R., Tufis, D.: Multilingual word sense disambiguation using aligned wordnets. Romanian J. Inf. Sci. Technol. 7, 183–200 (2004)
Jakobson, R.: 14. On Linguistic Aspects of Translation, pp. 144–151. University of Chicago Press (2012)
Jakubíček, M., Kilgarriff, A., Kovář, V., Rychlý, P., Suchomel, V.: The TenTen corpus family. In: 7th International Corpus Linguistics Conference CL, pp. 125–127 (2013)
Kilgarriff, A., et al.: The Sketch Engine: ten years on. Lexicography 1(1), 7–36 (2014)
Kumar, S., Jat, S., Saxena, K., Talukdar, P.: Zero-shot word sense disambiguation using sense definition embeddings. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 5670–5681 (2019)
Lacerra, C., Bevilacqua, M., Pasini, T., Navigli, R.: CSI: a coarse sense inventory for 85% word sense disambiguation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 8123–8130 (2020)
Lefever, E., Hoste, V.: SemEval-2013 task 10: cross-lingual word sense disambiguation. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pp. 158–166. Association for Computational Linguistics, Atlanta (2013). https://aclanthology.org/S13-2029
McRae, K., Cree, G.S., Seidenberg, M.S., McNorgan, C.: Semantic feature production norms for a large set of living and nonliving things. Behav. Res. Methods 37(4), 547–559 (2005)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
Miller, G.A., Chodorow, M., Landes, S., Leacock, C., Thomas, R.G.: Using a semantic concordance for sense identification. In: Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, 8–11 March 1994 (1994)
Morris, J., Hirst, G.: Non-classical lexical semantic relations. In: Proceedings of the Computational Lexical Semantics Workshop at HLT-NAACL 2004, pp. 46–51. Association for Computational Linguistics, Boston (2004). https://aclanthology.org/W04-2607
Navigli, R.: Word sense disambiguation: a survey. ACM Comput. Surv. 41(2), 1–69 (2009)
Navigli, R., Ponzetto, S.P.: BabelNet: building a very large multilingual semantic network. In: Proceedings of ACL, pp. 216–225. Association for Computational Linguistics (2010)
Palmer, M., Dang, H.T., Fellbaum, C.: Making fine-grained and coarse-grained sense distinctions, both manually and automatically. Nat. Lang. Eng. 13(02), 137–163 (2007)
Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: EMNLP, vol. 14, pp. 1532–1543 (2014)
Petricca, P.: SEMANTICA. Forme, Modelli, Problemi (2019)
Pilehvar, M.T., Navigli, R.: A large-scale pseudoword-based evaluation framework for state-of-the-art word sense disambiguation. Comput. Linguist. 40(4), 837–881 (2014)
Scarlini, B., Pasini, T., Navigli, R.: SensEmBERT: context-enhanced sense embeddings for multilingual word sense disambiguation. In: Proceedings of the 34th Conference on Artificial Intelligence. Association for the Advancement of Artificial Intelligence (2020)
Schütze, H.: Dimensions of meaning. In: SC, pp. 787–796 (1992)
Siegel, M., Bond, F.: OdeNet: compiling a German WordNet from other resources. In: Proceedings of the 11th Global Wordnet Conference, pp. 192–198. Global Wordnet Association, University of South Africa (UNISA) (2021). https://aclanthology.org/2021.gwc-1.22
Speer, R., Chin, J., Havasi, C.: ConceptNet 5.5: an open multilingual graph of general knowledge (2017)
Thomas, C.: Lexicalization in Generative Morphology and Conceptual Structure, pp. 45–65. Edinburgh University Press (2013)
Trampuš, M., Novak, B.: Internals of an aggregated web news feed. In: Proceedings of 15th Multiconference on Information Society, pp. 221–224 (2012)
Tschirner, E.: Deutsch nach Themen: Grund- und Aufbauwortschatz: Deutsch als Fremdsprache nach Themen-Lernwörterbuch. Cornelsen, Berlin (2016)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Grasso, F., Lovera Rulfi, V., Di Caro, L. (2022). MultiAligNet: Cross-lingual Knowledge Bridges Between Words and Senses. In: Corcho, O., Hollink, L., Kutz, O., Troquard, N., Ekaputra, F.J. (eds) Knowledge Engineering and Knowledge Management. EKAW 2022. Lecture Notes in Computer Science, vol 13514. Springer, Cham. https://doi.org/10.1007/978-3-031-17105-5_3
DOI: https://doi.org/10.1007/978-3-031-17105-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17104-8
Online ISBN: 978-3-031-17105-5
eBook Packages: Computer Science, Computer Science (R0)