MultiAligNet: Cross-lingual Knowledge Bridges Between Words and Senses

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13514)

Abstract

Numerous NLP applications rely on access to multilingual, diversified, context-sensitive, and broadly shared lexical-semantic information. Standard lexical resources tend to first encode monolithic, language-bound senses, which are only later translated and linked across repositories and languages. In this paper, we propose a novel approach to representing lexical-semantic knowledge in, and shared from the origin by, multiple languages, based on the idea of the k-Multilingual Concept (\(MC^k\)). \(MC^k\)s consist of multilingual alignments of semantically equivalent words in k different languages, which are generated through a defined linguistic context and linked via empirically determined semantic relations, without any sense disambiguation process. The \(MC^k\) model makes it possible to uncover novel layers of lexical knowledge in the form of multifaceted conceptual links between naturally disambiguated sets of words. First, we present the conceptualization of the \(MC^k\)s, along with the word alignment methodology that generates them. Second, we describe a large-scale automatic acquisition of \(MC^k\)s in English, Italian, and German based on corpora. Finally, we introduce MultiAligNet, an original lexical resource built from the data gathered in the extraction task. Qualitative and quantitative assessments of the generated knowledge demonstrate both the quality and the novelty of the proposed model.
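
As a purely illustrative aid, the idea of an \(MC^k\) as an alignment of one word per language can be sketched as a small data structure. This is a minimal sketch, not the representation used by the authors: the class `MultilingualConcept`, the helper `make_concept`, and the example words are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MultilingualConcept:
    """Hypothetical container for one MC^k: one aligned word per language."""

    # Sorted (language code, word) pairs, so that equal alignments compare equal.
    words: Tuple[Tuple[str, str], ...]

    @property
    def k(self) -> int:
        """Number of distinct languages in the alignment."""
        return len({lang for lang, _ in self.words})

    def word(self, lang: str) -> str:
        """Return the aligned word for the given language code."""
        return dict(self.words)[lang]


def make_concept(**by_lang: str) -> MultilingualConcept:
    """Build a concept from keyword arguments such as en="bicycle"."""
    return MultilingualConcept(tuple(sorted(by_lang.items())))


# Illustrative alignment (an assumption, not data taken from the resource).
bicycle = make_concept(en="bicycle", it="bicicletta", de="Fahrrad")
print(bicycle.k, bicycle.word("de"))
```

Storing the pairs in sorted order makes two alignments of the same words equal regardless of the order in which the languages are listed, which is convenient if such objects are later used as graph nodes or dictionary keys.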

Notes

  1. https://www.media.mit.edu/.

  2. Therefore, they are formally included in dictionaries, being considered part of the lexicon by lexicographers.

  3. Yet synonymy, as a rule, is not complete equivalence, as [22] reminds us.

  4. The same would apply to Italian and German synonyms for the concept bicycle.

  5. BabelNet high-quality lexicalizations are those word forms that are not marked as resulting from an automatic translation.

  6. TJSI versions used: English (60+ billion words), Italian (8.4+ billion words), German (6.9+ billion words).

  7. https://dev.panlex.org/api/.

  8. https://cloud.google.com/translate.

  9. https://github.com/vloverar/multialignet.

  10. For EN: iWebCorpus (https://www.english-corpora.org/iweb) and The Oxford Dictionary (https://www.oxfordlearnersdictionaries.com/wordlists/oxford3000-5000); for IT: NvdB (https://www.dropbox.com/s/mkcyo53m15ktbnp/nuovovocabolariodibase.pdf); for DE: [45].

  11. https://neo4j.com.

  12. The annotator who performed the evaluation is, however, a native Italian speaker with at least C1-level proficiency in both English and German, which supports the reliability of the evaluation.

References

  1. Apidianaki, M.: LIMSI: cross-lingual word sense disambiguation using translation sense clustering. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pp. 178–182. Association for Computational Linguistics, Atlanta (2013). https://aclanthology.org/S13-2032

  2. Baisa, V., et al.: European union language resources in sketch engine. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), pp. 2799–2803 (2016)

  3. Barba, E., Procopio, L., Navigli, R.: Consec: Word sense disambiguation as continuous sense comprehension. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 1492–1503 (2021)

  4. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606 (2016)

  5. Bond, F., Foster, R.: Linking and extending an open multilingual wordnet. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1352–1362 (2013)

  6. Bond, F., Foster, R.: Linking and extending an open multilingual Wordnet. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1352–1362. Association for Computational Linguistics, Sofia (2013). https://aclanthology.org/P13-1133

  7. Bond, F., Vossen, P., McCrae, J., Fellbaum, C.: CILI: the collaborative interlingual index. In: Proceedings of the 8th Global WordNet Conference (GWC), pp. 50–57. Global Wordnet Association, Bucharest (2016)

  8. Brown, P.F., Della Pietra, S.A., Della Pietra, V.J., Mercer, R.L.: Word-sense disambiguation using statistical methods. In: 29th Annual Meeting of the Association for Computational Linguistics, pp. 264–270. Association for Computational Linguistics, Berkeley (1991)

  9. Camacho-Collados, J., Pilehvar, M.T., Navigli, R.: Nasari: integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artif. Intell. 240, 36–64 (2016)

  10. Chan, Y.S., Ng, H.T.: Scaling up word sense disambiguation via parallel texts. In: Proceedings of the 20th National Conference on Artificial Intelligence (AAAI’05) - Volume 3, pp. 1037–1042. AAAI Press (2005)

  11. Devereux, B.J., Tyler, L.K., Geertzen, J., Randall, B.: The CSLB concept property norms. Behav. Res. Methods 46(4), 1119–1127 (2014)

  12. Diab, M.T., Resnik, P.: Word Sense Disambiguation within a Multilingual Framework. Ph.D. thesis, USA, aAI3115805 (2003)

  13. Edmonds, P., Kilgarriff, A.: Introduction to the special issue on evaluating word sense disambiguation systems. Nat. Lang. Eng. 8(4), 279–291 (2002)

  14. Gabrilovich, E., Markovitch, S., et al.: Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In: IJCAI, vol. 7, pp. 1606–1611 (2007)

  15. Gale, W.A., Church, K.W., Yarowsky, D.: Work on statistical methods for word sense disambiguation. In: Working Notes of the AAAI Fall Symposium on Probabilistic Approaches to Natural Language, vol. 54, p. 60 (1992)

  16. Grasso, F., Di Caro, L.: A methodology for large-scale, disambiguated and unbiased lexical knowledge acquisition based on multilingual word alignment. In: Fersini, E., Passarotti, M., Patti, V. (eds.) Proceedings of the Eighth Italian Conference on Computational Linguistics, CLiC-it 2021, Milan, Italy, 26–28 January 2022. CEUR Workshop Proceedings, vol. 3033. CEUR-WS.org (2021)

  17. Harris, Z.S.: Distributional structure. Word 10(2–3), 146–162 (1954)

  18. Hassan, S.H., Mihalcea, R.: Semantic relatedness using salient semantic analysis. In: Twenty-Fifth AAAI Conference on Artificial Intelligence (2011)

  19. Huang, E.H., Socher, R., Manning, C.D., Ng, A.Y.: Improving word representations via global context and multiple word prototypes. In: Proceedings of ACL, pp. 873–882 (2012)

  20. Iacobacci, I., Pilehvar, M.T., Navigli, R.: SensEmbed: learning sense embeddings for word and relational similarity. In: Proceedings of ACL, pp. 95–105 (2015)

  21. Ion, R., Tufis, D.: Multilingual word sense disambiguation using aligned wordnets. Romanian J. Inf. Sci. Technol. 7, 183–200 (2004)

  22. Jakobson, R.: 14. On Linguistic Aspects of Translation, pp. 144–151. University of Chicago Press (2012)

  23. Jakubíček, M., Kilgarriff, A., Kovář, V., Rychlỳ, P., Suchomel, V.: The tenten corpus family. In: 7th International Corpus Linguistics Conference CL, pp. 125–127 (2013)

  24. Kilgarriff, A., et al.: The sketch engine: ten years on. Lexicography 1(1), 7–36 (2014)

  25. Kumar, S., Jat, S., Saxena, K., Talukdar, P.: Zero-shot word sense disambiguation using sense definition embeddings. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 5670–5681 (2019)

  26. Lacerra, C., Bevilacqua, M., Pasini, T., Navigli, R.: CSI: a coarse sense inventory for 85% word sense disambiguation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 8123–8130 (2020)

  27. Lefever, E., Hoste, V.: SemEval-2013 task 10: cross-lingual word sense disambiguation. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pp. 158–166. Association for Computational Linguistics, Atlanta (2013). https://aclanthology.org/S13-2029

  28. McRae, K., Cree, G.S., Seidenberg, M.S., McNorgan, C.: Semantic feature production norms for a large set of living and nonliving things. Behav. Res. Methods 37(4), 547–559 (2005)

  29. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)

  30. Miller, G.A.: Wordnet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)

  31. Miller, G.A., Chodorow, M., Landes, S., Leacock, C., Thomas, R.G.: Using a semantic concordance for sense identification. In: Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, 8–11 March 1994 (1994)

  32. Morris, J., Hirst, G.: Non-classical lexical semantic relations. In: Proceedings of the Computational Lexical Semantics Workshop at HLT-NAACL 2004, pp. 46–51. Association for Computational Linguistics, Boston (2004). https://aclanthology.org/W04-2607

  33. Navigli, R.: Word sense disambiguation: a survey. ACM Comput. Surv. 41(2), 1–69 (2009)

  34. Navigli, R., Ponzetto, S.P.: BabelNet: building a very large multilingual semantic network. In: Proceedings of ACL, pp. 216–225. Association for Computational Linguistics (2010)

  35. Palmer, M., Dang, H.T., Fellbaum, C.: Making fine-grained and coarse-grained sense distinctions, both manually and automatically. Nat. Lang. Eng. 13(02), 137–163 (2007)

  36. Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word representation. In: EMNLP, vol. 14, pp. 1532–1543 (2014)

  37. Petricca, P.: SEMANTICA. Forme, Modelli, Problemi (2019)

  38. Pilehvar, M.T., Navigli, R.: A large-scale pseudoword-based evaluation framework for state-of-the-art word sense disambiguation. Comput. Linguist. 40(4), 837–881 (2014)

  39. Scarlini, B., Pasini, T., Navigli, R.: SensEmBERT: context-enhanced sense embeddings for multilingual word sense disambiguation. In: Proceedings of the 34th Conference on Artificial Intelligence. Association for the Advancement of Artificial Intelligence (2020)

  40. Schütze, H.: Dimensions of meaning. In: SC, pp. 787–796 (1992)

  41. Siegel, M., Bond, F.: OdeNet: compiling a GermanWordNet from other resources. In: Proceedings of the 11th Global Wordnet Conference, pp. 192–198. Global Wordnet Association, University of South Africa (UNISA) (2021). https://aclanthology.org/2021.gwc-1.22

  42. Speer, R., Chin, J., Havasi, C.: Conceptnet 5.5: an open multilingual graph of general knowledge (2017)

  43. Thomas, C.: Lexicalization in Generative Morphology and Conceptual Structure, pp. 45–65. Edinburgh University Press (2013)

  44. Trampuš, M., Novak, B.: Internals of an aggregated web news feed. In: Proceedings of 15th Multiconference on Information Society, pp. 221–224 (2012)

  45. Tschirner, E.: Deutsch nach Themen: Grund- und Aufbauwortschatz: Deutsch als Fremdsprache nach Themen-Lernwörterbuch. Cornelsen, Berlin (2016)

Author information

Correspondence to Francesca Grasso.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Grasso, F., Lovera Rulfi, V., Di Caro, L. (2022). MultiAligNet: Cross-lingual Knowledge Bridges Between Words and Senses. In: Corcho, O., Hollink, L., Kutz, O., Troquard, N., Ekaputra, F.J. (eds) Knowledge Engineering and Knowledge Management. EKAW 2022. Lecture Notes in Computer Science, vol 13514. Springer, Cham. https://doi.org/10.1007/978-3-031-17105-5_3

  • DOI: https://doi.org/10.1007/978-3-031-17105-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17104-8

  • Online ISBN: 978-3-031-17105-5

  • eBook Packages: Computer Science, Computer Science (R0)
