Abstract
The lack of large-scale, freely available and durable lexical resources, and its consequences for NLP, are widely acknowledged, but attempts to cope with the usual bottlenecks preventing their development often result in dead ends. This article introduces a language-independent, semi-automatic and endogenous method for enriching lexical resources, based on collaborative editing and random walks through existing lexical relationships, and shows how this approach overcomes recurrent impediments. It compares the impact of different data sources and similarity measures on the task of improving synonymy networks. Finally, it defines an architecture for applying the method to Wiktionary and explains how it has been implemented.
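To make the idea of random walks through existing lexical relationships concrete, the following is a minimal sketch and not the authors' implementation. It assumes an undirected toy synonymy graph and uses a plain fixed-length random walk as the similarity measure. The function names (`build_graph`, `random_walk_scores`, `suggest_synonyms`), the `steps` parameter, and the example word pairs are all hypothetical. In a semi-automatic setting, the ranked candidates would be shown to contributors for validation rather than added automatically.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (word, word) synonym pairs."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def random_walk_scores(neighbours, start, steps=2):
    """Return the probability of standing on each word after `steps`
    steps of a uniform random walk that begins at `start`."""
    if start not in neighbours:
        return {}
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, prob in dist.items():
            # Spread this node's probability evenly over its neighbours.
            share = prob / len(neighbours[node])
            for other in neighbours[node]:
                nxt[other] += share
        dist = dict(nxt)
    return dist


def suggest_synonyms(edges, word, steps=2, top=3):
    """Rank words that are NOT yet linked to `word` by walk probability.
    These are only candidates, meant to be validated by a human editor."""
    neighbours = build_graph(edges)
    scores = random_walk_scores(neighbours, word, steps)
    known = neighbours.get(word, set()) | {word}
    candidates = [(w, s) for w, s in scores.items() if w not in known]
    return sorted(candidates, key=lambda item: (-item[1], item[0]))[:top]


# Hypothetical toy data: existing synonymy links in a small network.
toy_edges = [
    ("big", "large"),
    ("large", "huge"),
    ("huge", "enormous"),
    ("big", "great"),
]
print(suggest_synonyms(toy_edges, "big"))
```

Here "huge" is proposed as a candidate synonym of "big" because a short walk from "big" reaches it through "large", although no direct link exists yet. Other similarity measures, or other data sources for the edges, could be swapped in without changing the overall shape of the procedure.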
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sajous, F., Navarro, E., Gaume, B., Prévot, L., Chudy, Y. (2010). Semi-automatic Endogenous Enrichment of Collaboratively Constructed Lexical Resources: Piggybacking onto Wiktionary. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds) Advances in Natural Language Processing. NLP 2010. Lecture Notes in Computer Science, vol 6233. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14770-8_37
DOI: https://doi.org/10.1007/978-3-642-14770-8_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14769-2
Online ISBN: 978-3-642-14770-8
eBook Packages: Computer Science, Computer Science (R0)