Abstract
As a wordnet grows, it becomes more and more difficult to avoid, identify and eliminate errors in it, especially when a group of editors works in parallel, as is the case with plWordNet. We therefore need elaborate tools both for preventing errors during editing and for detecting them after the work has been completed. In this paper, we first present the error prevention mechanisms built into the plWordNet editor application and into the system supporting the group work of a linguistic team. Next, we discuss diagnostic tests and diagnostic tools dedicated to plWordNet, the Polish wordnet. plWordNet has been in steady development for almost ten years and has reached a size of 193k synsets and 255k lexical meanings. We propose a typology of diagnostic levels, describe formal, structural and semantic rules for seeking errors in plWordNet, and present a new method for the automated induction of diagnostic rules. Finally, we discuss the results and benefits of the approach.
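To make the idea of a formal diagnostic rule concrete, here is a minimal sketch of two such checks: lexical units (LUs) that were created but never added to any synset, and relation instances whose source or target no longer exists. The `Wordnet` data model, its fields and the function names are hypothetical illustrations, not the schema or API of plWordNet or its editor, and LU and synset identifiers are assumed to be disjoint strings.

```python
from dataclasses import dataclass, field


@dataclass
class Wordnet:
    """Hypothetical in-memory snapshot of a wordnet (not the plWordNet schema)."""
    lexical_units: set[str] = field(default_factory=set)           # LU ids
    synsets: dict[str, set[str]] = field(default_factory=dict)     # synset id -> member LU ids
    relations: list[tuple[str, str, str]] = field(default_factory=list)  # (relation, source, target)


def orphan_lus(wn: Wordnet) -> set[str]:
    """Return LUs that exist in the database but belong to no synset."""
    assigned = set().union(*wn.synsets.values())  # empty set if there are no synsets
    return wn.lexical_units - assigned


def dangling_relations(wn: Wordnet) -> list[tuple[str, str, str]]:
    """Return relation instances with at least one missing argument.

    Assumes LU ids and synset ids never collide, so one lookup set suffices.
    """
    known = wn.lexical_units | set(wn.synsets)
    return [r for r in wn.relations if r[1] not in known or r[2] not in known]


# Toy data: "zamek.2" was never put into a synset, and synset "S9" was deleted
# while a relation pointing to it was left behind.
wn = Wordnet(
    lexical_units={"zamek.1", "zamek.2", "twierdza.1", "budowla.1"},
    synsets={"S1": {"zamek.1", "twierdza.1"}, "S2": {"budowla.1"}},
    relations=[("hyponymy", "S1", "S2"), ("hyponymy", "S1", "S9")],
)

print(sorted(orphan_lus(wn)))   # ['zamek.2']
print(dangling_relations(wn))   # [('hyponymy', 'S1', 'S9')]
```

Checks of this kind only inspect the form of the data; structural and semantic rules would additionally need to look at the shape of the relation graph and at the meaning of the connected LUs.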
Notes
1. According to semanticists, lexical relations form a continuum [5, p. 143].
2. The vast majority of dictionaries differ significantly from wordnets, so a comparison is difficult.
3. In the plWordNet model, two LUs are synonymous if they share all constitutive relations to other LUs; for details, see [10].
4. The application gives the synonymy relation a special position.
5. Linguists sometimes create several LUs in advance and later forget to add them to synsets.
6. It is unofficially called ‘plWordNet Big Brother’.
7. plWordNet domains follow those of Princeton WordNet, which originated from the names of the lexicographer files.
8. There are four guidelines, one for each of the four parts of speech covered by plWordNet, and several more written for specific tasks: applying register labels, recognising multi-word LUs, differentiating gerunds from other deverbal nouns, describing adjectives derived from proper nouns, etc.
9. Suggestions are not binding: editors can choose a different place for the LUs of the given lemma.
10.
11. In WordnetLoom 2.0 this problem has been eliminated at the editing level.
12. For instance, there may also be relation instances in which at least one of the arguments was deleted without the relation itself being properly removed.
13. In one case, erroneous modifications to the relation definitions made by a human had the same effect.
14. There are three documents describing the lexico-semantic systems, available on the site [2]: for nouns (31 pages), for verbs (66 pages) and for adjectives (32 pages).
15. Here will be a link to a full description of the rules.
16. Glosses have appeared in plWordNet since version 2.2 and became numerous in version 2.3, but they are still intended more as comments for users than as a tool for defining LU semantics. In a lexico-semantic network, it is the relations that should be the primary means of definition. Constitutive relations are frequent and shared among groups of LUs, cf. [10].
17. Cross-categorial synonymy was introduced into plWordNet following EuroWordNet [9].
18.
19.
20.
References
1. Słownik Języka Polskiego. Wydawnictwo Naukowe PWN (2007)
2. The site of Wroclaw University of Technology Language Technology Group G4.19 (2013). http://www.nlp.pwr.wroc.pl
3. Broda, B., Maziarz, M., Piasecki, M.: Tools for plWordNet development. Presentation and perspectives. In: Calzolari, N., Choukri, K., Declerck, T., Dovgan, M., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S. (eds.) Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), pp. 3647–3652. European Language Resources Association (ELRA), Istanbul, Turkey, May 2012
4. Broda, B., Piasecki, M.: Evaluating LexCSD in a large scale experiment. Cont. Cybern. 40(2), 419–436 (2011)
5. Cruse, A.: Meaning in Language. An Introduction to Semantics and Pragmatics. Oxford University Press, Oxford (2004)
6. Huang, C.R., Calzolari, N., Gangemi, A., Oltramari, A., Prévot, L. (eds.): Ontology and the Lexicon. A Natural Language Processing Perspective. Studies in Natural Language Processing. Cambridge University Press, Cambridge (2010)
7. Kubis, M.: A query language for WordNet-like lexical databases. In: Pan, J.-S., Chen, S.-M., Nguyen, N.T. (eds.) ACIIDS 2012, Part III. LNCS, vol. 7198, pp. 436–445. Springer, Heidelberg (2012)
8. Lohk, A., Vare, K., Võhandu, L.: Visual study of Estonian wordnet using bipartite graphs and minimal crossing algorithm. In: Proceedings of 6th Global Wordnet Conference, Matsue, Japan, January 2012
9. Maziarz, M., Piasecki, M., Rabiega-Wiśniewska, J., Szpakowicz, S.: Semantic relations among nouns in Polish wordnet grounded in lexicographic and semantic tradition. Cogn. Stud. 11, 161–181 (2011). http://www.eecs.uottawa.ca/~szpak/pub/Maziarz_et_al_CS2011a.pdf
10. Maziarz, M., Piasecki, M., Szpakowicz, S.: The chicken-and-egg problem in WordNet design: synonymy, synsets and constitutive relations. Lang. Resour. Eval. 47(3), 769–796 (2013)
11. Maziarz, M., Piasecki, M., Szpakowicz, S., Rabiega-Wiśniewska, J., Hojka, B.: Semantic relations between verbs in Polish WordNet 2.0. Cogn. Stud. 11, 183–200 (2011)
12. Maziarz, M., Szpakowicz, S., Piasecki, M.: Semantic relations among adjectives in Polish WordNet 2.0: a new relation set, discussion and evaluation. Cogn. Stud. 12, 149–179 (2012)
13. Miłkowski, M.: Open thesaurus - polski thesaurus (2007). http://www.synomix.pl/
14. Piasecki, M., Marcińczuk, M., Ramocki, R., Maziarz, M.: WordNetLoom: a WordNet development system integrating form-based and graph-based perspectives. Int. J. Data Min. Model. Manage. 5(3), 210–232 (2013)
15. Piasecki, M., Szpakowicz, S., Broda, B.: A WordNet from the Ground Up. University of Technology Press, Wrocław (2009)
16. Rizov, B.: Hydra: a modal logic tool for wordnet development, validation and exploration. In: Calzolari, N., et al. (eds.) Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008). European Language Resources Association (ELRA), Marrakech, Morocco, May 2008
17. SJP.PL, Z.: Słownik języka polskiego [A dictionary of the Polish language] (2015). http://sjp.pl/
18. Smrž, P.: Quality control and checking for wordnet development: a case study of balkanet. Rom. J. Inf. Sci. Technol. 2004(1), 173–182 (2004)
19. Witten, I., Frank, E., Hall, M.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers, San Francisco (2011)
Acknowledgments
Work financed by the Polish Ministry of Science and Higher Education, a program in support of scientific units involved in the development of a European research infrastructure for the humanities and social sciences in the scope of the consortia CLARIN ERIC and ESS-ERIC, 2015–2016.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Piasecki, M., Burdka, Ł., Maziarz, M., Kaliński, M. (2016). Diagnostic Tools in plWordNet Development Process. In: Vetulani, Z., Uszkoreit, H., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2013. Lecture Notes in Computer Science, vol. 9561. Springer, Cham. https://doi.org/10.1007/978-3-319-43808-5_20
DOI: https://doi.org/10.1007/978-3-319-43808-5_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43807-8
Online ISBN: 978-3-319-43808-5
eBook Packages: Computer Science, Computer Science (R0)