Development of Real Size IT Systems with Language Competence as a Challenge for a Less-Resourced Language

Vetulani, Zygmunt; Vetulani, Grażyna

doi:10.1007/978-3-030-63007-2_59

Zygmunt Vetulani¹⁴ &
Grażyna Vetulani¹⁵

Conference paper
First Online: 23 November 2020

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12496)

Abstract

In this paper, based on the example of our early works for Polish, we want to share our experience in the challenging task of developing NLP-based technologies in the situation of initial scarcity of digital language resources that ranked Polish among the Less-Resourced Languages. We present some of our projects aiming at language resources and tools we had to develop in order to be able to process text in Polish and to develop real-scale systems with language understanding competence. The case study we present here is the rule-based system POLINT-112-SMS for improving information management in emergency situations. We argue in favor of the lexicon-grammar approach to formal description of highly inflected languages and present our current work on this grammatical paradigm.

Notes

1.
See e.g. (Vetulani 1988) reporting our results obtained already in 1984 at GIA, Marseille.
2.
This system is an outcome of the POLEX Polish Lexicon Project (“POLEX - Polska Leksykalna Baza Danych No KBN8S50301007”) realised by Z. Vetulani, B. Walczak, T. Obrębski, G. Vetulani and other team members during 1994–1996 (Vetulani et al. 1998a).
3.
The resource is distributed through ELRA. ISLRN: 147-211-031-223-4; ID: ELRA-L0047.
4.
GENELEX was continued by LE-PAROLE (1996–1998), LE-SIMPLE (1998–2000) and GRAAL (1992–1996) projects.
5.
See e.g. Maurice Gross (1975) and Polański (1980–1992).
6.
CEGLEX Consortium: AMU, Poznań/Poland/, Charles University, Prague/Czech Rep./, GSI-ERLI, Charenton/France/(coordinator), Lingware, Szeged/Hungary/.
7.
From the GRAMLEX project description by Eric Laporte, project coordinator. Quoted by (Vetulani et al. 1998b).
8.
GRAMLEX Consortium: AMU, Poznań/Poland/, ASSTRIL, Marne la Vallée/France/ (coordinator), CLR, Salerno/Italy/, Hungarian Academy of Science, Budapest/Hungary/.
9.
These tools and applications were: (i) a lemmatizer/tagger (LEXAN) (Vetulani et al. 1997, 1998b), (ii) a generator of inflected forms for simple and compound lexemes (Vetulani et al. 1998b), (iii) a syntactic concordance generator (SCON) (Vetulani et al. 1998b), (iv) a tool for the extraction of compound terms and terminology from texts (EXTRACT) (Vetulani et al. 1998b), (v) an application for the structure analysis of dictionary entries (VERBAN) (Vetulani et al. 1998b), (vi) an application for acquisition of the lexicon from dictionary definitions (NOUNAN) (Vetulani et al. 1998b), (vii) an application for interactive analysis of dictionary definitions (NOUNDAN) (Vetulani et al. 1998b).
10.
Grant of Ministry of Science and Higher Education (MNiSzW) Nr R0002802 (2006–2010).
11.
Emergency telephone service maintained by the Polish Police (equivalent to tel. 112).
12.
See (Vetulani Z. 2014) for the core PolNet bibliography until 2014.
13.
At about the same time, Wrocław Technical University started another successful wordnet project plWordnet (pl. Słowosieć) (Piasecki et al. 2009) based on a different methodology.
14.
In the Princeton WordNet the basic entities are synsets, i.e. classes of synonyms related by semantic relations of which the most important are hyponymy and hyperonymy.
15.
Princeton WordNet (Miller et al. 1990) was used as a formal ontology to implement systems with language understanding functionality. In order to respect a specific Polish conceptualization of the world, we decided to build PolNet from scratch rather than merely translate the Princeton WordNet into Polish.
16.
See (Vetulani et al. 2007) for the PolNet development algorithm.
17.
ORBIS, an interface to a database on planets, was implemented in PROLOG to show the qualities of the declarative programming paradigm. The initially bilingual system was extended with a module for Polish by Z. Vetulani during his research fellowship at GIA, the University Aix-Marseille II, in 1984 (Vetulani 1988).
18.
In (Vetulani 1994) we described the switch technique which, combined with appropriate heuristics, permits the complexity of parsing to be reduced, sometimes down to linear, on the basis of morphological and valency information.
19.
This dictionary is described in two monographs. The first one (2000) describes the initial phase of work on a dictionary of verb-noun collocations together with their usage in sentences as predicates (2862 predicative nouns). This work was done manually. The extension of the resource to 14,600 collocations was described in the second book (2012), which reports further, computer-assisted work. A part of this resource was integrated with PolNet.
20.
In (Vetulani et al. 2010) we described a computer-assisted algorithm to extract collocations directly from text corpora. Still, involvement of qualified lexicographers is necessary.
21.
The term support verb was first introduced to linguistics by Harris (1964), on the occasion of his research on nominalisation. Gross (1975) used the term verbe support (French) in his work on lexicon-grammars, while Ch. Fillmore (Fillmore et al. 2002) preferred to use the term light verb in the project VerbNet. G. Vetulani (Vetulani 2000) uses the term czasownik podporowy (Polish).
22.
Here support verb + predicative noun, but more generally any other predicative word (such as an adjective or adverb) instead of a noun.
23.
E.g. in order to speed up parsing (Vetulani 1991).
24.
In Polish we observe the phenomenon of syntactic synonymy (Jędrzejko 1993), where for some predicative verbs the morpho-syntactic structure differs from that of their semantic synonyms in the form of verb-noun collocations (e.g. for the direct complement). For consistency with our methodological assumptions, we place these synonymous forms in distinct synsets interconnected by a special semantic similarity relation.
25.
From 14,400 in PolNet 2.0 to 17,564 in PolNet 3.0.
26.
In Kraków, Poznań, Rzeszów, Śląsk (Silesia), Warszawa, Wrocław, and other places.
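
Notes 14 and 24 describe the wordnet data model only in words: synsets are classes of synonyms linked by hyponymy/hyperonymy, and syntactically divergent synonyms are kept in distinct synsets joined by a semantic similarity relation. The following is a minimal sketch of such a structure, not the PolNet implementation. The class name, function names, synset identifiers and the toy entries are all assumptions chosen for illustration and are not taken from PolNet.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A class of synonyms. All identifiers and lemmas here are illustrative."""
    sid: str
    lemmas: tuple
    hypernyms: set = field(default_factory=set)  # ids of more general synsets
    similar: set = field(default_factory=set)    # ids linked by semantic similarity


def add_hypernym(net, hypo, hyper):
    """Record that synset `hypo` is a hyponym of synset `hyper`."""
    net[hypo].hypernyms.add(hyper)


def link_similar(net, a, b):
    """Connect two distinct synsets by a symmetric similarity relation."""
    net[a].similar.add(b)
    net[b].similar.add(a)


def hypernym_closure(net, sid):
    """Return all synsets reachable from `sid` by following hyperonymy upward."""
    seen, stack = [], list(net[sid].hypernyms)
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.append(cur)
            stack.extend(net[cur].hypernyms)
    return seen


def hyponyms(net, sid):
    """Return the ids of the direct hyponyms of `sid`, sorted."""
    return sorted(s.sid for s in net.values() if sid in s.hypernyms)


def build_toy_net():
    """Build a tiny hypothetical network (not actual PolNet data)."""
    entries = [
        Synset("istota", ("istota",)),                    # being
        Synset("zwierzę", ("zwierzę",)),                  # animal
        Synset("pies", ("pies",)),                        # dog
        Synset("kot", ("kot",)),                          # cat
        Synset("pomagać", ("pomagać",)),                  # simple verb: to help
        Synset("udzielać_pomocy", ("udzielać pomocy",)),  # verb-noun collocation
    ]
    net = {s.sid: s for s in entries}
    add_hypernym(net, "zwierzę", "istota")
    add_hypernym(net, "pies", "zwierzę")
    add_hypernym(net, "kot", "zwierzę")
    # The verb and the collocation stay in separate synsets, joined by similarity.
    link_similar(net, "pomagać", "udzielać_pomocy")
    return net
```

With this toy network, hypernym_closure(net, "pies") walks from the synset for "pies" up through "zwierzę" to "istota", while the verb and its collocational synonym remain separate entries that can carry different syntactic descriptions.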

References

Antoni-Lay, M.-H., Francopoulo, G., Zaysser, L.: A Generic Model for Reusable Lexicons: The Genelex Project. Literary and Linguistic Computing, vol. 9, no. 1, pp. 47–54. Oxford University Press, Oxford (1994)
Colmerauer, A., Kittredge, R.: ORBIS. In: Horecký, J. (ed.) Proceedings of the 9th COLING Conference (1982)
Fillmore, Ch.J., Baker, C.F., Sato, H.: Seeing arguments through transparent structures. In: Proceedings of the Third International Conference on Language Resources and Evaluation, vol. III, Las Palmas, pp. 787–791 (2002)
Gross, M.: Méthodes en syntaxe. Hermann, Paris (1975)
Harris, Z.S.: The Elementary Transformations. Transformations and Discourse Analysis Papers No. 54. University of Pennsylvania, Philadelphia (1964)
Jędrzejko, E.: Nominalizacje w systemie i w tekstach współczesnej polszczyzny. Wydawnictwo Uniwersytetu Śląskiego, Katowice (1993)
Kubis, M.: A semantic similarity measurement tool for WordNet-like databases. In: Vetulani, Z., Mariani, J. (eds.) Proceedings of the 7th Language and Technology Conference, Poznań, Poland, 27–29 November 2015. FUAM, Poznań, pp. 150–154 (2015)
Miller, G.A., Beckwith, R., Fellbaum, C.D., Gross, D., Miller, K.: WordNet: an online lexical database. Int. J. Lexicograph. 3(4), 235–244 (1990)
Palmer, M.: Semlink: Linking PropBank, VerbNet and FrameNet. In: Proceedings of the Generative Lexicon Conference, Pisa, Italy, September 2009
Piasecki, M., Szpakowicz, S., Broda, B.: A Wordnet from the Ground Up. Oficyna Wydawnicza Politechniki Wrocławskiej, Wrocław (2009)
Polański, K. (ed.): Słownik syntaktyczno-generatywny czasowników polskich. vol. I-IV, Ossolineum, Wrocław, 1980–1990; vol. V, IJP PAN, Kraków (1992)
Przepiórkowski, A.: Korpus IPI PAN. Wersja wstępna (The IPI PAN Corpus: Preliminary version). IPI PAN, Warszawa (2004)
Szymczak, M. (ed.): Słownik Języka Polskiego (Polish Dictionary). Państwowe Wydawnictwo Naukowe, Warszawa (1995)
Vetulani, G.: Rzeczowniki predykatywne języka polskiego. W kierunku syntaktycznego słownika rzeczowników predykatywnych na tle porównawczym. (Predicate nouns of Polish. Towards a syntactic dictionary of predicate nouns), AMU Press, Poznań (2000)
Vetulani, G.: Kolokacje werbo-nominalne jako samodzielne jednostki języka. Syntaktyczny słownik kolokacji werbo-nominalnych języka polskiego na potrzeby zastosowań informatycznych. Część I (Verb-noun collocations as language units. Syntactic dictionary of Polish verb-noun collocations for NLP applications. Part I). AMU Press, Poznań (2012)
Vetulani, Z.: PROLOG implementation of an access in Polish to a data base. In: Studia z automatyki, XII, Państwowe Wydawnictwo Naukowe, pp. 5–23 (1988)
Vetulani, Z.: Linguistic problems in the theory of man-machine communication in natural language. A study of consultative question answering dialogs. Empirical approach. Brockmeyer, Bochum (1989)
Vetulani, Z.: Corpus of consultative dialogs. Experimentally collected source data for AI applications. Adam Mickiewicz University Press, Poznań (1990)
Vetulani, Z.: Lexical preanalysis in a DCG parser of POLISH. In: Klein, E., et al. (eds.) Betriebslinguistik und Linguistikbetrieb. Akten des 24 Linguistischen Kolloquiums, Bremen 1989, (Ling. Arbeiten 260/261), Max Niemeyer, Tübingen, pp. 389–395 (1991)
Vetulani, Z.: SWITCHes for making Prolog more Dynamic Programming Language, Logic Programming, The Newsletter of the Association for Logic Programming, vol 7/1, p. 10, February 1994
Vetulani, Z., Martinek, J., Vetulani, G.: The CEGLEX dictionary model for Polish. In: Bazylewicz, R., Kossak, O. (eds.) Proceedings of the 4th and 5th International Conferences UKRSOFT (Lviv, 1994, 1995), SP «BaK», Lviv, 1995, pp. 144–150 (1995)
Vetulani, Z., Martinek, J., Obrębski, T., Vetulani, G.: Lexical Resources and Tools for Tagging Polish Texts within GRAMLEX. In: Linguisticae Investigationes, XXI:2, John Benjamins B.V., Amsterdam, pp. 401–416 (1997)
Vetulani, Z., Walczak, B., Obrębski, T., Vetulani, G.: Unambiguous coding of the inflection of Polish nouns and its application in the electronic dictionaries - format POLEX. Adam Mickiewicz University Press, Poznań (1998a)
Vetulani, Z., Martinek, J., Obrębski, T., Vetulani, G.: Dictionary Based Methods and Tools for Language Engineering. Adam Mickiewicz University Press, Poznań (1998b)
Vetulani, Z.: Komunikacja człowieka z maszyną. Komputerowe modelowanie kompetencji językowej (Man-machine communication. Computer modelling of language competence), Akademicka Oficyna Wydawnicza EXIT, Warszawa (2004)
Vetulani, Z., Walkowska, J., Obrębski, T., Marciniak, J., Konieczka, P., Rzepecki, P.: An algorithm for building lexical semantic network and its application to PolNet - Polish WordNet project. In: Vetulani, Z., Uszkoreit, H. (eds.) LTC 2007. LNCS (LNAI), vol. 5603, pp. 369–381. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04235-5_32
Vetulani, Z., et al.: Zasoby językowe i technologie przetwarzania tekstu. POLINT-112-SMS jako przykład aplikacji z zakresu bezpieczeństwa publicznego (Language resources and text processing technologies. POLINT-112-SMS as example of homeland security oriented application). AMU Press, Poznań (2010)
Vetulani, Z.: Language resources in a public security application with text understanding competence. A Case Study: POLINT-112-SMS. In: Proceedings of the LRPS Workshop at LREC 2012, May 27, 2012. Istanbul, Turkey, ELRA, Paris, pp. 54–63 (2012)
Vetulani, Z.: PolNet – Polish WordNet. In: Vetulani, Z., Mariani, J. (eds.) LTC 2011. LNCS (LNAI), vol. 8387, pp. 408–416. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08958-4_33
Vetulani, Z., Vetulani, G.: Through Wordnet to Lexicon Grammar. In: Kakoyianni Doa, F. (ed.) Penser le lexique grammaire: perspectives actuelles, pp. 531–543. Editions Honoré Champion, Paris (2014a)
Vetulani, Z., Vetulani, G.: Verb-Noun Collocations in PolNet 2.0. In: Henrich, V., Hinrichs, E., (eds.) Proceedings of the Workshop on Computational Cognitive and Linguistic Approaches to the Analysis of Complex Words and Collocations (CCLCC 2014), Tübingen, Germany, pp. 73–77 (2014b)
Vetulani, Z., Vetulani, G.: Synonymie et granularité dans les bases lexicales du type Wordnet. Studia Romanica Posnaniensia, vol. 42/1, WN UAM, Poznań, pp. 113–127 (2015)
Vetulani, Z., Vetulani, G., Kochanowski, B.: Recent advances in development of a lexicon-grammar of Polish: PolNet 3.0. In: Calzolari, N., et al. (eds.) The 10th Conference on Language Resources and Evaluation, pp. 2851–2854. Paris, France, ELRA (2016)
Vetulani, Z., Osiński, J.: Intelligent information bypass for more efficient emergency management. Comp. Methods in Science and Technology 23(2), 105–123 (2017)
Vossen, P., Bloksma, L., Rodriguez, H., Climent, S., Calzolari, N., Peters, W.: The EuroWordNet base concepts and top ontology, final version (1998). https://www.researchgate.net/publication/228594694_The_EuroWordNet_Base_Concepts_and_Top_Ontology
Walkowska, J.: Gathering and analysis of a corpus of Polish SMS dialogs. In: Kłopotek, M.A., et al. (eds.) Challenging Problems of Science, Computer Science. Recent Advances in Intelligent Information Systems, pp. 145–157. Publishing House EXIT, Warszawa (2009)

Author information

Authors and Affiliations

Faculty of Mathematics and Computer Science, Adam Mickiewicz University in Poznań, Ul. Uniwersytetu Poznańskiego 4, 61-614, Poznań, Poland
Zygmunt Vetulani
Faculty of Modern Languages and Literatures, Adam Mickiewicz University in Poznań, Al. Niepodległości 4, 61-874, Poznań, Poland
Grażyna Vetulani

Corresponding author

Correspondence to Zygmunt Vetulani.

Editor information

Editors and Affiliations

Department of Applied Informatics, Wrocław University of Science and Technology, Wroclaw, Poland
Ngoc Thanh Nguyen
Thua Thien Hue Center of Information Technology, Hue, Vietnam
Bao Hung Hoang
Vietnam - Korea University of Information and Communication Technology, University of Da Nang, Da Nang, Vietnam
Cong Phap Huynh
Department of Computer Engineering, Yeungnam University, Gyeungsan, Korea (Republic of)
Dosam Hwang
Department of Applied Informatics, Wrocław University of Science and Technology, Wroclaw, Poland
Bogdan Trawiński
Department of Information Systems, University of Münster, Münster, Germany
Gottfried Vossen

About this paper

Cite this paper

Vetulani, Z., Vetulani, G. (2020). Development of Real Size IT Systems with Language Competence as a Challenge for a Less-Resourced Language. In: Nguyen, N.T., Hoang, B.H., Huynh, C.P., Hwang, D., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2020. Lecture Notes in Computer Science, vol 12496. Springer, Cham. https://doi.org/10.1007/978-3-030-63007-2_59

DOI: https://doi.org/10.1007/978-3-030-63007-2_59
Published: 23 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63006-5
Online ISBN: 978-3-030-63007-2
eBook Packages: Computer Science, Computer Science (R0)
