Development of Real Size IT Systems with Language Competence as a Challenge for a Less-Resourced Language

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12496)

Abstract

In this paper, based on the example of our early works for Polish, we want to share our experience in the challenging task of developing NLP-based technologies in the situation of initial scarcity of digital language resources that ranked Polish among the Less-Resourced Languages. We present some of our projects aiming at language resources and tools we had to develop in order to be able to process text in Polish and to develop real-scale systems with language understanding competence. The case study we present here is the rule-based system POLINT-112-SMS for improving information management in emergency situations. We argue in favor of the lexicon-grammar approach to formal description of highly inflected languages and present our current work on this grammatical paradigm.

Notes

  1. See e.g. (Vetulani 1988) reporting our results obtained already in 1984 at GIA, Marseille.

  2. This system is an outcome of the POLEX Polish Lexicon Project (“POLEX - Polska Leksykalna Baza Danych No KBN8S50301007”) realised by Z. Vetulani, B. Walczak, T. Obrębski, G. Vetulani and other team members during 1994–1996 (Vetulani et al. 1998a).

  3. The resource is distributed through ELRA. ISLRN: 147-211-031-223-4; ID: ELRA-L0047.

  4. GENELEX was continued by the LE-PAROLE (1996–1998), LE-SIMPLE (1998–2000) and GRAAL (1992–1996) projects.

  5. See e.g. Maurice Gross (1975) and Polański (1980–1992).

  6. CEGLEX Consortium: AMU, Poznań (Poland); Charles University, Prague (Czech Rep.); GSI-ERLI, Charenton (France), coordinator; Lingware, Szeged (Hungary).

  7. From the GRAMLEX project description by Eric Laporte, project coordinator, as quoted in (Vetulani et al. 1998b).

  8. GRAMLEX Consortium: AMU, Poznań (Poland); ASSTRIL, Marne la Vallée (France), coordinator; CLR, Salerno (Italy); Hungarian Academy of Science, Budapest (Hungary).

  9. These tools and applications were: (i) a lemmatizer/tagger (LEXAN) (Vetulani et al. 1997, 1998b), (ii) a generator of inflected forms for simple and compound lexemes (Vetulani et al. 1998b), (iii) a syntactic concordance generator (SCON) (Vetulani et al. 1998b), (iv) a tool for the extraction of compound terms and terminology from texts (EXTRACT) (Vetulani et al. 1998b), (v) an application for the structure analysis of dictionary entries (VERBAN) (Vetulani et al. 1998b), (vi) an application for acquisition of the lexicon from dictionary definitions (NOUNAN) (Vetulani et al. 1998b), (vii) an application for interactive analysis of dictionary definitions (NOUNDAN) (Vetulani et al. 1998b).

  10. Grant of the Ministry of Science and Higher Education (MNiSzW) No. R0002802 (2006–2010).

  11. Emergency telephone service maintained by the Polish Police (equivalent to tel. 112).

  12. See (Vetulani Z. 2014) for the core PolNet bibliography until 2014.

  13. At about the same time, Wrocław Technical University started another successful wordnet project, plWordnet (Polish: Słowosieć) (Piasecki et al. 2009), based on a different methodology.

  14. In the Princeton WordNet the basic entities are synsets, i.e. classes of synonyms related by semantic relations of which the most important are hyponymy and hyperonymy.

  15. Princeton WordNet (Miller et al. 1990) was used as a formal ontology to implement systems with language understanding functionality. In order to respect a specific Polish conceptualization of the world, we decided to build PolNet from scratch rather than merely translate the Princeton WordNet into Polish.

  16. See (Vetulani et al. 2007) for the PolNet development algorithm.

  17. ORBIS, an interface to a database on planets, was implemented in PROLOG to show the qualities of the declarative programming paradigm. The initially bilingual system was extended with a module for Polish by Z. Vetulani during his research fellowship at GIA, the University Aix-Marseille II, in 1984 (Vetulani 1988).

  18. In (Vetulani 1994) we described the switch technique which, combined with appropriate heuristics, makes it possible to reduce the complexity of parsing, sometimes down to linear, on the basis of morphological and valency information.

  19. This dictionary is described in two monographs. The first one (2000) describes the initial phase of work on a dictionary of verb-noun collocations together with their usage in sentences as predicates (2862 predicative nouns). This work was done manually. The extension of the resource to 14,600 collocations was described in the second book (2012), which reports further, computer-assisted work. A part of this resource was integrated with PolNet.

  20. In (Vetulani et al. 2010) we described a computer-assisted algorithm to extract collocations directly from text corpora. Still, the involvement of qualified lexicographers remains necessary.

  21. The term support verb was first introduced to linguistics by Harris (1964), on the occasion of his research on nominalisation. Gross (1975) used the term verbe support (French) in his work on lexicon-grammars, while Ch. Fillmore (Fillmore et al. 2002) preferred to use the term light verb in the project VerbNet. G. Vetulani (Vetulani 2000) uses the term czasownik podporowy (Polish).

  22. Here support verb + predicative noun; more generally, any other predicative word (such as an adjective or adverb) may take the place of the noun.

  23. E.g. in order to speed up parsing (Vetulani 1991).

  24. In Polish we observe the phenomenon of syntactic synonymy (Jędrzejko 1993), where for some predicative verbs their morpho-syntactic structure differs from the morpho-syntactic structure of their semantic synonyms in the form of a verb-noun collocation (e.g. for the direct complement). For consistency with our methodological assumptions, we place these synonymous forms in distinct synsets interconnected by a special semantic similarity relation.

  25. From 14,400 in PolNet 2.0 to 17,564 in PolNet 3.0.

  26. In Kraków, Poznań, Rzeszów, Śląsk (Silesia), Warszawa, Wrocław, and other places.
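
As a rough illustration of the data model described in notes 14 and 24, the sketch below represents synsets as classes of synonyms connected by named relations, with a verb and a synonymous verb-noun collocation kept in distinct synsets linked by a similarity relation. It is not the PolNet implementation: the class, the identifiers, the relation labels and the Polish example entries are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Synset:
    """A class of synonyms (hypothetical structure, not the PolNet schema)."""
    ident: str
    members: list[str]
    # relation label -> identifiers of the related synsets
    relations: dict[str, set[str]] = field(default_factory=dict)

    def link(self, relation: str, target: Synset) -> None:
        """Record a one-directional relation from this synset to target."""
        self.relations.setdefault(relation, set()).add(target.ident)


def link_hyponymy(specific: Synset, general: Synset) -> None:
    """Store hyponymy and its inverse, hyperonymy, together."""
    specific.link("hyperonym", general)
    general.link("hyponym", specific)


def link_similarity(a: Synset, b: Synset) -> None:
    """Symmetric link between synsets holding syntactic synonyms."""
    a.link("similar_to", b)
    b.link("similar_to", a)


# Illustrative entries only (assumed, not taken from PolNet).
animal = Synset("animal.n", ["zwierzę"])
dog = Synset("dog.n", ["pies"])
link_hyponymy(dog, animal)

# A predicative verb and a verb-noun collocation in distinct synsets.
respect_verb = Synset("respect.v", ["szanować"])
respect_colloc = Synset("respect.vn", ["mieć szacunek"])
link_similarity(respect_verb, respect_colloc)
```

Keeping the collocation in its own synset, rather than merging it with the verb, leaves room for each entry to carry its own morpho-syntactic description while the similarity link preserves the semantic connection.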

References

  • Antoni-Lay, M.-H., Francopoulo, G., Zaysser, L.: A Generic Model for Reusable Lexicons: The Genelex Project. Literary and Linguistic Computing, vol. 9, no 1, pp. 47–54. University Press, Oxford (1994)

  • Colmerauer, A., Kittredge, R.: ORBIS. In: Horecký, J. (ed.) Proceedings of the 9th COLING Conference (1982)

  • Fillmore, Ch.J., Baker, C.F., Sato, H.: Seeing arguments through transparent structures. In: Proceedings of the Third International Conference on Language Resources and Evaluation, vol. III, Las Palmas, pp. 787–791 (2002)

  • Gross, M.: Méthodes en syntaxe. Hermann, Paris (1975)

  • Harris, Z.S.: The Elementary Transformations. Transformations and Discourse Analysis Papers No. 54. University of Pennsylvania, Philadelphia (1964)

  • Jędrzejko, E.: Nominalizacje w systemie i w tekstach współczesnej polszczyzny. Wydawnictwo Uniwersytetu Śląskiego, Katowice (1993)

  • Kubis, M.: A semantic similarity measurement tool for WordNet-like databases. In: Vetulani, Z., Mariani, J. (eds.) Proceedings of the 7th Language and Technology Conference, Poznań, Poland, 27–29 November 2015. FUAM, Poznań, pp. 150–154 (2015)

  • Miller, G.A., Beckwith, R., Fellbaum, C.D., Gross, D., Miller, K.: WordNet: an online lexical database. Int. J. Lexicograph. 3(4), 235–244 (1990)

  • Palmer, M.: Semlink: Linking PropBank, VerbNet and FrameNet. In: Proceedings of the Generative Lexicon Conference, Pisa, Italy, September 2009

  • Piasecki, M., Szpakowicz, S., Broda, B.: A Wordnet from the Ground Up. Oficyna Wydawnicza Politechniki Wrocławskiej, Wrocław (2009)

  • Polański, K. (ed.): Słownik syntaktyczno-generatywny czasowników polskich. vol. I-IV, Ossolineum, Wrocław, 1980–1990; vol. V, IJP PAN, Kraków (1992)

  • Przepiórkowski, A.: Korpus IPI PAN. Wersja wstępna (The IPI PAN Corpus: Preliminary version). IPI PAN, Warszawa (2004)

  • Szymczak, M. (ed.): Słownik Języka Polskiego (Polish Dictionary). Państwowe Wydawnictwo Naukowe, Warszawa (1995)

  • Vetulani, G.: Rzeczowniki predykatywne języka polskiego. W kierunku syntaktycznego słownika rzeczowników predykatywnych na tle porównawczym. (Predicate nouns of Polish. Towards a syntactic dictionary of predicate nouns), AMU Press, Poznań (2000)

  • Vetulani, G.: Kolokacje werbo-nominalne jako samodzielne jednostki języka. Syntaktyczny słownik kolokacji werbo-nominalnych języka polskiego na potrzeby zastosowań informatycznych. Część I (Verb-noun collocations as language units. Syntactic dictionary of Polish verb-noun collocations for NLP applications. Part I). AMU Press, Poznań (2012)

  • Vetulani, Z.: PROLOG implementation of an access in Polish to a data base. In: Studia z automatyki, XII, Państwowe Wydawnictwo Naukowe, pp. 5–23 (1988)

  • Vetulani, Z.: Linguistic problems in the theory of man-machine communication in natural language. A study of consultative question answering dialogs. Empirical approach. Brockmeyer, Bochum (1989)

  • Vetulani, Z.: Corpus of consultative dialogs. Experimentally collected source data for AI applications. Adam Mickiewicz University Press, Poznań (1990)

  • Vetulani, Z.: Lexical preanalysis in a DCG parser of POLISH. In: Klein, E., et al. (eds.) Betriebslinguistik und Linguistikbetrieb. Akten des 24 Linguistischen Kolloquiums, Bremen 1989, (Ling. Arbeiten 260/261), Max Niemeyer, Tübingen, pp. 389–395 (1991)

  • Vetulani, Z.: SWITCHes for making Prolog more Dynamic Programming Language, Logic Programming, The Newsletter of the Association for Logic Programming, vol 7/1, p. 10, February 1994

  • Vetulani, Z., Martinek, J., Vetulani, G.: The CEGLEX dictionary model for Polish. In: Bazylewicz, R., Kossak, O. (eds.) Proceedings of the 4th and 5th International Conferences UKRSOFT (Lviv, 1994, 1995), SP «BaK», Lviv, 1995, pp. 144–150 (1995)

  • Vetulani, Z., Martinek, J., Obrębski, T., Vetulani, G.: Lexical Resources and Tools for Tagging Polish Texts within GRAMLEX. In: Linguisticae Investigationes, XXI:2, John Benjamins B.V., Amsterdam, pp. 401–416 (1997)

  • Vetulani, Z., Walczak, B., Obrębski, T., Vetulani, G.: Unambiguous coding of the inflection of Polish nouns and its application in the electronic dictionaries - format POLEX. Adam Mickiewicz University Press, Poznań (1998a)

  • Vetulani, Z., Martinek, J., Obrębski, T., Vetulani, G.: Dictionary Based Methods and Tools for Language Engineering. Adam Mickiewicz University Press, Poznań (1998b)

  • Vetulani, Z.: Komunikacja człowieka z maszyną. Komputerowe modelowanie kompetencji językowej (Man-machine communication. Computer modelling of language competence), Akademicka Oficyna Wydawnicza EXIT, Warszawa (2004)

  • Vetulani, Z., Walkowska, J., Obrębski, T., Marciniak, J., Konieczka, P., Rzepecki, P.: An algorithm for building lexical semantic network and its application to PolNet - Polish WordNet project. In: Vetulani, Z., Uszkoreit, H. (eds.) LTC 2007. LNCS (LNAI), vol. 5603, pp. 369–381. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04235-5_32

  • Vetulani, Z., et al.: Zasoby językowe i technologie przetwarzania tekstu. POLINT-112-SMS jako przykład aplikacji z zakresu bezpieczeństwa publicznego (Language resources and text processing technologies. POLINT-112-SMS as example of homeland security oriented application). AMU Press, Poznań (2010)

  • Vetulani, Z.: Language resources in a public security application with text understanding competence. A Case Study: POLINT-112-SMS. In: Proceedings of the LRPS Workshop at LREC 2012, May 27, 2012. Istanbul, Turkey, ELRA, Paris, pp. 54–63 (2012)

  • Vetulani, Z.: PolNet – Polish WordNet. In: Vetulani, Z., Mariani, J. (eds.) LTC 2011. LNCS (LNAI), vol. 8387, pp. 408–416. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08958-4_33

  • Vetulani, Z., Vetulani, G.: Through Wordnet to Lexicon Grammar. In: Kakoyianni Doa, F. (ed.) Penser le lexique grammaire: perspectives actuelles, pp. 531–543. Editions Honoré Champion, Paris (2014a)

  • Vetulani, Z., Vetulani, G.: Verb-Noun Collocations in PolNet 2.0. In: Henrich, V., Hinrichs, E., (eds.) Proceedings of the Workshop on Computational Cognitive and Linguistic Approaches to the Analysis of Complex Words and Collocations (CCLCC 2014), Tübingen, Germany, pp. 73–77 (2014b)

  • Vetulani, Z., Vetulani, G.: Synonymie et granularité dans les bases lexicales du type Wordnet. Studia Romanica Posnaniensia, vol. 42/1, WN UAM, Poznań, pp. 113–127 (2015)

  • Vetulani, Z., Vetulani, G., Kochanowski, B.: Recent advances in development of a lexicon-grammar of Polish: PolNet 3.0. In: Calzolari, N., et al. (eds.) The 10th Conference on Language Resources and Evaluation, pp. 2851–2854. Paris, France, ELRA (2016)

  • Vetulani, Z., Osiński, J.: Intelligent information bypass for more efficient emergency management. Comp. Methods in Science and Technology 23(2), 105–123 (2017)

  • Vossen, P., Bloksma, L., Rodriguez, H., Climent, S., Calzolari, N., Peters, W.: The EuroWordNet Base Concepts and Top Ontology, final version (1998). https://www.researchgate.net/publication/228594694_The_EuroWordNet_Base_Concepts_and_Top_Ontology

  • Walkowska, J.: Gathering and analysis of a corpus of Polish SMS dialogs. In: Kłopotek, M.A., et al. (eds.) Challenging Problems of Science, pp. 145–157. Publishing House EXIT, Warszawa, Computer Science. Recent Advances in Intelligent Information Systems (2009)

Author information

Corresponding author

Correspondence to Zygmunt Vetulani.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Vetulani, Z., Vetulani, G. (2020). Development of Real Size IT Systems with Language Competence as a Challenge for a Less-Resourced Language. In: Nguyen, N.T., Hoang, B.H., Huynh, C.P., Hwang, D., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2020. Lecture Notes in Computer Science, vol 12496. Springer, Cham. https://doi.org/10.1007/978-3-030-63007-2_59

  • DOI: https://doi.org/10.1007/978-3-030-63007-2_59

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63006-5

  • Online ISBN: 978-3-030-63007-2

  • eBook Packages: Computer Science, Computer Science (R0)
