Towards Full Lexical Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3206)

Abstract

Text processing in Serbian relies on a system of electronic dictionaries in the Intex format. Although lexical recognition succeeds for 75% to 90% of word forms (depending on the type of text), some categories of words remain unrecognized. In this paper we present two enhancements to the e-dictionaries that enable recognition of two important categories of words: named entities and words generally not recorded in traditional dictionaries. We first describe the structure and content of the dictionaries of proper names, both personal and geographic, developed to recognize the corresponding classes of named entities. We then present a set of lexical transducers that express the morphological rules governing word formation, developed for the recognition of unknown words. The resources presented significantly improve the lexical recognition process.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pavlović-Lažetić, G., Vitas, D., Krstev, C. (2004). Towards Full Lexical Recognition. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2004. Lecture Notes in Computer Science, vol 3206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30120-2_23

  • DOI: https://doi.org/10.1007/978-3-540-30120-2_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23049-6

  • Online ISBN: 978-3-540-30120-2

  • eBook Packages: Springer Book Archive
