
High Performance Part-of-Speech Tagging of Bulgarian

  • Conference paper
Artificial Intelligence: Methodology, Systems, and Applications (AIMSA 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3192)


Abstract

This paper presents an accurate and highly efficient rule-based part-of-speech tagger for Bulgarian. All four stages (tokenization, dictionary application, unknown-word guessing, and contextual part-of-speech disambiguation) are implemented as a pipeline of a couple of deterministic finite-state bimachines and transducers. We describe the Bulgarian ambiguity classes and give a detailed evaluation and error analysis of the tagger. Its overall precision is over 98.4% for full disambiguation, and its processing speed is over 34,000 words per second on a personal computer. The same methodology has also been applied to English. The implementation presented meets the specific demands of the Semantic Web.
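The four stages named in the abstract can be pictured with a minimal Python sketch. This is an illustration only: it uses plain functions rather than the finite-state bimachines and transducers of the paper, and the lexicon, tag names, suffix heuristic, and contextual rules are all hypothetical toy values in English, not taken from the authors' system.

```python
import re

# Hypothetical toy lexicon: word -> ambiguity class (set of possible tags).
LEXICON = {
    "the": {"DET"},
    "old": {"ADJ"},
    "can": {"NOUN", "VERB"},
    "rusts": {"NOUN", "VERB"},
}


def tokenize(text):
    """Stage 1: split the text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def apply_dictionary(tokens):
    """Stage 2: attach each known word's ambiguity class (None if unknown)."""
    return [(tok, set(LEXICON[tok]) if tok in LEXICON else None) for tok in tokens]


def guess_unknown(tagged):
    """Stage 3: guess tags for unknown words from a suffix (toy heuristic)."""
    out = []
    for tok, tags in tagged:
        if tags is None:
            tags = {"ADV"} if tok.endswith("ly") else {"NOUN"}
        out.append((tok, tags))
    return out


def disambiguate(tagged):
    """Stage 4: apply two toy contextual rules, leaving one tag per token."""
    result = []
    prev = None  # tag chosen for the previous token
    for tok, tags in tagged:
        if len(tags) > 1 and prev in {"DET", "ADJ"} and "NOUN" in tags:
            tags = {"NOUN"}  # after a determiner or adjective, prefer a noun
        elif len(tags) > 1 and prev == "NOUN" and "VERB" in tags:
            tags = {"VERB"}  # after a noun, prefer a verb
        chosen = sorted(tags)[0]  # arbitrary fallback if ambiguity remains
        result.append((tok, chosen))
        prev = chosen
    return result


def tag(text):
    """Run the four stages in sequence."""
    return disambiguate(guess_unknown(apply_dictionary(tokenize(text))))


print(tag("The old can rusts quickly"))
```

Each stage consumes the previous stage's output, which is the property that lets the paper's system chain its stages as a pipeline; the data structures here are chosen for readability, not speed.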




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Doychinova, V., Mihov, S. (2004). High Performance Part-of-Speech Tagging of Bulgarian. In: Bussler, C., Fensel, D. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2004. Lecture Notes in Computer Science, vol 3192. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30106-6_25


  • DOI: https://doi.org/10.1007/978-3-540-30106-6_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22959-9

  • Online ISBN: 978-3-540-30106-6

  • eBook Packages: Springer Book Archive
