Abstract
This paper presents an accurate and highly efficient rule-based part-of-speech tagger for Bulgarian. All four stages (tokenization, dictionary application, unknown-word guessing, and contextual part-of-speech disambiguation) are implemented as a pipeline of a few deterministic finite-state bimachines and transducers. We describe the Bulgarian ambiguity classes and give a detailed evaluation and error analysis of the tagger. The overall precision of the tagger is over 98.4% for full disambiguation, and the processing speed is over 34K words/sec on a personal computer. The same methodology has also been applied to English. The presented implementation conforms to the specific demands of the semantic web.
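The four-stage pipeline named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: plain Python functions stand in for the finite-state bimachines and transducers, and the lexicon, tag names, suffix heuristics, and contextual rules below are all hypothetical toy data chosen for an English example.

```python
import re

# Hypothetical toy lexicon: word -> ambiguity class (set of possible tags).
LEXICON = {
    "the": {"DET"},
    "a": {"DET"},
    "old": {"ADJ"},
    "can": {"NOUN", "VERB", "AUX"},
    "rust": {"NOUN", "VERB"},
}

# Hypothetical contextual rules, tried in order:
# (tags allowed for the previous token, tag to prefer for the current one).
RULES = [
    ({"DET", "ADJ"}, "NOUN"),
    ({"NOUN"}, "AUX"),
    ({"NOUN"}, "VERB"),
    ({"AUX"}, "VERB"),
]


def tokenize(text):
    """Stage 1: split into lowercase word tokens and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def apply_dictionary(tokens):
    """Stage 2: attach the lexicon's ambiguity class, or None if unknown."""
    return [(tok, set(LEXICON[tok]) if tok in LEXICON else None)
            for tok in tokens]


def guess_unknown(tagged):
    """Stage 3: give unknown tokens candidate tags via toy suffix rules."""
    out = []
    for tok, tags in tagged:
        if tags is None:
            if not tok.isalnum():
                tags = {"PUNCT"}
            elif tok.endswith("ly"):
                tags = {"ADV"}
            elif tok.endswith(("ing", "ed")):
                tags = {"VERB", "ADJ"}
            else:
                tags = {"NOUN"}
        out.append((tok, tags))
    return out


def disambiguate(tagged):
    """Stage 4: pick one tag per token using the previous token's tag."""
    result = []
    prev = None
    for tok, tags in tagged:
        if len(tags) == 1:
            choice = next(iter(tags))
        else:
            choice = None
            for prev_tags, preferred in RULES:
                if prev in prev_tags and preferred in tags:
                    choice = preferred
                    break
            if choice is None:
                # No rule fired: fall back to a fixed, deterministic choice.
                choice = sorted(tags)[0]
        result.append((tok, choice))
        prev = choice
    return result


def tag(text):
    """Run the four stages as one left-to-right pipeline."""
    return disambiguate(guess_unknown(apply_dictionary(tokenize(text))))


if __name__ == "__main__":
    print(tag("The old can can rust."))
```

Run on "The old can can rust.", the sketch assigns DET, ADJ, NOUN, AUX, VERB, and PUNCT in that order. Each stage only consumes the output of the previous one, which is the property that lets a pipeline of this shape be compiled into deterministic finite-state devices; the sketch makes no claim about how the paper's bimachines are actually constructed.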
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Doychinova, V., Mihov, S. (2004). High Performance Part-of-Speech Tagging of Bulgarian. In: Bussler, C., Fensel, D. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2004. Lecture Notes in Computer Science, vol 3192. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30106-6_25
DOI: https://doi.org/10.1007/978-3-540-30106-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22959-9
Online ISBN: 978-3-540-30106-6
eBook Packages: Springer Book Archive