Abstract
We compare the accuracy of several single and combined part-of-speech tagging methods applied to Polish and evaluated on the modified corpus of the Frequency Dictionary of Contemporary Polish (m-FDCP). Three well-known combination methods (weighted voting, distributed voting, and stacking) are analyzed, and two new weighted voting methods, MorphCatPrecision and AmbClassPrecision, are proposed. The MorphCatPrecision method achieves the highest accuracy among all considered weighted voting methods. The best combination method achieves an 11.9% error reduction with respect to the best baseline tagger.
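As an illustration of the two quantities mentioned above, the following minimal sketch shows generic weighted voting over tagger proposals and the usual definition of relative error reduction. It is not the MorphCatPrecision or AmbClassPrecision scheme, whose weights the abstract does not define; the function names, the tagger names, the weights, and the accuracies are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(proposals, weights):
    """Pick the tag with the largest summed weight.

    proposals: dict mapping tagger name -> tag proposed for one token
    weights:   dict mapping tagger name -> non-negative weight
               (assumed here to be a single number per tagger)
    """
    scores = defaultdict(float)
    for tagger, tag in proposals.items():
        scores[tag] += weights[tagger]
    # Iterate over sorted tags so that ties are broken deterministically.
    return max(sorted(scores), key=scores.get)


def relative_error_reduction(acc_baseline, acc_combined):
    """Fraction of the baseline error removed by the combined method."""
    err_baseline = 1.0 - acc_baseline
    err_combined = 1.0 - acc_combined
    return (err_baseline - err_combined) / err_baseline


# Hypothetical example: one strong tagger outvotes two weaker ones.
tag = weighted_vote(
    {"tagger_a": "subst", "tagger_b": "adj", "tagger_c": "adj"},
    {"tagger_a": 0.9, "tagger_b": 0.4, "tagger_c": 0.4},
)
```

Under this definition, an 11.9% error reduction means that the combined method makes 11.9% fewer tagging errors than the best baseline tagger, not that accuracy rises by 11.9 percentage points.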
We also report the statistical significance of the differences in accuracy between the methods, measured with the McNemar test.
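The McNemar test compares two taggers on the same tokens by looking only at the tokens where exactly one of them is correct. The sketch below is one common formulation (chi-square approximation with continuity correction, one degree of freedom); whether the paper uses this variant or an exact binomial version is not stated in the abstract, and the function name and input layout are assumptions made for illustration.

```python
import math


def mcnemar(gold, pred_a, pred_b):
    """Return (chi2, p_value) for two taggers scored on the same tokens.

    gold, pred_a, pred_b: equal-length sequences of tags.
    Uses the continuity-corrected chi-square statistic with 1 d.o.f.
    """
    # b: tagger A correct, tagger B wrong; c: tagger A wrong, tagger B correct.
    b = sum(1 for g, a, o in zip(gold, pred_a, pred_b) if a == g and o != g)
    c = sum(1 for g, a, o in zip(gold, pred_a, pred_b) if a != g and o == g)
    if b + c == 0:
        # The taggers never disagree in correctness: no evidence of a difference.
        return 0.0, 1.0
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 d.o.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A small p-value suggests that the difference in accuracy between the two taggers is unlikely to be due to chance. For small disagreement counts the chi-square approximation is rough, and an exact binomial test is usually preferred.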
The best algorithms were selected on a multiprocessor supercomputer because most of these algorithms have high time and memory requirements.
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kuta, M., Wójcik, W., Wrzeszcz, M., Kitowski, J. (2010). Application of Stacked Methods to Part-of-Speech Tagging of Polish. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2009. Lecture Notes in Computer Science, vol 6067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14390-8_35
DOI: https://doi.org/10.1007/978-3-642-14390-8_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14389-2
Online ISBN: 978-3-642-14390-8
eBook Packages: Computer Science (R0)