Application of Stacked Methods to Part-of-Speech Tagging of Polish

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6067)

Abstract

We compare the accuracy of several single and combined part-of-speech tagging methods applied to Polish, evaluated on the modified corpus of the Frequency Dictionary of Contemporary Polish (m-FDCP). Three well-known combination methods (weighted voting, distributed voting, and stacking) are analyzed, and two new weighted voting methods, MorphCatPrecision and AmbClassPrecision, are proposed. The MorphCatPrecision method achieves the highest accuracy among all considered weighted voting methods. The best combination method achieves an 11.9% error reduction with respect to the best baseline tagger.
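As a rough illustration of the general idea behind weighted voting, the sketch below lets each baseline tagger propose a tag for a token and sums a weight per proposed tag. The abstract does not define MorphCatPrecision or AmbClassPrecision, so this is not the paper's method: the function name `weighted_vote`, the tagger names, the tags, and the weight values are all hypothetical, and the weights are simply assumed to be precision-like scores estimated beforehand on held-out data.

```python
from collections import defaultdict


def weighted_vote(proposals, weights, default_weight=0.0):
    """Pick one tag for a token from several taggers' proposals.

    proposals: dict mapping tagger name -> proposed tag (hypothetical names)
    weights:   dict mapping (tagger name, tag) -> weight, assumed to be
               estimated in advance (e.g. a per-tag precision)
    """
    scores = defaultdict(float)
    for tagger, tag in proposals.items():
        # Each tagger adds its weight to the tag it voted for.
        scores[tag] += weights.get((tagger, tag), default_weight)
    # Highest total wins; iterating tags in sorted order makes ties
    # resolve deterministically to the alphabetically first tag.
    return max(sorted(scores), key=lambda tag: scores[tag])


# Hypothetical proposals and weights, for illustration only.
proposals = {"tagger_a": "subst", "tagger_b": "adj", "tagger_c": "adj"}
weights = {
    ("tagger_a", "subst"): 0.95,
    ("tagger_b", "adj"): 0.40,
    ("tagger_c", "adj"): 0.45,
}
chosen = weighted_vote(proposals, weights)
```

In this made-up case the single high-weight vote for `subst` (0.95) outweighs the two lower-weight votes for `adj` (0.85 in total), which is the behaviour that separates weighted voting from a simple majority vote.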

We also report the statistical significance of the differences in accuracy between the methods, measured with the McNemar test.
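A minimal sketch of how a McNemar test can compare two taggers on the same tokens is given below. It counts only the tokens on which exactly one tagger is correct and computes an exact two-sided p-value from the binomial distribution. The abstract does not say which variant of the test was used (exact or chi-square approximation), so the exact form here is an assumption, and the function names and toy data are hypothetical.

```python
from math import comb


def discordant_counts(gold, pred_a, pred_b):
    """Count tokens where exactly one of the two taggers is correct."""
    only_a = sum(1 for g, a, b in zip(gold, pred_a, pred_b) if a == g and b != g)
    only_b = sum(1 for g, a, b in zip(gold, pred_a, pred_b) if a != g and b == g)
    return only_a, only_b


def mcnemar_exact_p(only_a, only_b):
    """Exact two-sided McNemar p-value.

    Under the null hypothesis each discordant token is equally likely
    to favour either tagger, so the smaller count follows Binomial(n, 0.5).
    """
    n = only_a + only_b
    if n == 0:
        return 1.0  # no disagreements, nothing to test
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data, for illustration only.
gold = ["subst", "adj", "verb", "prep"]
pred_a = ["subst", "adj", "verb", "adj"]
pred_b = ["subst", "verb", "adj", "prep"]
counts = discordant_counts(gold, pred_a, pred_b)
p_value = mcnemar_exact_p(*counts)
```

Tokens that both taggers get right or both get wrong do not enter the statistic, which is why the test suits paired comparisons on a shared test corpus.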

The best algorithms were selected on a multiprocessor supercomputer because most of these algorithms have high time and memory requirements.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kuta, M., Wójcik, W., Wrzeszcz, M., Kitowski, J. (2010). Application of Stacked Methods to Part-of-Speech Tagging of Polish. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2009. Lecture Notes in Computer Science, vol 6067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14390-8_35

  • DOI: https://doi.org/10.1007/978-3-642-14390-8_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14389-2

  • Online ISBN: 978-3-642-14390-8

  • eBook Packages: Computer Science, Computer Science (R0)
