
Improved Hungarian Morphological Disambiguation with Tagger Combination

  • Conference paper
Text, Speech, and Dialogue (TSD 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)


Abstract

For morphologically rich languages, full morphological disambiguation is a fundamental task that is more difficult than assigning PoS tags alone. In this paper, we survey Hungarian morphological disambiguation tools and evaluate several common tagger combination techniques for improving annotation accuracy. Following an error analysis of the existing tools, we introduce a method that selects the proper tag and lemma independently and then harmonizes them, achieving a 28.90% error rate reduction compared to PurePos, a state-of-the-art Hungarian morphological annotation tool.
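As a reading aid for the reported figure, the sketch below shows how a relative error rate reduction is conventionally computed from two accuracy values. It assumes the abstract's 28.90% is a relative reduction, which the abstract does not state explicitly. The function name `relative_error_reduction` and the accuracy values are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_accuracy: float, improved_accuracy: float) -> float:
    """Return the fraction of the baseline's errors that the improved system removes."""
    baseline_error = 1.0 - baseline_accuracy
    improved_error = 1.0 - improved_accuracy
    if baseline_error <= 0.0:
        raise ValueError("baseline has no errors to reduce")
    return (baseline_error - improved_error) / baseline_error


# Hypothetical accuracies chosen only for illustration (not results from the paper).
reduction = relative_error_reduction(baseline_accuracy=0.90, improved_accuracy=0.92)
print(f"Relative error rate reduction: {reduction:.2%}")
```

Under this definition, a small absolute gain in accuracy can correspond to a large relative reduction when the baseline is already accurate, which is why the metric is often reported for strong baselines.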


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Orosz, G., Laki, L.J., Novák, A., Siklósi, B., Wenszky, N. (2013). Improved Hungarian Morphological Disambiguation with Tagger Combination. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_36


  • DOI: https://doi.org/10.1007/978-3-642-40585-3_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40584-6

  • Online ISBN: 978-3-642-40585-3

  • eBook Packages: Computer Science, Computer Science (R0)
