Word Sense Disambiguation of Czech Texts

Cikhart, Ondřej; Hajič, Jan

doi:10.1007/3-540-48239-3_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1692)

Included in the following conference series:

International Workshop on Text, Speech and Dialogue

Abstract

This contribution describes a project of BYLL Software Ltd. that uses human-aided word sense disambiguation (WSD) to annotate ASPI, a full-text database of the Czech legal system. We used about 3 million words of annotated texts from the legal system of the Czech Republic, dating from the 1960s onward. The annotated legal corpus exhibits a certain textual regularity, yet it also covers a wide range of subjects. The goal was to minimize human intervention during text indexing, measured by the number of queries posed to the human annotator, while keeping the error rate in the automatically disambiguated cases truly minimal (∼0.5%). A combination of Naive Bayes, Decision Lists and a minimal number of manually written rules was used, and the statistical methods proved appropriate for this purpose. The results show that we eliminated 80% of the queries to the human annotator, which proved sufficient to warrant including the software in a production system.
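The abstract names the components but not how they are combined. The following is a minimal Python sketch of one plausible arrangement, under assumptions that are ours and not the authors': manual rules are consulted first; otherwise an occurrence is tagged automatically only when a Naive Bayes model and a Yarowsky-style decision list agree and both clear a confidence threshold, and every remaining occurrence becomes a query to the human annotator. The names `NaiveBayes`, `DecisionList` and `disambiguate`, the thresholds `min_prob` and `min_llr`, the smoothing constant `alpha`, the rule format and the toy English training data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Invented toy data (NOT from the paper): contexts of one ambiguous word,
# each a bag of context words paired with its annotated sense.
TRAIN = [
    (["money", "loan", "account"], "FIN"),
    (["money", "deposit"], "FIN"),
    (["loan", "interest", "account"], "FIN"),
    (["river", "water", "shore"], "RIVER"),
    (["river", "fish"], "RIVER"),
    (["water", "mud", "shore"], "RIVER"),
]


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words contexts, add-one smoothing."""

    def __init__(self, examples):
        self.sense_counts = Counter(sense for _, sense in examples)
        self.word_counts = defaultdict(Counter)  # sense -> Counter(word)
        self.vocab = set()
        for words, sense in examples:
            self.word_counts[sense].update(words)
            self.vocab.update(words)
        self.total = sum(self.sense_counts.values())

    def posteriors(self, words):
        """Return P(sense | context) for every sense seen in training."""
        log_scores = {}
        for sense, n in self.sense_counts.items():
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            logp = math.log(n / self.total)  # log prior
            for w in words:
                logp += math.log((self.word_counts[sense][w] + 1) / denom)
            log_scores[sense] = logp
        # Normalise in log space to avoid underflow on long contexts.
        top = max(log_scores.values())
        unnorm = {s: math.exp(x - top) for s, x in log_scores.items()}
        z = sum(unnorm.values())
        return {s: u / z for s, u in unnorm.items()}


class DecisionList:
    """Yarowsky-style decision list: one rule per context word, ranked by a
    smoothed log-likelihood ratio of its majority sense versus all others."""

    def __init__(self, examples, alpha=0.1):
        counts = defaultdict(Counter)  # word -> Counter(sense)
        for words, sense in examples:
            for w in set(words):
                counts[w][sense] += 1
        self.rules = []
        for word, c in counts.items():
            sense, best = c.most_common(1)[0]
            rest = sum(c.values()) - best
            score = math.log((best + alpha) / (rest + alpha))
            self.rules.append((score, word, sense))
        self.rules.sort(reverse=True)  # strongest evidence first

    def classify(self, words):
        """Apply the first (strongest) matching rule; (None, 0.0) if none fires."""
        present = set(words)
        for score, word, sense in self.rules:
            if word in present:
                return sense, score
        return None, 0.0


def disambiguate(words, manual_rules, nb, dl, min_prob=0.95, min_llr=2.0):
    """Return a sense, or None meaning 'pose a query to the human annotator'."""
    # 1. Hand-written rules (hypothetical format: trigger word -> sense) win.
    for w in words:
        if w in manual_rules:
            return manual_rules[w]
    # 2. Auto-accept only if both models agree and both are confident.
    post = nb.posteriors(words)
    nb_sense = max(post, key=post.get)
    dl_sense, llr = dl.classify(words)
    if nb_sense == dl_sense and post[nb_sense] >= min_prob and llr >= min_llr:
        return nb_sense
    return None


if __name__ == "__main__":
    nb, dl = NaiveBayes(TRAIN), DecisionList(TRAIN)
    print(disambiguate(["money", "loan", "account"], {}, nb, dl))  # FIN
    print(disambiguate(["money", "water"], {}, nb, dl))  # None: ask the human
```

In this sketch, raising `min_prob` and `min_llr` lowers the automatic error rate at the cost of more queries to the annotator; in practice the thresholds would be tuned on held-out annotated data against a target such as the ∼0.5% error rate mentioned above.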

Author information

Authors and Affiliations

Institute of Formal and Applied Linguistics, MFF UK, Malostranské nám. 25, Praha, CZ-11800, Czech Republic
Ondřej Cikhart & Jan Hajič

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia in Plzeň, Universitní 22, 306 14, Plzeň, Czech Republic
Václav Matousek, Pavel Mautner & Jana Ocelíková
Department of Programming Systems and Communication, Faculty of Informatics, Masaryk University Brno, Botanická 68a, 602 00, Brno, Czech Republic
Petr Sojka

About this paper

Cite this paper

Cikhart, O., Hajič, J. (1999). Word Sense Disambiguation of Czech Texts. In: Matousek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48239-3_20

DOI: https://doi.org/10.1007/3-540-48239-3_20
Published: 01 October 1999
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66494-9
Online ISBN: 978-3-540-48239-0
eBook Packages: Springer Book Archive
