DOI: 10.1145/2595188.2595220
research-article

Data processing and lemmatization in digitized 19th-century Czech texts

Published: 19 May 2014

ABSTRACT

This paper describes the processing of linguistic data obtained through OCR, specifically their use in constructing dictionary databases and in subsequent lemmatization. The process is demonstrated on Czech prints from the 19th century.

        • Published in

          DATeCH '14: Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage
          May 2014
          200 pages
          ISBN:9781450325882
          DOI:10.1145/2595188

          Copyright © 2014 Owner/Author

          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          DATeCH '14 Paper Acceptance Rate: 31 of 49 submissions, 63%. Overall Acceptance Rate: 60 of 86 submissions, 70%.