
Dictionary Compression and Information Source Correction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5910)

Abstract

This paper introduces a method to compress, store, and search a dictionary of a natural language. The dictionary can be represented as groups of words derived from a stem. We describe how to represent and store a word group in a way that is compact and efficiently searchable. The compression efficiency of the algorithm depends heavily on the quality of the information source. The currently available tools and data sources contain several mistakes, which the introduced method can correct. The paper also analyzes the efficiency of XML and two binary formats, and proposes two methods for increasing efficiency: directed acyclic graph transformation and word group regrouping.
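
The abstract names a directed acyclic graph transformation without spelling it out here. As a rough illustration only, and not the authors' algorithm, the sketch below assumes the words are held in a plain character trie and merges structurally identical subtrees so that shared endings are stored once. The names `Node`, `build_trie`, `merge_equal_subtrees`, `count_nodes`, and `contains` are hypothetical and chosen for this example.

```python
from typing import Dict, Iterable, Tuple


class Node:
    """Trie node: outgoing edges keyed by character, plus an end-of-word flag."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.is_word = False


def build_trie(words: Iterable[str]) -> Node:
    """Insert every word into a character trie and return its root."""
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_word = True
    return root


def merge_equal_subtrees(node: Node, registry: Dict[Tuple, Node]) -> Node:
    """Turn the trie into a DAG by sharing structurally identical subtrees.

    Children are canonicalised first, so two nodes are equal exactly when
    their end-of-word flags match and their edges lead to the same
    canonical children.
    """
    for ch in list(node.children):
        node.children[ch] = merge_equal_subtrees(node.children[ch], registry)
    key = (
        node.is_word,
        tuple(sorted((ch, id(child)) for ch, child in node.children.items())),
    )
    # Reuse an existing equivalent node if one was already registered.
    return registry.setdefault(key, node)


def count_nodes(root: Node) -> int:
    """Count distinct nodes reachable from root (shared nodes counted once)."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(node.children.values())
    return len(seen)


def contains(root: Node, word: str) -> bool:
    """Look a word up by following one edge per character."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word


# Two hypothetical word groups that share the same set of endings.
words = ["walk", "walks", "walked", "talk", "talks", "talked"]
trie = build_trie(words)
nodes_before = count_nodes(trie)
dag = merge_equal_subtrees(trie, {})
nodes_after = count_nodes(dag)
```

Lookup cost stays proportional to the word length after merging, because `contains` follows the same edges in either structure. Only the number of stored nodes changes.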




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Németh, D., Lakat, M., Szeberényi, I. (2010). Dictionary Compression and Information Source Correction. In: Lirkov, I., Margenov, S., Waśniewski, J. (eds) Large-Scale Scientific Computing. LSSC 2009. Lecture Notes in Computer Science, vol 5910. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12535-5_61


  • DOI: https://doi.org/10.1007/978-3-642-12535-5_61

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12534-8

  • Online ISBN: 978-3-642-12535-5

  • eBook Packages: Computer Science (R0)
