Dictionary Compression and Information Source Correction

Németh, Dénes; Lakat, Máté; Szeberényi, Imre

doi:10.1007/978-3-642-12535-5_61

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5910)

Abstract

This paper introduces a method to compress, store, and search a dictionary of a natural language. The dictionary can be represented as groups of words derived from a stem. We describe how to represent and store a word group in a way that is compact and efficiently searchable. The compression efficiency of the algorithm depends heavily on the quality of the information source. The currently available tools and data sources contain several errors, which the introduced method can clean up. The paper also analyzes the efficiency of XML and two binary formats, and proposes two methods, directed acyclic graph transformation and word group regrouping, that can be used to increase efficiency.
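
The abstract does not spell out the data structures, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a word group is a stem plus a list of suffixes, stores the expanded words in a trie, and then merges structurally identical subtrees so that the trie becomes a directed acyclic graph. All names here (`expand_group`, `build_trie`, `to_dag`, `count_nodes`, `contains`) and the English sample words are hypothetical and chosen for illustration.

```python
from typing import Dict, Iterable, List, Tuple


class Node:
    """One graph node: outgoing edges by character plus an end-of-word flag."""

    def __init__(self) -> None:
        self.edges: Dict[str, "Node"] = {}
        self.final = False


def expand_group(stem: str, suffixes: Iterable[str]) -> List[str]:
    """Assumed word-group model: every word is the stem plus one suffix."""
    return [stem + suffix for suffix in suffixes]


def build_trie(words: Iterable[str]) -> Node:
    """Insert every word into a plain (uncompressed) trie."""
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.edges.setdefault(ch, Node())
        node.final = True
    return root


def to_dag(node: Node, registry: Dict[Tuple, Node]) -> Node:
    """Merge identical subtrees bottom-up; returns the canonical node.

    Children are canonicalised first, so two nodes are equivalent exactly
    when they have the same end-of-word flag and the same labelled edges
    to the same canonical children.
    """
    for ch in list(node.edges):
        node.edges[ch] = to_dag(node.edges[ch], registry)
    key = (
        node.final,
        tuple(sorted((ch, id(child)) for ch, child in node.edges.items())),
    )
    return registry.setdefault(key, node)


def count_nodes(root: Node) -> int:
    """Count distinct nodes reachable from root (shared nodes counted once)."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(node.edges.values())
    return len(seen)


def contains(root: Node, word: str) -> bool:
    """Look up a word by following one edge per character."""
    node = root
    for ch in word:
        node = node.edges.get(ch)
        if node is None:
            return False
    return node.final


if __name__ == "__main__":
    suffixes = ["", "s", "ed", "ing"]
    words = expand_group("walk", suffixes) + expand_group("talk", suffixes)
    trie = build_trie(words)
    before = count_nodes(trie)
    dag = to_dag(trie, {})
    print("trie nodes:", before, "dag nodes:", count_nodes(dag))
    print("lookup 'talked':", contains(dag, "talked"))
```

In this sketch lookup cost stays proportional to the word length after merging, while groups that share a suffix set end up sharing the same subgraph, which is where the space saving would come from. Whether the paper's directed acyclic graph transformation works at the character level or on whole word groups is not stated in the abstract.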

Author information

Authors and Affiliations

Budapest University of Technology, Magyar Tudósok körútja 2, H-1117, Budapest, Hungary
Dénes Németh, Máté Lakat & Imre Szeberényi

Editor information

Editors and Affiliations

Institute for Parallel Processing, Bulgarian Academy of Sciences, Acad. G. Bonchev, Bl. 25A, 1113, Sofia, Bulgaria
Ivan Lirkov
Institute for Parallel Processing, Bulgarian Academy of Sciences, Acad. G. Bonchev Str. Bl. 25-A, 1113, Sofia, Bulgaria
Svetozar Margenov
Department of Informatics and Mathematical Modelling, Technical University of Denmark, Richard Petersens Plads - Building 321, 2800, Kongens Lyngby, Denmark
Jerzy Waśniewski

About this paper

Cite this paper

Németh, D., Lakat, M., Szeberényi, I. (2010). Dictionary Compression and Information Source Correction. In: Lirkov, I., Margenov, S., Waśniewski, J. (eds) Large-Scale Scientific Computing. LSSC 2009. Lecture Notes in Computer Science, vol 5910. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12535-5_61

DOI: https://doi.org/10.1007/978-3-642-12535-5_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12534-8
Online ISBN: 978-3-642-12535-5
eBook Packages: Computer Science, Computer Science (R0)
