A compression method of double-array structures using linear functions

  • Regular Paper
  • Knowledge and Information Systems

Abstract

A trie is a data structure for keyword search and is used in natural language processing, reserved-word lookup in compilers, and so on. The double-array and LOUDS are efficient representations of the trie. The double-array provides fast traversal with O(1) time complexity, but its space usage is larger than that of LOUDS. LOUDS is a succinct data structure based on a bit string, and its space usage is extremely compact; however, its traversal is slower. This paper presents a new compression method for the double-array that preserves its retrieval speed. The method compresses the double-array by dividing it into blocks and by using linear functions. Experimental results for various keyword sets show that the new method reduced the space usage of the double-array up to about 44 %, and that its retrieval speed was 9–14 times faster than that of LOUDS. Moreover, the results show that the construction speed of the new method was faster than that of the conventional method for a large keyword set.
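For readers unfamiliar with the structure, the following is a minimal sketch of the classic, uncompressed double-array (BASE and CHECK arrays), not the compressed variant proposed in this paper. It illustrates why each traversal step costs O(1): moving from state s on code c is one addition and one comparison. The function names, the terminator code END, the byte-plus-one character coding, and the brute-force construction are assumptions made for illustration only.

```python
END = 0  # hypothetical terminator code appended to every key


def _codes(key):
    # Assumed coding: each UTF-8 byte shifted by one, so END (0) never occurs inside a key.
    return [b + 1 for b in key.encode("utf-8")] + [END]


def build_double_array(keys):
    """Naively build BASE/CHECK for a small key set (illustrative, not optimized)."""
    # First build an ordinary pointer trie: each node maps a code to a child node.
    root = {}
    for key in keys:
        node = root
        for code in _codes(key):
            node = node.setdefault(code, {})

    base, check = [0], [0]  # slot 0 is the root; -1 will mark unused CHECK slots
    stack = [(root, 0)]
    while stack:
        node, s = stack.pop()
        if not node:
            continue  # leaf state: no outgoing transitions
        codes = sorted(node)
        b = 1  # BASE values start at 1, so slot 0 is never used as a child
        while True:
            need = b + codes[-1] + 1
            if need > len(check):
                extra = need - len(check)
                base.extend([0] * extra)
                check.extend([-1] * extra)
            if all(check[b + c] == -1 for c in codes):
                break  # every child slot is free for this candidate BASE value
            b += 1
        base[s] = b
        for c in codes:
            check[b + c] = s  # reserve the child slot and record its parent
        for c in codes:
            stack.append((node[c], b + c))
    return base, check


def contains(base, check, key):
    """Exact-match lookup: one addition and one comparison per input code."""
    s = 0
    for code in _codes(key):
        t = base[s] + code
        if t >= len(check) or check[t] != s:
            return False
        s = t
    return True


base, check = build_double_array(["car", "cat", "do"])
print(contains(base, check, "cat"), contains(base, check, "ca"))
```

The sketch stores one integer per slot in each array, which is the space overhead that a compression method such as the one described in this paper aims to reduce.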

Notes

  1. The base of logarithm is 2 throughout this paper.

  2. Darts: Double-ARray Trie System. http://chasen.org/~taku/software/darts/.

  3. Darts-clone: A clone of the Darts. https://code.google.com/p/darts-clone/.

  4. ChaSen legacy: an old morphological analyzer. http://chasen-legacy.sourceforge.jp/.

  5. MeCab: Yet Another Part-of-Speech and Morphological Analyzer. http://mecab.googlecode.com/svn/trunk/mecab/doc/index.html.

  6. In this paper, “traversal” in the trie means a transition from a parent node to a child node.

  7. Strictly speaking, the number of blocks is \(\lceil (n+m)/\textit{bsize}\rceil \), but the ceiling function is omitted for simplicity. Likewise, the extra space for rank/select operations is calculated in LOUDS.

  8. Tx: Succinct Trie Data structure. https://code.google.com/p/tx-trie/.

  9. WordNet 3.0. http://wordnetcode.princeton.edu/3.0/WordNet-3.0.tar.gz.

  10. jawiki dump progress on 20150118. http://dumps.wikimedia.org/jawiki/20150118/jawiki-20150118-all-titles-in-ns0.gz.

  11. enwiki dump progress on 20150205. http://dumps.wikimedia.org/enwiki/20150205/enwiki-20150205-all-titles-in-ns0.gz.

Author information

Correspondence to Shunsuke Kanda.

About this article

Cite this article

Kanda, S., Fuketa, M., Morita, K. et al. A compression method of double-array structures using linear functions. Knowl Inf Syst 48, 55–80 (2016). https://doi.org/10.1007/s10115-015-0873-0

  • DOI: https://doi.org/10.1007/s10115-015-0873-0
