
Packed Compact Tries: A Fast and Efficient Data Structure for Online String Processing

  • Conference paper
Combinatorial Algorithms (IWOCA 2016)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9843)

Abstract

We present a new data structure called the packed compact trie (packed c-trie), which stores a set S of k strings of total length n in \(n \log \sigma + O(k \log n)\) bits of space and supports fast pattern matching queries and updates, where \(\sigma \) is the alphabet size. Assume that \(\alpha = \log _\sigma n\) letters are packed in a single machine word on the standard word RAM model, and let f(k, n) denote the query and update times of the dynamic predecessor/successor data structure of our choice, which stores k integers from the universe [1, n] in \(O(k \log n)\) bits of space. Then, given a string of length m, our packed c-tries support pattern matching queries and insert/delete operations in \(O(\frac{m}{\alpha } f(k,n))\) worst-case time and in \(O(\frac{m}{\alpha } + f(k,n))\) expected time. Our experiments show that our packed c-tries are faster than the standard compact tries (a.k.a. Patricia trees) on real data sets. We also discuss applications of our packed c-tries.
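The \(\frac{m}{\alpha }\) factor in these bounds comes from handling \(\alpha \) letters per machine word instead of one letter at a time. As a hedged illustration (not the authors' implementation), the Python sketch below packs strings into fixed-width words and computes the longest common prefix of two packed strings one word at a time, locating the first mismatching letter from the highest set bit of an XOR. Word-level comparison of this kind is the usual way packed string algorithms reach \(O(\frac{m}{\alpha })\) word operations. The names `Packed`, `pack`, `packed_lcp`, and the parameter `word_bits` (standing in for the word size, 64 by default) are hypothetical. Each letter is assumed to take \(\lceil \log_2 \sigma \rceil\) bits, so \(\alpha = \lfloor w / \lceil \log_2 \sigma \rceil \rfloor\) here rather than exactly \(\log _\sigma n\).

```python
from dataclasses import dataclass


@dataclass
class Packed:
    words: list[int]  # packed words, first letter in the most significant bits
    length: int       # number of letters in the original string
    alpha: int        # letters packed into one word
    bits: int         # bits per letter


def pack(s: str, alphabet: str, word_bits: int = 64) -> Packed:
    """Pack s into integers of word_bits bits (hypothetical helper)."""
    bits = max(1, (len(alphabet) - 1).bit_length())  # ceil(log2 sigma) bits per letter
    alpha = word_bits // bits                         # letters per word
    code = {c: i for i, c in enumerate(alphabet)}
    words = []
    for start in range(0, len(s), alpha):
        chunk = s[start:start + alpha]
        w = 0
        for c in chunk:
            w = (w << bits) | code[c]
        # Left-align a short final chunk so letter positions line up across strings.
        w <<= bits * (alpha - len(chunk))
        words.append(w)
    return Packed(words, len(s), alpha, bits)


def packed_lcp(x: Packed, y: Packed) -> int:
    """Longest common prefix length, comparing alpha letters per step.

    Assumes x and y were packed with the same alphabet and word_bits.
    """
    limit = min(x.length, y.length)  # padding letters must never count as matches
    for i, (wx, wy) in enumerate(zip(x.words, y.words)):
        diff = wx ^ wy
        if diff:
            # The highest set bit of the XOR lies inside the first mismatching letter;
            # on a word RAM this is a constant-time most-significant-bit operation.
            top = diff.bit_length() - 1
            j = x.alpha - 1 - top // x.bits
            return min(i * x.alpha + j, limit)
    return limit


a = pack("ACGTACGA", "ACGT", word_bits=8)
b = pack("ACGTACGT", "ACGT", word_bits=8)
print(packed_lcp(a, b))  # prints 7
```

With `word_bits=8` and a four-letter alphabet, each word holds four letters, so the example above finishes after two word comparisons instead of eight letter comparisons. Capping the result at the shorter length handles the zero padding of the last word, which would otherwise be indistinguishable from the letter encoded as 0.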

Notes

  1. The \(O(\log m)\) expected bound for insertion/deletion stated in [4] assumes that the prefix search for the string has already been performed.

  2. For sufficiently long patterns of length \(m = \varTheta (n)\), our packed c-trie achieves worst-case sublinear o(n) time, while the wexponential search tree requires O(n) time.

  3. In the literature, the locus is represented by \((u, c, h)\), where c is the first letter of the label of e. Since our packed c-trie does not maintain a search structure for branches, we represent the locus directly on e.

  4. Since \(kM \ge n\) always holds, the n term is hidden in the time complexity.

  5. Since all the factors of the LZDF are distinct, \(k = O(\frac{n}{\log _\sigma n})\) holds [22].

  6. Pizza&Chili Corpus, http://pizzachili.dcc.uchile.cl.

  7. Laboratory for Web Algorithmics, uk-2005.urls.gz, http://law.di.unimi.it/datasets.php.

  8. jawiki, https://dumps.wikimedia.org/jawiki/.

References

  1. Alstrup, S., Gavoille, C., Kaplan, H., Rauhe, T.: Nearest common ancestors: a survey and a new distributed algorithm. Theory Comput. Syst. 37, 441–456 (2002)

  2. Andersson, A., Thorup, M.: Dynamic ordered sets with exponential search trees. J. ACM 54(3), 13 (2007)

  3. Beame, P., Fich, F.E.: Optimal bounds for the predecessor problem and related problems. J. Comput. Syst. Sci. 65(1), 38–72 (2002)

  4. Belazzougui, D., Boldi, P., Vigna, S.: Dynamic Z-Fast tries. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 159–172. Springer, Heidelberg (2010)

  5. Ben-Kiki, O., Bille, P., Breslauer, D., Gąsieniec, L., Grossi, R., Weimann, O.: Optimal packed string matching. In: FSTTCS 2011, pp. 423–432 (2011)

  6. Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don't cares. In: Proceedings of STOC 2004, pp. 91–100 (2004)

  7. Cole, R., Hariharan, R.: Dynamic LCA queries on trees. SIAM J. Comput. 34(4), 894–923 (2005)

  8. Ferragina, P., Grossi, R.: The string B-tree: a new data structure for string search in external memory and its applications. J. ACM 46(2), 236–280 (1999)

  9. Fischer, J., Gawrychowski, P.: Alphabet-dependent string searching with wexponential search trees. In: Cicalese, F., Porat, E., Vaccaro, U. (eds.) CPM 2015. LNCS, vol. 9133, pp. 160–171. Springer, Heidelberg (2015)

  10. Fredman, M.L., Willard, D.E.: Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Sci. 47(3), 424–436 (1993)

  11. Goto, K., Bannai, H., Inenaga, S., Takeda, M.: LZD factorization: simple and practical online grammar compression with variable-to-fixed encoding. In: Cicalese, F., Porat, E., Vaccaro, U. (eds.) CPM 2015. LNCS, vol. 9133, pp. 219–230. Springer, Heidelberg (2015)

  12. Hon, W.-K., Lam, T.-W., Shah, R., Tam, S.-L., Vitter, J.S.: Succinct index for dynamic dictionary matching. In: Dong, Y., Du, D.-Z., Ibarra, O. (eds.) ISAAC 2009. LNCS, vol. 5878, pp. 1034–1043. Springer, Heidelberg (2009)

  13. Inenaga, S., Takeda, M.: On-line linear-time construction of word suffix trees. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 60–71. Springer, Heidelberg (2006)

  14. Jansson, J., Sadakane, K., Sung, W.: Linked dynamic tries with applications to LZ-compression in sublinear time and space. Algorithmica 71(4), 969–988 (2015)

  15. Kärkkäinen, J., Ukkonen, E.: Sparse suffix trees. In: Cai, J.-Y., Wong, C.K. (eds.) COCOON 1996. LNCS, vol. 1090, pp. 219–230. Springer, Heidelberg (1996)

  16. Morrison, D.R.: PATRICIA: practical algorithm to retrieve information coded in alphanumeric. J. ACM 15(4), 514–534 (1968)

  17. Uemura, T., Arimura, H.: Sparse and truncated suffix trees on variable-length codes. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 246–260. Springer, Heidelberg (2011)

  18. Ukkonen, E.: On-line construction of suffix trees. Algorithmica 13(3), 249–260 (1995)

  19. Weiner, P.: Linear pattern matching algorithms. In: Proceedings of the 14th Annual IEEE Symposium on Switching and Automata Theory, pp. 1–11 (1973)

  20. Willard, D.E.: Log-logarithmic worst-case range queries are possible in space \(\varTheta (N)\). Inf. Process. Lett. 17, 81–84 (1983)

  21. Willard, D.E.: New trie data structures which support very fast search operations. J. Comput. Syst. Sci. 28, 379–394 (1984)

  22. Ziv, J., Lempel, A.: Compression of individual sequences via variable-length coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)

Author information

Corresponding author

Correspondence to Takuya Takagi.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Takagi, T., Inenaga, S., Sadakane, K., Arimura, H. (2016). Packed Compact Tries: A Fast and Efficient Data Structure for Online String Processing. In: Mäkinen, V., Puglisi, S., Salmela, L. (eds) Combinatorial Algorithms. IWOCA 2016. Lecture Notes in Computer Science, vol 9843. Springer, Cham. https://doi.org/10.1007/978-3-319-44543-4_17

  • DOI: https://doi.org/10.1007/978-3-319-44543-4_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44542-7

  • Online ISBN: 978-3-319-44543-4

  • eBook Packages: Computer Science (R0)
