Practical Implementation of Space-Efficient Dynamic Keyword Dictionaries

  • Conference paper
String Processing and Information Retrieval (SPIRE 2017)

Abstract

A keyword dictionary is an associative array with string keys. Although it is a classical data structure, recent applications require keyword dictionaries that manage massive string data in main memory, so a space-efficient implementation is very important. For static applications, a number of very compact dictionary implementations exist; however, existing dynamic implementations consume much more space than static ones. In this paper, we propose a new practical implementation of space-efficient dynamic keyword dictionaries. Our implementation uses path decomposition, which was proposed for constructing cache-friendly trie structures, but applies it with a different approach to allow dynamic construction in compact space. Using experiments on real-world datasets, we show that our implementation can construct keyword dictionaries in space up to 2.8x smaller than that of the most compact existing dynamic implementation.
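
To make the idea of building a path-decomposed trie dynamically more concrete, the following is a minimal Python sketch. It is not the authors' implementation and says nothing about their compact encoding; it only illustrates one way keys can be inserted incrementally so that each key owns exactly one node. The names `PathNode`, `PathDecomposedDict`, and `common_prefix_len` are hypothetical, and the sketch assumes that keys never contain the NUL character, which is used as a terminator.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


class PathNode:
    """One node of the path-decomposed trie (hypothetical layout)."""

    def __init__(self, label, value):
        self.label = label    # key suffix stored along this path
        self.value = value    # value of the key that created this node
        self.children = {}    # (mismatch offset, branching char) -> PathNode


class PathDecomposedDict:
    """Dynamic keyword dictionary sketch; assumes keys contain no NUL."""

    TERMINATOR = "\0"  # makes sure no stored key is a prefix of another

    def __init__(self):
        self.root = None
        self.size = 0

    def insert(self, key, value):
        suffix = key + self.TERMINATOR
        if self.root is None:
            # The first key becomes the label of the root path.
            self.root = PathNode(suffix, value)
            self.size = 1
            return
        node = self.root
        while True:
            i = common_prefix_len(suffix, node.label)
            if i == len(suffix) == len(node.label):
                node.value = value  # key already present: overwrite
                return
            # Branch off at the first mismatch (offset i, character suffix[i]).
            edge = (i, suffix[i])
            child = node.children.get(edge)
            if child is None:
                node.children[edge] = PathNode(suffix[i + 1:], value)
                self.size += 1
                return
            node, suffix = child, suffix[i + 1:]

    def get(self, key, default=None):
        node, suffix = self.root, key + self.TERMINATOR
        while node is not None:
            i = common_prefix_len(suffix, node.label)
            if i == len(suffix) == len(node.label):
                return node.value
            node, suffix = node.children.get((i, suffix[i])), suffix[i + 1:]
        return default


d = PathDecomposedDict()
for n, word in enumerate(["technology", "technique", "tech", "trie"]):
    d.insert(word, n)
```

In this sketch the number of nodes equals the number of keys, and a lookup visits one node per branch taken rather than one node per character. A space-efficient implementation would still need a compact representation of the labels and edges, which this example does not attempt.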

Notes

  1. All the results are provided at https://github.com/kamp78/dynpdt/wiki.

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number 17J07555. We would like to thank Editage (www.editage.jp) for English language editing.

Author information

Corresponding author

Correspondence to Shunsuke Kanda.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kanda, S., Morita, K., Fuketa, M. (2017). Practical Implementation of Space-Efficient Dynamic Keyword Dictionaries. In: Fici, G., Sciortino, M., Venturini, R. (eds) String Processing and Information Retrieval. SPIRE 2017. Lecture Notes in Computer Science, vol 10508. Springer, Cham. https://doi.org/10.1007/978-3-319-67428-5_19

  • DOI: https://doi.org/10.1007/978-3-319-67428-5_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67427-8

  • Online ISBN: 978-3-319-67428-5

  • eBook Packages: Computer Science (R0)
