Abstract
A keyword dictionary is an associative array with string keys. Although it is a classical data structure, recent applications require the management of massive string data using the keyword dictionary in main memory. Therefore, its space-efficient implementation is very important. If limited to static applications, there are a number of very compact dictionary implementations; however, existing dynamic implementations consume much larger space than static ones. In this paper, we propose a new practical implementation of space-efficient dynamic keyword dictionaries. Our implementation uses path decomposition, which is proposed for constructing cache-friendly trie structures, for dynamic construction in compact space with a different approach. Using experiments on real-world datasets, we show that our implementation can construct keyword dictionaries in spaces up to 2.8x smaller than the most compact existing dynamic implementation.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
All the results are provided at https://github.com/kamp78/dynpdt/wiki.
References
Askitis, N., Sinha, R.: Engineering scalable, cache and space efficient tries for strings. VLDB J. 19(5), 633–660 (2010)
Askitis, N., Zobel, J.: Cache-conscious collision resolution in string hash tables. In: Consens, M., Navarro, G. (eds.) SPIRE 2005. LNCS, vol. 3772, pp. 91–102. Springer, Heidelberg (2005). doi:10.1007/11575832_11
Baskins, D.: Judy IV Shop Manual (2002)
Boldi, P., Codenotti, B., Santini, M., Vigna, S.: Ubicrawler: a scalable fully distributed web crawler. Softw. Pract. Exp. 34(8), 711–726 (2004)
Darragh, J.J., Cleary, J.G., Witten, I.H.: Bonsai: a compact representation of trees. Softw. Pract. Exp. 23(3), 277–291 (1993)
Ferragina, P., Grossi, R., Gupta, A., Shah, R., Vitter, J.S.: On searching compressed string collections cache-obliviously. In: Proceedings of 27th Symposium on Principles of Database Systems (PODS), pp. 181–190 (2008)
González, R., Grabowski, S., Mäkinen, V., Navarro, G.: Practical implementation of rank and select queries. In: Poster Proceedings of 4th Workshop on Experimental and Efficient Algorithms (WEA), pp. 27–38 (2005)
Grossi, R., Ottaviano, G.: Fast compressed tries through path decompositions. ACM J. Exp. Algorithmics 19(1) (2014). Article 1.8
Guo, Y., Pan, Z., Heflin, J.: LUBM: a benchmark for OWL knowledge base systems. Web Semant. Sci. Serv. Agents World Wide Web 3(2), 158–182 (2005)
Hirai, J., Raghavan, S., Garcia-Molina, H., Paepcke, A.: WebBase: a repository of web pages. Comput. Netw. 33(1), 277–293 (2000)
Hsu, B.J.P., Ottaviano, G.: Space-efficient data structures for top-k completion. In: Proceedings of 22nd International Conference on World Wide Web (WWW), pp. 583–594 (2013)
Kanda, S., Morita, K., Fuketa, M.: Compressed double-array tries for string dictionaries supporting fast lookup. Knowl. Inf. Syst. 51(3), 1023–1042 (2017)
Kanda, S., Morita, K., Fuketa, M.: Practical string dictionary compression using string dictionary encoding. In: Proceedings of 3rd International Conference on Big Data Innovations and Applications (Innovate-Data), pp. 1–8 (2017)
Knuth, D.E.: The Art of Computer Programming: Volume 3: Sorting and Searching, 2nd edn. Addison Wesley, Redwood City (1998)
Leis, V., Kemper, A., Neumann, T.: The adaptive radix tree: ARTful indexing for main-memory databases. In: Proceedings of IEEE 29th International Conference on Data Engineering (ICDE), pp. 38–49 (2013)
Martínez-Prieto, M.A., Brisaboa, N., Cánovas, R., Claude, F., Navarro, G.: Practical compressed string dictionaries. Inf. Syst. 56, 73–108 (2016)
Mavlyutov, R., Wylot, M., Cudre-Mauroux, P.: A comparison of data structures to manage URIs on the web of data. In: Gandon, F., Sabou, M., Sack, H., d’Amato, C., Cudré-Mauroux, P., Zimmermann, A. (eds.) ESWC 2015. LNCS, vol. 9088, pp. 137–151. Springer, Cham (2015). doi:10.1007/978-3-319-18818-8_9
Morrison, D.R.: PATRICIA: practical algorithm to retrieve information coded in alphanumeric. J. ACM 15(4), 514–534 (1968)
Poyias, A., Raman, R.: Improved practical compact dynamic tries. In: Iliopoulos, C., Puglisi, S., Yilmaz, E. (eds.) SPIRE 2015. LNCS, vol. 9309, pp. 324–336. Springer, Cham (2015). doi:10.1007/978-3-319-23826-5_31
Takagi, T., Inenaga, S., Sadakane, K., Arimura, H.: Packed compact tries: a fast and efficient data structure for online string processing. In: Mäkinen, V., Puglisi, S.J., Salmela, L. (eds.) IWOCA 2016. LNCS, vol. 9843, pp. 213–225. Springer, Cham (2016). doi:10.1007/978-3-319-44543-4_17
Williams, H.E., Zobel, J.: Compressing integers for fast file access. Comput. J. 42(3), 193–201 (1999)
Yoshinaga, N., Kitsuregawa, M.: A self-adaptive classifier for efficient text-stream processing. In: Proceedings of 24th International Conference on Computational Linguistics (COLING), pp. 1091–1102 (2014)
Acknowledgments
This work was supported by JSPS KAKENHI Grant Number 17J07555. We would like to thank Editage (www.editage.jp) for English language editing.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Kanda, S., Morita, K., Fuketa, M. (2017). Practical Implementation of Space-Efficient Dynamic Keyword Dictionaries. In: Fici, G., Sciortino, M., Venturini, R. (eds) String Processing and Information Retrieval. SPIRE 2017. Lecture Notes in Computer Science(), vol 10508. Springer, Cham. https://doi.org/10.1007/978-3-319-67428-5_19
Download citation
DOI: https://doi.org/10.1007/978-3-319-67428-5_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67427-8
Online ISBN: 978-3-319-67428-5
eBook Packages: Computer ScienceComputer Science (R0)