Abstract
Semi-structured data is prevalent and typically stored in formats like XML and JSON. The most common queries on such data are Content-and-Structure (CAS) queries, and a number of CAS indexes have been developed to speed them up. The state of the art is the RCAS index, which properly interleaves content and structure but does not support insertions of single keys. We propose several insertion techniques that explore the trade-off between insertion and query performance. Our exhaustive experimental evaluation shows that the techniques are efficient and preserve RCAS’s good query performance.
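To make the notion of a CAS query concrete, the following minimal sketch evaluates one by a naive scan over (path, value) keys. It is not the RCAS index and not the insertion techniques proposed here; it only illustrates that a CAS query combines a structure predicate (a path pattern) with a content predicate (a value range). The pattern syntax, the function names, and the sample keys are assumptions made for illustration.

```python
import re


def path_pattern_to_regex(pattern):
    """Translate a simple path pattern into a compiled regex.

    Assumed syntax (hypothetical, for illustration only):
      '//' matches any number of intermediate labels (descendant axis),
      '*'  matches exactly one label.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("//", i):
            parts.append(r"/(?:[^/]+/)*")
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]+")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def cas_query(keys, path_pattern, low, high):
    """Return all (path, value) keys whose path matches the pattern
    and whose value lies in the closed interval [low, high]."""
    rx = path_pattern_to_regex(path_pattern)
    return [(p, v) for p, v in keys if low <= v <= high and rx.fullmatch(p)]


# Hypothetical sample data: each key pairs a path with a numeric value.
keys = [
    ("/bom/item/canoe", 250),
    ("/bom/item/car/battery", 1700),
    ("/bom/item/car/belt", 40),
    ("/bom/order/car", 1700),
]

# All nodes below /bom/item whose value is between 100 and 2000.
result = cas_query(keys, "/bom/item//*", 100, 2000)
print(result)
```

A CAS index exists to answer such queries without scanning every key; the scan above only fixes the semantics that an index has to reproduce.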
Notes
1. We measure the cache misses with the perf command on Linux, which relies on the Performance Monitoring Unit (PMU) in modern processors to record hardware events like cache accesses and misses in the CPU.
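As a rough sketch of how such counters could be post-processed, the snippet below parses the comma-separated output that `perf stat -x, -e cache-references,cache-misses <command>` writes and computes a miss ratio. The exact invocation used for the measurements is not stated here, so the command line, the choice of events, the function names, and the sample numbers are all assumptions.

```python
import csv
import io


def parse_perf_stat_csv(text):
    """Map event name -> counter value from `perf stat -x,` output.

    Assumes the leading CSV fields are: counter value, unit, event name.
    Rows without a numeric counter (comments, blank lines,
    "<not counted>") are skipped.
    """
    counts = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3 or not row[0].strip().isdigit():
            continue
        counts[row[2]] = int(row[0])
    return counts


def miss_ratio(counts):
    """Fraction of cache references that were misses."""
    return counts["cache-misses"] / counts["cache-references"]


# Made-up sample output, not measured data.
sample = (
    "1000,,cache-references,2000000,100.00,,\n"
    "250,,cache-misses,2000000,100.00,25.00,of all cache refs\n"
)

counts = parse_perf_stat_csv(sample)
print(miss_ratio(counts))
```

Event names can carry modifiers (for example a `:u` suffix for user-space counting), in which case the dictionary keys used in `miss_ratio` would need to be adjusted.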
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wellenzohn, K., Popovic, L., Böhlen, M., Helmer, S. (2021). Inserting Keys into the Robust Content-and-Structure (RCAS) Index. In: Bellatreche, L., Dumas, M., Karras, P., Matulevičius, R. (eds) Advances in Databases and Information Systems. ADBIS 2021. Lecture Notes in Computer Science, vol 12843. Springer, Cham. https://doi.org/10.1007/978-3-030-82472-3_10
DOI: https://doi.org/10.1007/978-3-030-82472-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82471-6
Online ISBN: 978-3-030-82472-3
eBook Packages: Computer Science, Computer Science (R0)