Abstract
Generating synthetic databases that capture the essential data characteristics of client databases is a common requirement for database vendors. We recently proposed Hydra, a workload-aware and scale-free data regenerator that provides statistical fidelity on the volumetric similarity metric. A limitation, however, is that it suffers from poor accuracy on unseen queries. In this paper, we present HF-Hydra (HiFi-Hydra), which extends Hydra to provide better support for unseen queries through (a) careful choices among the candidate synthetic databases and (b) incorporation of metadata constraints. Our experimental study validates the improved fidelity and efficiency of HF-Hydra.
The work of Anupam Sanghi was supported by an IBM PhD Fellowship Award.
Acknowledgements
We thank Tarun Kumar Patel and Shadab Ahmed for their valuable inputs in the implementation of this work.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Sanghi, A., Santhanam, R., Haritsa, J.R. (2021). Towards Generating HiFi Databases. In: Jensen, C.S., et al. Database Systems for Advanced Applications. DASFAA 2021. Lecture Notes in Computer Science, vol 12681. Springer, Cham. https://doi.org/10.1007/978-3-030-73194-6_8
DOI: https://doi.org/10.1007/978-3-030-73194-6_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73193-9
Online ISBN: 978-3-030-73194-6
eBook Packages: Computer Science (R0)