Towards Generating HiFi Databases

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12681)

Abstract

Generating synthetic databases that capture essential data characteristics of client databases is a common requirement for database vendors. We recently proposed Hydra, a workload-aware and scale-free data regenerator that provides statistical fidelity on the volumetric similarity metric. A limitation, however, is that it suffers from poor accuracy on unseen queries. In this paper, we present HF-Hydra (HiFi-Hydra), which extends Hydra to provide better support for unseen queries through (a) careful choices among the candidate synthetic databases and (b) incorporation of metadata constraints. Our experimental study validates the improved fidelity and efficiency of HF-Hydra.
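To make the volumetric similarity metric concrete, the following is a minimal illustrative sketch, not the Hydra or HF-Hydra implementation. It assumes that volumetric similarity is checked by comparing, for each filter predicate in a workload, the row count observed on the client database with the row count the same predicate yields on the synthetic database. The function name `relative_errors`, the toy `age` column, and the constraint values are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
# A constraint pairs a filter predicate with the row count it
# produced on the (unseen) client database. Hypothetical structure.
Constraint = Tuple[Callable[[Row], bool], int]


def relative_errors(rows: List[Row], constraints: List[Constraint]) -> List[float]:
    """Return the relative cardinality error of each constraint on `rows`."""
    errors = []
    for predicate, expected in constraints:
        actual = sum(1 for row in rows if predicate(row))
        # Guard against division by zero when the expected count is 0.
        errors.append(abs(actual - expected) / max(expected, 1))
    return errors


# Toy synthetic table with a single hypothetical column.
synthetic = [{"age": a} for a in (25, 35, 35, 45, 70)]

# Assumed client-side cardinalities for two filter predicates.
constraints = [
    (lambda r: r["age"] < 40, 3),
    (lambda r: r["age"] >= 60, 2),
]

errors = relative_errors(synthetic, constraints)
print(errors)
```

An error of zero on every constraint would mean the synthetic table is volumetrically indistinguishable from the client data for that workload. A query outside the workload carries no such guarantee, which is the unseen-query limitation the abstract describes.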

The work of Anupam Sanghi was supported by an IBM PhD Fellowship Award.

Acknowledgements

We thank Tarun Kumar Patel and Shadab Ahmed for their valuable inputs in the implementation of this work.

Author information

Corresponding author

Correspondence to Anupam Sanghi.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Sanghi, A., Santhanam, R., Haritsa, J.R. (2021). Towards Generating HiFi Databases. In: Jensen, C.S., et al. Database Systems for Advanced Applications. DASFAA 2021. Lecture Notes in Computer Science, vol 12681. Springer, Cham. https://doi.org/10.1007/978-3-030-73194-6_8

  • DOI: https://doi.org/10.1007/978-3-030-73194-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-73193-9

  • Online ISBN: 978-3-030-73194-6

  • eBook Packages: Computer Science, Computer Science (R0)
