Storage and Recreation Trade-Off for Multi-version Data Management

  • Conference paper
Web and Big Data (APWeb-WAIM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10988)

Abstract

With the rapid development of data acquisition technology, massive volumes of observation data have accumulated across scientific disciplines. Because successive observations differ only slightly, multi-version data management technology is critical for compressing the data and reducing both storage and recreation costs. However, existing work in this field optimizes only the total storage and total recreation costs and ignores the recreation cost of individual versions. In this paper, we therefore investigate the trade-off among three metrics: total storage cost, total recreation cost, and the maximum recreation cost over individual versions. We formulate two problems: (1) find a storage plan that lowers the total and individual recreation costs when the total storage is limited; and (2) find a storage plan that minimizes the total storage when the total and individual recreation costs are restricted. To solve these problems, we model all versions as a directed graph and devise two efficient algorithms based on spanning trees. A series of experiments indicates that our proposals are effective and efficient in solving these problems.
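The three metrics named in the abstract can be made concrete with a small sketch. It is not the paper's algorithm; it only evaluates a given storage plan under an assumed model in which a plan is a spanning tree rooted at a dummy node, an edge from the dummy node to a version means the version is stored in full, and any other edge means the child is stored as a delta against its parent. The names `ROOT`, `COSTS`, and `evaluate_plan`, and all cost values, are hypothetical and chosen for illustration.

```python
from typing import Dict, Tuple

ROOT = 0  # hypothetical dummy node: an edge ROOT -> v means "store v in full"

# Assumed cost table: (parent, child) -> (storage cost, recreation cost).
# The numbers are made up for illustration only.
COSTS: Dict[Tuple[int, int], Tuple[float, float]] = {
    (ROOT, 1): (100, 100),
    (ROOT, 2): (105, 105),
    (ROOT, 3): (110, 110),
    (1, 2): (10, 10),   # version 2 stored as a delta against version 1
    (2, 3): (12, 12),   # version 3 stored as a delta against version 2
}


def evaluate_plan(parent: Dict[int, int],
                  costs: Dict[Tuple[int, int], Tuple[float, float]]
                  ) -> Tuple[float, float, float]:
    """Return (total storage, total recreation, maximum recreation).

    `parent` maps each version to its parent in the plan and is assumed
    to describe a tree rooted at ROOT (no cycles).
    """
    recreation: Dict[int, float] = {ROOT: 0.0}

    def recreate(v: int) -> float:
        # Recreating v means recreating its parent, then applying one edge.
        if v not in recreation:
            p = parent[v]
            recreation[v] = recreate(p) + costs[(p, v)][1]
        return recreation[v]

    total_storage = sum(costs[(p, v)][0] for v, p in parent.items())
    per_version = [recreate(v) for v in parent]
    return total_storage, sum(per_version), max(per_version)


# Plan A stores every version in full; plan B stores a chain of deltas.
plan_full = {1: ROOT, 2: ROOT, 3: ROOT}
plan_chain = {1: ROOT, 2: 1, 3: 2}
print(evaluate_plan(plan_full, COSTS))
print(evaluate_plan(plan_chain, COSTS))
```

With these made-up costs, the delta chain needs far less storage than storing every version in full, but its last version is the most expensive to recreate. That per-version cost is the quantity the abstract says earlier work leaves unconstrained.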

Supported by the National Key Research and Development Program of China (2016YFB1000905), NSFC (61532021, U1501252, U1401256 and 61402180).

Author information

Corresponding author

Correspondence to Cheqing Jin.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Zhang, Y., Liu, H., Jin, C., Guo, Y. (2018). Storage and Recreation Trade-Off for Multi-version Data Management. In: Cai, Y., Ishikawa, Y., Xu, J. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 10988. Springer, Cham. https://doi.org/10.1007/978-3-319-96893-3_30

  • DOI: https://doi.org/10.1007/978-3-319-96893-3_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96892-6

  • Online ISBN: 978-3-319-96893-3

  • eBook Packages: Computer Science, Computer Science (R0)
