Skip to main content
Log in

Efficient time-interval data extraction in MVCC-based RDBMS

  • Published:
World Wide Web Aims and scope Submit manuscript

Abstract

Account reconciliation is the core business in banks and game companies. It regularly examines the account balance with the bank or expense statement for every user and reports the daily, weekly, or monthly balance. Once an account imbalance occurs, it is necessary to efficiently trace the transactions that possibly destroy the account balances. To help efficiently trace this kind of transactions, in this paper, we investigate the problem of doing efficient time-interval data extraction in MVCC-based RDBMS, i.e., extracting the incremental data that are valid between a given time interval in MVCC-based RDBMS. To this end, we propose a snapshot-based method to extract incremental data based on the fact that each record is inherently associated with lifetime, indicating whether the record can be accessed or not for a given time interval. We elaborate how to integrate our method into MySQL, an open-sourced RDBMS, and propose a declarative way to fetch the incremental data. Several optimization techniques are proposed to boost the extraction performance. Extensive experiments are conducted over the standardized Sysbench benchmark to show that our proposed method is robust and efficient.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
Figure 12
Figure 13

Similar content being viewed by others

References

  1. Bernstein, P.A., Goodman, N.: Concurrency control in distributed database systems. ACM Comput. Surv. (CSUR) 13(2), 185–221 (1981)

    Article  MathSciNet  Google Scholar 

  2. Cahill, M.J., Röhm, U., Fekete, A.D.: Serializable isolation for snapshot databases. ACM Trans. Datab. Syst. (TODS) 34(4), 20 (2009)

    Google Scholar 

  3. Canal. https://github.com/alibaba/canal

  4. Doan, A., Naughton, J. F., Ramakrishnan, R., Baid, A., Chai, X., Chen, F., Chen, T., Chu, E., DeRose, P., Gao, B., et al.: Information extraction challenges in managing unstructured data. ACM SIGMOD Rec. 37(4), 14–20 (2009)

    Article  Google Scholar 

  5. Labio, W., Garcia-Molina, H.: Efficient Snapshot Differential Algorithms in Data Warehousing. Tech. rep., Stanford InfoLab (1996)

  6. Li, H., Feng, Y., Fan, P.: The art of Database Transaction Processiong: Transaction Management and Concurrency Control. China Machine Press (2017)

  7. Lu, W., Fung, G.P.C., Du, X., Zhou, X., Chen, L., Deng, K.: Approximate entity extraction in temporal databases. World Wide Web 14(2), 157–186 (2011)

    Article  Google Scholar 

  8. Lu, W., Hou, J., Yan, Y., Zhang, M., Du, X., Moscibroda, T.: MSQL: efficient similarity search in metric spaces using SQL. VLDB J. 26(6), 829–854 (2017)

    Article  Google Scholar 

  9. Ma, K., Yang, B.: Log-based change data capture from schema-free document stores using mapreduce. In: 2015 International Conference on Cloud Technologies and Applications (CloudTech), pp. 1–6 (2015).

  10. McWherter, D.T., Schroeder, B., Ailamaki, A., Harchol-Balter, M.: Priority mechanisms for OLTP and transactional Web applications. In: ICDE. IEEE Computer Society, pp. 535–546 (2004)

  11. Meehan, J., Tatbul, N., Zdonik, S., Aslantas, C., Cetintemel, U., Du, J., Kraska, T., Madden, S., Maier, D., Pavlo, A., Stonebraker, M., Tufte, K., Wang, H.: S-store: Streaming meets transaction processing. Proc. VLDB Endow. 8(13), 2134–2145 (2015)

    Article  Google Scholar 

  12. Melnik, S., Gubarev, A., Long, J.J., Romer, G., Shivakumar, S., Tolton, M., Vassilakis, T.: Dremel: Interactive analysis of Web-scale datasets. Proc. VLDB Endow. 3(1-2), 330–339 (2010)

    Article  Google Scholar 

  13. Ports, D.R.K., Grittner, K.: Serializable snapshot isolation in postgresql. Proc. VLDB Endow. 5, 1850–1861 (2012)

    Article  Google Scholar 

  14. QQ. https://im.qq.com

  15. Ram, P., Do, L.: Extracting delta for incremental data warehouse maintenance. In: Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073), pp. 220–229 (2000).

  16. Reed, D. P.: Naming and Synchronization in a Decentralized Computer System. Ph.D. thesis Massachusetts Institute of Technology (1978)

  17. Revilak, S., O’Neil, P., O’Neil, E.: Precisely serializable snapshot isolation (pssi). In: 2011 IEEE 27th International Conference on Data Engineering, pp. 482–493 (2011)

  18. Stonebraker, M.: The design of the postgres storage system. In: Proceedings of the 13th International Conference on Very Large Data Bases, VLDB ’87, pp 289–300. Morgan Kaufmann Publishers Inc., San Francisco (1987)

  19. Stonebraker, M., Rowe, L.A., Hirohama, M.: The implementation of postgres. IEEE Trans. Knowl. Data Eng. 2(1), 125–142 (1990)

    Article  Google Scholar 

  20. Sysbench Benchmark. https://github.com/akopytov/sysbench

  21. Tencent Distributed SQL System (TDSQL). http://tdsql.org

  22. WeChat. https://weixin.qq.com

  23. Wu, S, Ren, W, Yu, C, Chen, G, Zhang, D, Zhu, J: Personal recommendation using deep recurrent neural networks in NetEase. In: 32nd IEEE International Conference on Data Engineering, ICDE 2016, Helsinki, Finland, May 16-20, 2016. pp. 1218–1229 (2016)

  24. Yabandeh, M., Gómez Ferro, D.: A critique of snapshot isolation. In: Proceedings of the 7th ACM European Conference on Computer Systems, pp. 155–168. ACM (2012)

  25. Zhang, C., Sterck, H.D.: Supporting multi-row distributed transactions with global snapshot isolation using bare-bones hbase. In: 2010 11th IEEE/ACM International Conference on Grid Computing, pp. 177–184 (2010)

  26. Zhang, D., Li, Y., Cao, X., Shao, J., Shen, H.T.: Augmented keyword search on spatial entity databases. VLDB J. https://doi.org/10.1007/s00778-018-0497-6 (2018)

    Article  Google Scholar 

Download references

Acknowledgments

We would like to thank the anonymous reviewers for their valuable comments. This work was supported by the National Natural Science Foundation of China (61502504, 61732014) and the Tencent Research Grant for Renmin University of China.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wei Lu.

Additional information

This article belongs to the Topical Collection: Special Issue on Web and Big Data

Guest Editors: Junjie Yao, Bin Cui, Christian S. Jensen, and Zhe Zhao

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Li, H., Zhao, Z., Cheng, Y. et al. Efficient time-interval data extraction in MVCC-based RDBMS. World Wide Web 22, 2633–2653 (2019). https://doi.org/10.1007/s11280-018-0552-7

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11280-018-0552-7

Keywords

Navigation