A Framework of Write Optimization on Read-Optimized Out-of-Core Column-Store Databases

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9261)

Abstract

Column-store databases offer faster reads and more efficient data compression than traditional row-based databases. However, optimizing write operations in a column-store database is a well-known challenge. Most existing work on write performance optimization focuses on main-memory column-store databases. In this work, we investigate optimizing write operations (updates and deletions) on out-of-core (OOC, or external memory) column-store databases. We propose a general framework that works for both normal OOC storage and big data storage, such as the Hadoop Distributed File System (HDFS). On normal OOC storage, we propose a new data storage format called the Timestamped Binary Association Table (TBAT). Based on TBAT, a new update method, called Asynchronous Out-of-Core Update (AOC Update), is designed to replace the traditional update. On big data storage, we extend TBAT onto HDFS and propose the Asynchronous Map-Only Update (AMO Update) to replace the traditional update. Fast selection methods are developed in both contexts to improve data retrieval speed. Extensive experiments show a significant speedup when performing write operations on TBAT in both the normal and MapReduce environments.
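The abstract names the TBAT format and an asynchronous update but does not describe their layout. As a rough illustration only, the sketch below assumes a TBAT row is a (timestamp, oid, value) triple, that an update or deletion is recorded by appending a new row rather than rewriting an old one, and that a selection returns the most recent row for an object. The class names, the logical clock, and the use of `None` as a deletion marker are hypothetical choices made for this example and are not taken from the paper.

```python
from typing import Dict, List, NamedTuple, Optional


class TbatRow(NamedTuple):
    timestamp: int          # logical write time (assumed to increase with each append)
    oid: int                # object identifier of the record
    value: Optional[str]    # column value; None marks a deletion (assumption)


class Tbat:
    """Toy in-memory stand-in for one timestamped column (hypothetical)."""

    def __init__(self) -> None:
        self.rows: List[TbatRow] = []
        self._clock = 0

    def _tick(self) -> int:
        # Simple logical clock so that later writes get larger timestamps.
        self._clock += 1
        return self._clock

    def append(self, oid: int, value: Optional[str]) -> None:
        # Every write is an append; existing rows are never modified.
        self.rows.append(TbatRow(self._tick(), oid, value))

    def update(self, oid: int, value: str) -> None:
        self.append(oid, value)

    def delete(self, oid: int) -> None:
        self.append(oid, None)

    def select(self, oid: int) -> Optional[str]:
        # Scan from the newest row backwards; the first match is the current value.
        for row in reversed(self.rows):
            if row.oid == oid:
                return row.value
        return None

    def snapshot(self) -> Dict[int, str]:
        # Rows are stored in timestamp order, so a later row overrides an earlier one.
        latest: Dict[int, TbatRow] = {}
        for row in self.rows:
            latest[row.oid] = row
        return {oid: r.value for oid, r in latest.items() if r.value is not None}


table = Tbat()
table.append(1, "a")
table.append(2, "b")
table.update(1, "c")   # appended, the original row for oid 1 stays in place
table.delete(2)        # appended as a deletion marker
```

Under these assumptions a write costs one append regardless of table size, while reads have to resolve the newest version of each object, which is one plausible reason the abstract pairs the update methods with dedicated fast selection methods.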

Notes

  1. http://dev.mysql.com/doc/refman/5.0/en/datetime.html

  2. https://github.com/YSU-Data-Lab/TBAT-DEXA15

Corresponding author

Correspondence to Feng Yu.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Yu, F., Hou, WC. (2015). A Framework of Write Optimization on Read-Optimized Out-of-Core Column-Store Databases. In: Chen, Q., Hameurlain, A., Toumani, F., Wagner, R., Decker, H. (eds) Database and Expert Systems Applications. Globe DEXA 2015. Lecture Notes in Computer Science, vol 9261. Springer, Cham. https://doi.org/10.1007/978-3-319-22849-5_12

  • DOI: https://doi.org/10.1007/978-3-319-22849-5_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22848-8

  • Online ISBN: 978-3-319-22849-5

  • eBook Packages: Computer Science, Computer Science (R0)
