Abstract
Column-store databases offer faster reads and more efficient data compression than traditional row-based databases. However, optimizing write operations in a column store remains a well-known challenge. Most existing work on write performance optimization focuses on main-memory column-store databases. In this work, we investigate optimizing write operations (update and deletion) on out-of-core (OOC, or external-memory) column-store databases. We propose a general framework that works for both normal OOC storage and big data storage, such as the Hadoop Distributed File System (HDFS). On normal OOC storage, we propose an innovative data storage format called the Timestamped Binary Association Table (TBAT). Based on TBAT, a new update method, called Asynchronous Out-of-Core Update (AOC Update), is designed to replace the traditional update. On big data storage, we extend TBAT onto HDFS and propose the Asynchronous Map-Only Update (AMO Update) to replace the traditional update. Fast selection methods are developed in both contexts to improve data retrieval speed. Extensive experiments show a significant speed improvement when performing write operations on TBAT in both normal and MapReduce environments.
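The idea of a timestamped, append-only column can be illustrated with a small sketch. The abstract does not specify the TBAT layout or the AOC Update procedure, so everything below is an assumption made for illustration: the record shape `(timestamp, oid, value)`, the use of `None` as a deletion marker, the class name `TimestampedColumn`, and the in-memory list standing in for an out-of-core file are all hypothetical. The sketch only shows why appending timestamped records can avoid in-place writes, and why selection must then resolve the newest record per object.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout (an assumption, not taken from the paper):
# (timestamp, oid, value). A value of None marks a deletion.
Record = Tuple[int, int, Optional[str]]


@dataclass
class TimestampedColumn:
    """Toy append-only column; a Python list stands in for an on-disk file."""

    records: List[Record] = field(default_factory=list)
    _clock: count = field(default_factory=lambda: count(1))

    def _append(self, oid: int, value: Optional[str]) -> None:
        # Every write is a sequential append; nothing is overwritten in place.
        self.records.append((next(self._clock), oid, value))

    def insert(self, oid: int, value: str) -> None:
        self._append(oid, value)

    def update(self, oid: int, value: str) -> None:
        self._append(oid, value)

    def delete(self, oid: int) -> None:
        self._append(oid, None)

    def select(self, oid: int) -> Optional[str]:
        # Scan from the newest record backward; the first match wins.
        for _ts, rec_oid, value in reversed(self.records):
            if rec_oid == oid:
                return value
        return None

    def snapshot(self) -> Dict[int, str]:
        # Records are stored in ascending timestamp order, so later
        # entries overwrite earlier ones for the same oid.
        latest: Dict[int, Optional[str]] = {}
        for _ts, oid, value in self.records:
            latest[oid] = value
        return {oid: v for oid, v in latest.items() if v is not None}


col = TimestampedColumn()
col.insert(1, "a")
col.insert(2, "b")
col.update(1, "c")   # appended, the old record for oid 1 stays in place
col.delete(2)        # appended deletion marker
print(col.select(1), col.select(2), col.snapshot())
```

In this toy version the cost of a write is a single append, while reads pay for resolving versions. That trade-off is presumably why the abstract pairs the update methods with dedicated fast selection methods.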
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Yu, F., Hou, WC. (2015). A Framework of Write Optimization on Read-Optimized Out-of-Core Column-Store Databases. In: Chen, Q., Hameurlain, A., Toumani, F., Wagner, R., Decker, H. (eds) Database and Expert Systems Applications. Globe DEXA 2015. Lecture Notes in Computer Science, vol 9261. Springer, Cham. https://doi.org/10.1007/978-3-319-22849-5_12
DOI: https://doi.org/10.1007/978-3-319-22849-5_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22848-8
Online ISBN: 978-3-319-22849-5
eBook Packages: Computer Science, Computer Science (R0)