PosDB: An Architecture Overview

Chernishev, G. A.; Galaktionov, V. A.; Grigorev, V. D.; Klyuchikov, E. S.; Smirnov, K. K.

doi:10.1134/S0361768818010024

Published: 09 March 2018

Programming and Computer Software, Volume 44, pages 62–74 (2018)

Abstract

PosDB is the engine of a disk-based column-store DBMS designed for processing OLAP queries in a shared-nothing environment. It is written entirely from scratch and aims to become a platform for studying distributed query processing in column-stores. This paper presents the first comprehensive description of the system. The presentation begins with a history of column-stores in order to clarify the reasons for their success. Next, the decision to create a new system is justified, and an overview of its architecture is given. Finally, all of its components are described in detail. Currently, query execution in PosDB is based on the Volcano model with block-oriented processing and late materialization. Various physical operators have been developed for relational operations such as join, aggregation, and selection, along with auxiliary operators that support intraquery parallelism and network communication. Data distribution is achieved using horizontal range partitioning and data replication. The current version of PosDB can execute all queries of the Star Schema Benchmark in both centralized and distributed environments.
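
The abstract combines three execution techniques: the Volcano iterator model (each operator exposes open, next, and close), block-oriented processing (each next call returns a batch rather than a single tuple), and late materialization (operators pass row positions and fetch column values only when they are actually needed). The Python sketch below illustrates how these ideas fit together under simplifying assumptions: columns are in-memory integer lists, and the names PositionScan, PositionFilter, Materialize, run, and the block size are hypothetical. It is not PosDB's implementation.

```python
from typing import Callable, List, Optional, Protocol, Sequence, Tuple

# Hypothetical model: a column is an in-memory sequence indexed by row position.
Column = Sequence[int]
Row = Tuple[int, ...]


class Operator(Protocol):
    """Volcano-style iterator interface that returns a whole block per next() call."""

    def open(self) -> None: ...
    def next(self) -> Optional[list]: ...
    def close(self) -> None: ...


class PositionScan:
    """Emits blocks of row positions 0..num_rows-1 without reading any column."""

    def __init__(self, num_rows: int, block_size: int) -> None:
        self.num_rows, self.block_size, self.cursor = num_rows, block_size, 0

    def open(self) -> None:
        self.cursor = 0

    def next(self) -> Optional[List[int]]:
        if self.cursor >= self.num_rows:
            return None  # end of stream
        end = min(self.cursor + self.block_size, self.num_rows)
        block = list(range(self.cursor, end))
        self.cursor = end
        return block

    def close(self) -> None:
        pass


class PositionFilter:
    """Reads only the predicate column and forwards the qualifying positions."""

    def __init__(self, child: Operator, column: Column,
                 predicate: Callable[[int], bool]) -> None:
        self.child, self.column, self.predicate = child, column, predicate

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[List[int]]:
        # Keep pulling child blocks until one contains a qualifying position.
        while (block := self.child.next()) is not None:
            kept = [pos for pos in block if self.predicate(self.column[pos])]
            if kept:
                return kept
        return None

    def close(self) -> None:
        self.child.close()


class Materialize:
    """Late materialization: builds tuples only for positions that survived."""

    def __init__(self, child: Operator, columns: List[Column]) -> None:
        self.child, self.columns = child, columns

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[List[Row]]:
        block = self.child.next()
        if block is None:
            return None
        return [tuple(col[pos] for col in self.columns) for pos in block]

    def close(self) -> None:
        self.child.close()


def run(root: Operator) -> List[Row]:
    """Drives the plan from the root: open, pull blocks until None, close."""
    root.open()
    rows: List[Row] = []
    while (block := root.next()) is not None:
        rows.extend(block)
    root.close()
    return rows


price = [10, 25, 7, 40, 15, 30]
qty = [1, 2, 3, 4, 5, 6]
plan = Materialize(
    PositionFilter(PositionScan(len(price), block_size=4), price, lambda v: v > 12),
    [price, qty],
)
print(run(plan))  # [(25, 2), (40, 4), (15, 5), (30, 6)]
```

Because PositionFilter forwards only positions, the qty column is accessed only for the four rows that pass the predicate; a plan that built full tuples at the scan would have read qty for all six rows. Joins, aggregation, disk-resident blocks, and the distributed operators mentioned in the abstract are outside the scope of this sketch.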

Author information

Authors and Affiliations

St. Petersburg State University, St. Petersburg, Russia
G. A. Chernishev, V. A. Galaktionov, V. D. Grigorev, E. S. Klyuchikov & K. K. Smirnov
JetBrains Research, St. Petersburg, Russia
G. A. Chernishev

Corresponding author

Correspondence to G. A. Chernishev.

Additional information

Original Russian Text © G.A. Chernishev, V.A. Galaktionov, V.D. Grigorev, E.S. Klyuchikov, K.K. Smirnov, 2018, published in Programmirovanie, 2018, Vol. 44, No. 1.

About this article

Cite this article

Chernishev, G.A., Galaktionov, V.A., Grigorev, V.D. et al. PosDB: An Architecture Overview. Program Comput Soft 44, 62–74 (2018). https://doi.org/10.1134/S0361768818010024

Received: 10 September 2017
Published: 09 March 2018
Issue Date: January 2018
DOI: https://doi.org/10.1134/S0361768818010024
