Parallel processing of very large databases using distributed column indexes

Ivanova, E. V.; Sokolinsky, L. B.

doi:10.1134/S0361768817030069


Published: 26 May 2017

Programming and Computer Software, Volume 43, pages 131–144 (2017)

Abstract

This paper discusses the development and investigation of efficient methods for parallel processing of very large databases on computer clusters using a columnar data representation. An approach that combines the advantages of relational and column-oriented DBMSs is proposed. A new type of distributed column index, fragmented according to the domain-interval principle, is introduced. Column indexes are auxiliary structures that are permanently stored in the distributed main memory of a computer cluster. Surrogate keys are used to match the elements of a column index to the tuples of the original relation. Resource-intensive relational operations are performed on the corresponding column indexes rather than on the original relations of the database. The result is a precomputation table, from which the DBMS reconstructs the resulting relation. For the basic relational operations on column indexes, methods of parallel decomposition are proposed that do not require massive data exchange between processor nodes. This approach improves the performance of OLAP-class queries by hundreds of times.
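The pipeline described in the abstract (column index with surrogate keys, domain-interval fragmentation, a join on indexes that yields a precomputation table, then reconstruction of the result) can be illustrated with a minimal single-process sketch. It is not the authors' implementation. The function names, the list-of-pairs index layout, the use of row positions as surrogate keys, the per-fragment hash join, and the simulation of cluster nodes as list elements are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def build_column_index(relation, attr):
    """Column index on `attr`: (value, surrogate_key) pairs sorted by value.

    Assumption: a tuple's position in `relation` serves as its surrogate key.
    """
    return sorted((row[attr], key) for key, row in enumerate(relation))


def fragment_by_domain_intervals(index, bounds):
    """Split a column index into len(bounds) + 1 fragments.

    Value v goes to fragment i, where bounds[i-1] <= v < bounds[i]
    (the first and last intervals are open-ended). Fragment i is meant
    to reside in the main memory of node i.
    """
    fragments = [[] for _ in range(len(bounds) + 1)]
    for value, key in index:
        fragments[bisect_right(bounds, value)].append((value, key))
    return fragments


def local_equijoin(frag_r, frag_s):
    """Hash-join two fragments of the same interval on value.

    Returns (key_r, key_s) rows of a precomputation table.
    """
    keys_by_value = defaultdict(list)
    for value, key_s in frag_s:
        keys_by_value[value].append(key_s)
    return [(key_r, key_s)
            for value, key_r in frag_r
            for key_s in keys_by_value.get(value, [])]


def distributed_join(frags_r, frags_s):
    """Join two indexes fragmented with the same interval bounds.

    Equal values always fall into the same interval, so fragment i of R
    only meets fragment i of S and no data is exchanged between "nodes".
    The sequential loop stands in for nodes running in parallel.
    """
    table = []
    for frag_r, frag_s in zip(frags_r, frags_s):
        table.extend(local_equijoin(frag_r, frag_s))
    return table


def reconstruct(table, r, s):
    """Rebuild joined tuples from the original relations via surrogate keys."""
    return [{**r[key_r], **s[key_s]} for key_r, key_s in table]


orders = [{"order": 1, "cust": 7}, {"order": 2, "cust": 3}, {"order": 3, "cust": 7}]
customers = [{"cust": 3, "name": "Ann"}, {"cust": 7, "name": "Bob"}]
bounds = [5]  # two hypothetical nodes: cust < 5 and cust >= 5

frags_orders = fragment_by_domain_intervals(build_column_index(orders, "cust"), bounds)
frags_customers = fragment_by_domain_intervals(build_column_index(customers, "cust"), bounds)
precomputation = distributed_join(frags_orders, frags_customers)
print(precomputation)  # [(1, 0), (0, 1), (2, 1)]
print(reconstruct(precomputation, orders, customers))
```

Because both indexes are fragmented with the same interval boundaries, matching join values always land in fragments with the same number, which is why, in this sketch, each per-fragment join uses only local data. Only the compact precomputation table of surrogate-key pairs is needed to fetch full tuples from the original relations.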

Author information

Authors and Affiliations

South Ural State University, Chelyabinsk, 454080, Russia
E. V. Ivanova & L. B. Sokolinsky

Corresponding author

Correspondence to E. V. Ivanova.

About this article

Cite this article

Ivanova, E.V., Sokolinsky, L.B. Parallel processing of very large databases using distributed column indexes. Program Comput Soft 43, 131–144 (2017). https://doi.org/10.1134/S0361768817030069

Received: 20 April 2016
Issue Date: May 2017
