Abstract
This paper discusses the development and investigation of efficient methods for parallel processing of very large databases on computer clusters using a columnar data representation. An approach that combines the advantages of relational and column-oriented DBMSs is proposed. A new type of distributed column index, fragmented according to the domain-interval principle, is introduced. Column indexes are auxiliary structures that are kept permanently in the distributed main memory of a computer cluster. Surrogate keys are used to match the elements of a column index to the tuples of the original relation. Resource-intensive relational operations are performed on the corresponding column indexes rather than on the original relations of the database. The result is a precomputation table, which the DBMS uses to reconstruct the resulting relation. For the basic relational operations on column indexes, methods of parallel decomposition that do not require massive data exchanges between processor nodes are proposed. This approach improves the performance of OLAP-class queries by a factor of hundreds.
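The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dictionary-based relations, and the single interval boundary are all assumptions made for illustration, and the fragments are processed in a sequential loop where a cluster would assign each fragment pair to its own node.

```python
from bisect import bisect_right

def build_column_index(relation, attr):
    """Column index: (surrogate_key, value) pairs sorted by value.
    `relation` maps a surrogate key to a tuple stored as a dict (assumed layout)."""
    return sorted(((key, row[attr]) for key, row in relation.items()),
                  key=lambda entry: (entry[1], entry[0]))

def fragment_by_domain(index, bounds):
    """Domain-interval fragmentation: `bounds` are ascending boundaries that
    split the attribute domain into len(bounds) + 1 intervals."""
    fragments = [[] for _ in range(len(bounds) + 1)]
    for key, value in index:
        fragments[bisect_right(bounds, value)].append((key, value))
    return fragments

def join_fragment(frag_r, frag_s):
    """Equi-join of two co-located fragments; returns pairs of surrogate keys."""
    keys_by_value = {}
    for key, value in frag_s:
        keys_by_value.setdefault(value, []).append(key)
    return [(key_r, key_s)
            for key_r, value in frag_r
            for key_s in keys_by_value.get(value, [])]

def precompute_join(rel_r, attr_r, rel_s, attr_s, bounds):
    """Build the precomputation table. Both indexes use the same boundaries,
    so equal values always land in fragments with the same number and each
    fragment pair can be joined without exchanging data with other pairs."""
    frags_r = fragment_by_domain(build_column_index(rel_r, attr_r), bounds)
    frags_s = fragment_by_domain(build_column_index(rel_s, attr_s), bounds)
    table = []
    for frag_r, frag_s in zip(frags_r, frags_s):  # one iteration per node
        table.extend(join_fragment(frag_r, frag_s))
    return table

def reconstruct(table, rel_r, rel_s):
    """Rebuild the resulting relation from the original tuples via surrogate keys."""
    return [{**rel_r[key_r], **rel_s[key_s]} for key_r, key_s in table]

# Hypothetical sample data: surrogate key -> tuple.
orders = {0: {"cust": 10, "total": 5},
          1: {"cust": 25, "total": 7},
          2: {"cust": 10, "total": 9}}
customers = {0: {"cust": 10, "name": "A"},
             1: {"cust": 25, "name": "B"},
             2: {"cust": 40, "name": "C"}}

table = precompute_join(orders, "cust", customers, "cust", bounds=[20])
result = reconstruct(table, orders, customers)
```

Only surrogate keys and attribute values pass through the join itself; the full tuples are touched once, during reconstruction, which is the reason for running the expensive operation on the indexes rather than on the relations.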
Additional information
Original Russian Text © E.V. Ivanova, L.B. Sokolinsky, 2017, published in Programmirovanie, 2017, Vol. 43, No. 3.
Cite this article
Ivanova, E.V., Sokolinsky, L.B. Parallel processing of very large databases using distributed column indexes. Program Comput Soft 43, 131–144 (2017). https://doi.org/10.1134/S0361768817030069
DOI: https://doi.org/10.1134/S0361768817030069