Parallel processing of very large databases using distributed column indexes

Programming and Computer Software

Abstract

This paper discusses the development and investigation of efficient methods for parallel processing of very large databases on computer clusters using a columnar data representation. An approach that combines the advantages of relational and column-oriented DBMSs is proposed. A new type of distributed column index, fragmented according to the domain-interval principle, is introduced. Column indexes are auxiliary structures that are permanently stored in the distributed main memory of a computer cluster. Surrogate keys are used to match the elements of a column index to the tuples of the original relation. Resource-intensive relational operations are performed on the corresponding column indexes rather than on the original relations of the database. The result is a precomputation table, which the DBMS then uses to reconstruct the resulting relation. For the basic relational operations on column indexes, methods of parallel decomposition are proposed that do not require massive data exchanges between processor nodes. For the class of OLAP queries, this approach makes query processing hundreds of times faster.
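To make the pipeline described in the abstract more concrete, here is a minimal single-process sketch of an equi-join performed on domain-interval fragmented column indexes. It rests on assumptions that the abstract does not state: a column index is modeled as a list of (value, surrogate key) pairs sorted by value, the surrogate key is simply the tuple's position in its relation, both indexes are fragmented with the same interval boundaries, and each fragment pair stands in for one cluster node (the loop runs sequentially). All function names and data are hypothetical; this is an illustration of why co-fragmented indexes need no cross-node exchange for a join, not the authors' decomposition method.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical model: a column index is a list of (value, surrogate_key) pairs
# sorted by value; the surrogate key is assumed to be the tuple's list position.
def build_column_index(relation, attr):
    return sorted((row[attr], sk) for sk, row in enumerate(relation))

def fragment(index, boundaries):
    """Domain-interval fragmentation: fragment i holds the pairs whose value v
    satisfies boundaries[i-1] <= v < boundaries[i] (open-ended at both extremes)."""
    fragments = [[] for _ in range(len(boundaries) + 1)]
    for value, sk in index:
        fragments[bisect_right(boundaries, value)].append((value, sk))
    return fragments

def local_join(frag_r, frag_s):
    """Equi-join two fragments covering the same value interval on one node;
    returns (r_key, s_key) rows of the precomputation table."""
    keys_by_value = defaultdict(list)
    for value, s_sk in frag_s:
        keys_by_value[value].append(s_sk)
    return [(r_sk, s_sk) for value, r_sk in frag_r
            for s_sk in keys_by_value.get(value, [])]

def parallel_join(index_r, index_s, boundaries):
    """Each fragment pair simulates one node; no data moves between 'nodes'."""
    table = []
    for frag_r, frag_s in zip(fragment(index_r, boundaries),
                              fragment(index_s, boundaries)):
        table.extend(local_join(frag_r, frag_s))  # would run concurrently on a cluster
    return table

def reconstruct(table, r, s):
    """Fetch the original tuples via surrogate keys to build the result relation."""
    return [(r[r_sk], s[s_sk]) for r_sk, s_sk in table]

# Illustrative data (hypothetical).
orders = [{"cust": 3, "amount": 50}, {"cust": 17, "amount": 20}, {"cust": 3, "amount": 5}]
customers = [{"id": 3, "name": "A"}, {"id": 17, "name": "B"}, {"id": 25, "name": "C"}]
boundaries = [10, 20]  # three intervals: (-inf, 10), [10, 20), [20, +inf)

table = parallel_join(build_column_index(orders, "cust"),
                      build_column_index(customers, "id"), boundaries)
print(table)  # [(0, 0), (2, 0), (1, 1)]
result = reconstruct(table, orders, customers)
```

Because equal values always fall into the same interval, fragment i of one index only ever needs fragment i of the other. The precomputation table is just the concatenation of the per-node results, and the original tuples are touched only at the end, through their surrogate keys.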

Author information

Correspondence to E. V. Ivanova.

Additional information

Original Russian Text © E.V. Ivanova, L.B. Sokolinsky, 2017, published in Programmirovanie, 2017, Vol. 43, No. 3.

About this article

Cite this article

Ivanova, E.V., Sokolinsky, L.B. Parallel processing of very large databases using distributed column indexes. Program Comput Soft 43, 131–144 (2017). https://doi.org/10.1134/S0361768817030069

  • DOI: https://doi.org/10.1134/S0361768817030069
