Encapsulation of partitioned parallelism into open-source database management systems

Programming and Computer Software

Abstract

This paper presents an original approach to parallel processing of very large databases by means of encapsulation of partitioned parallelism into open-source database management systems (DBMSs). The architecture and methods for implementing a parallel DBMS through encapsulation of partitioned parallelism into PostgreSQL DBMS are described. Experimental results that confirm the effectiveness of the proposed approach are presented.
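
As a rough illustration of the general idea behind partitioned parallelism, the sketch below splits a relation horizontally into fragments, runs the same query over each fragment in parallel, and merges the partial results. It is not the authors' implementation and does not involve PostgreSQL; the function names (`partition`, `local_sum`, `parallel_sum`), the modulo partitioning function, and the sample `orders` data are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, key, n_fragments):
    """Split rows horizontally into n_fragments using key modulo n_fragments.

    Hypothetical partitioning function: assumes the key column holds integers.
    """
    fragments = [[] for _ in range(n_fragments)]
    for row in rows:
        fragments[row[key] % n_fragments].append(row)
    return fragments


def local_sum(fragment, column):
    """The query executed on one fragment: a partial SUM over a column."""
    return sum(row[column] for row in fragment)


def parallel_sum(rows, key, column, n_fragments=4):
    """Run the same query on every fragment in parallel, then merge."""
    fragments = partition(rows, key, n_fragments)
    with ThreadPoolExecutor(max_workers=n_fragments) as pool:
        partials = list(pool.map(lambda f: local_sum(f, column), fragments))
    # Merge step: for SUM, the global result is the sum of the partial sums.
    return sum(partials)


# Illustrative data only: 100 rows with an integer id and an amount.
orders = [{"id": i, "amount": i * 10} for i in range(100)]
total = parallel_sum(orders, "id", "amount")
```

In a real cluster each fragment would reside on a separate node and the merge step would involve inter-node data exchange; the thread pool here only stands in for that.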

Author information

Corresponding author

Correspondence to C. S. Pan.

Additional information

Original Russian Text © C.S. Pan, M.L. Zymbler, 2015, published in Programmirovanie, 2015, Vol. 41, No. 6.

About this article

Cite this article

Pan, C.S., Zymbler, M.L. Encapsulation of partitioned parallelism into open-source database management systems. Program Comput Soft 41, 350–360 (2015). https://doi.org/10.1134/S0361768815060067

  • DOI: https://doi.org/10.1134/S0361768815060067
