Abstract
This paper presents an original approach to parallel processing of very large databases: encapsulating partitioned parallelism into open-source database management systems (DBMSs). It describes the architecture and the methods for building a parallel DBMS by encapsulating partitioned parallelism into the PostgreSQL DBMS, and reports experimental results that confirm the effectiveness of the proposed approach.
Additional information
Original Russian Text © C.S. Pan, M.L. Zymbler, 2015, published in Programmirovanie, 2015, Vol. 41, No. 6.
About this article
Cite this article
Pan, C.S., Zymbler, M.L. Encapsulation of partitioned parallelism into open-source database management systems. Program Comput Soft 41, 350–360 (2015). https://doi.org/10.1134/S0361768815060067
DOI: https://doi.org/10.1134/S0361768815060067