Abstract
OLAP (On-Line Analytical Processing) is an approach to efficiently evaluating multidimensional data for business intelligence applications. It supports business decision-making by identifying, extracting, and analyzing multidimensional data. The fundamental structure of OLAP is the data cube, which lets users interactively explore distinct data dimensions. Processing time depends on query complexity, dimensionality, and the growing size of the data cube. As data volumes and the demands of business users both keep increasing, ever higher processing speed is needed, since faster processing means faster decisions and more profit for industry.
In this paper, we propose an Adaptive Hybrid OLAP Architecture that takes advantage of heterogeneous systems with GPUs and CPUs and leverages the different characteristics of their memory subsystems to minimize response time. Our approach (a) exploits both types of hardware rather than using the CPU only as a front end for the GPU; (b) uses two data formats (multidimensional cube and relational cube) to match the GPU and CPU memory access patterns, and adaptively diverts each query to the resource best suited to it; (c) exploits the data locality of multidimensional OLAP on NUMA multicore systems through intelligent thread placement; and (d) guides its adaptation and choices with an architectural model that captures the memory access patterns and the underlying data characteristics.
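The adaptive diversion in (b) and (d) can be illustrated with a minimal sketch. It is not the paper's architectural model: it assumes a simple cost estimate (a fixed dispatch overhead plus bytes scanned divided by effective memory bandwidth), and every name and number in it (`Processor`, `Query`, `estimated_ms`, `dispatch`, the bandwidth and overhead values) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    name: str
    bandwidth_gb_s: float  # assumed effective memory bandwidth (hypothetical value)
    overhead_ms: float     # assumed fixed per-query dispatch cost (hypothetical value)


@dataclass(frozen=True)
class Query:
    bytes_scanned: int     # amount of cube data the query has to read


def estimated_ms(query: Query, proc: Processor) -> float:
    """Estimate response time as fixed overhead plus a bandwidth-bound scan."""
    scan_ms = query.bytes_scanned / (proc.bandwidth_gb_s * 1e9) * 1e3
    return proc.overhead_ms + scan_ms


def dispatch(query: Query, processors: list[Processor]) -> Processor:
    """Divert the query to the processor with the lowest estimated time."""
    return min(processors, key=lambda p: estimated_ms(query, p))


# Illustrative figures only; they are not measurements from the paper.
CPU = Processor("cpu", bandwidth_gb_s=20.0, overhead_ms=0.05)
GPU = Processor("gpu", bandwidth_gb_s=150.0, overhead_ms=0.5)

small = Query(bytes_scanned=1_000_000)      # about 1 MB
large = Query(bytes_scanned=1_000_000_000)  # about 1 GB

print(dispatch(small, [CPU, GPU]).name)
print(dispatch(large, [CPU, GPU]).name)
```

Under these assumed figures the small query stays on the CPU, because the GPU's fixed overhead dominates, while the large scan goes to the GPU. A model in the spirit of the paper would also have to account for the data format each processor holds and for the query's memory access pattern.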
Results show roughly a fourfold increase in performance over the best known related approach. There is also an important economic factor: the proposed hybrid system costs only 10% more than the same system without a GPU, and for this small extra cost the added GPU speeds up query processing almost twofold.
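The cost argument can be made concrete with a short calculation. It assumes that the quoted figures are the only inputs: a relative system cost of 1.10 and a speedup $S$ taken as approximately 2. The symbols $S$, $C_{\text{CPU}}$, and $C_{\text{hybrid}}$ are introduced here for illustration and do not come from the paper.

```latex
\[
\frac{\text{performance per unit cost (hybrid)}}{\text{performance per unit cost (CPU only)}}
  = \frac{S}{C_{\text{hybrid}} / C_{\text{CPU}}}
  \approx \frac{2}{1.10}
  \approx 1.8
\]
```

Under these assumptions the hybrid system delivers roughly 1.8 times the query throughput per unit of hardware cost. Because the reported speedup is "almost" 2, this is best read as an upper estimate.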
Cite this article
Riha, L., Malik, M. & El-Ghazawi, T. An Adaptive Hybrid OLAP Architecture with optimized memory access patterns. Cluster Comput 16, 663–677 (2013). https://doi.org/10.1007/s10586-012-0237-4