Abstract
The multi-core trend in CPUs and general purpose graphics processing units (GPUs) offers new opportunities for the database community. The exponential growth in core counts is likely to affect virtually every server and client in the coming decade, and presents database management systems with a compelling disruption that will radically change how processing is done. This paper presents a new parallel indexing data structure for answering queries that takes full advantage of the increasing thread-level parallelism emerging in multi-core architectures. Our Data Parallel Bin-based Index Strategy (DP-BIS) first bins the base data, and then partitions and stores the values in each bin as a separate, bin-based data cluster. In answering a query, the procedures for examining the bin numbers and the bin-based data clusters offer the maximum possible level of concurrency: each record is evaluated by a single thread, and all threads execute in parallel.
We implement and demonstrate the effectiveness of DP-BIS on two multi-core architectures: a multi-core CPU and a GPU. The concurrency afforded by DP-BIS allows us to fully utilize the thread-level parallelism provided by each architecture; for example, our GPU-based DP-BIS implementation simultaneously evaluates over 12,000 records with an equivalent number of concurrently executing threads. In comparing DP-BIS's performance across these architectures, we show that the GPU-based DP-BIS implementation requires significantly less computation time to answer a query than the CPU-based implementation. We also demonstrate in our analysis that DP-BIS provides better overall performance than the commonly used CPU- and GPU-based projection index. Finally, we show that, because of its data encoding, DP-BIS accesses significantly smaller amounts of data than index strategies that operate solely on a column's base data; this smaller data footprint is critical for parallel processors that possess limited memory resources (e.g., GPUs).
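The binning strategy summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (bin_of, build_index, range_query), the half-open bin layout, the two-phase query, and the use of a Python thread pool are all assumptions made for illustration. The sketch only shows the general idea that every record can be tested independently, first by its bin number and then, for records in the two boundary bins, by the value stored in that bin's cluster. Python threads do not provide real hardware parallelism here; each per-record function simply stands in for the work one CPU or GPU thread might do.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor


def bin_of(boundaries, x):
    """Bin number for x; bin i covers [boundaries[i], boundaries[i+1]).

    Values outside the boundaries are clamped into the first or last bin.
    """
    last = len(boundaries) - 2
    return min(max(bisect_right(boundaries, x) - 1, 0), last)


def build_index(values, boundaries):
    """Return per-record bin numbers and one (row_id, value) cluster per bin."""
    bin_numbers = [bin_of(boundaries, v) for v in values]
    clusters = [[] for _ in range(len(boundaries) - 1)]
    for row_id, (b, v) in enumerate(zip(bin_numbers, values)):
        clusters[b].append((row_id, v))
    return bin_numbers, clusters


def range_query(bin_numbers, clusters, boundaries, lo, hi, workers=4):
    """Return the sorted row ids whose value satisfies lo <= value < hi."""
    lo_bin, hi_bin = bin_of(boundaries, lo), bin_of(boundaries, hi)

    def check_bin(row_id):
        # Phase 1 (hypothetical): a record whose bin lies strictly between
        # the two boundary bins is a hit without reading its value.
        return lo_bin < bin_numbers[row_id] < hi_bin

    def check_value(entry):
        # Phase 2 (hypothetical): a record in a boundary bin is a candidate
        # and must be compared against the value stored in its cluster.
        return lo <= entry[1] < hi

    candidates = [e for b in {lo_bin, hi_bin} for e in clusters[b]]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sure = pool.map(check_bin, range(len(bin_numbers)))
        hits = [row_id for row_id, ok in enumerate(sure) if ok]
        maybe = pool.map(check_value, candidates)
        hits += [e[0] for e, ok in zip(candidates, maybe) if ok]
    return sorted(hits)


# Made-up example data: seven records and four bins of width 10.
values = [3.0, 12.5, 27.0, 18.0, 35.5, 9.9, 21.0]
boundaries = [0, 10, 20, 30, 40]
bin_numbers, clusters = build_index(values, boundaries)
print(range_query(bin_numbers, clusters, boundaries, 9, 28))  # [1, 2, 3, 5, 6]
```

In this sketch only the two boundary bins require access to stored values; records in interior bins are resolved from their bin numbers alone, which is one way a bin-based encoding can reduce the amount of data touched per query.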
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gosink, L.J., Wu, K., Bethel, E.W., Owens, J.D., Joy, K.I. (2009). Data Parallel Bin-Based Indexing for Answering Queries on Multi-core Architectures. In: Winslett, M. (eds) Scientific and Statistical Database Management. SSDBM 2009. Lecture Notes in Computer Science, vol 5566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02279-1_9
DOI: https://doi.org/10.1007/978-3-642-02279-1_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02278-4
Online ISBN: 978-3-642-02279-1
eBook Packages: Computer Science (R0)