ABSTRACT
Extracting knowledge from vast datasets is a major challenge in today's data-driven applications. Sparse grids provide a numerical method for both classification and regression in data mining that scales only linearly in the number of data points and is thus well-suited for huge amounts of data. Due to their recursive nature, however, sparse grid algorithms pose a challenge for parallelization on modern hardware architectures such as accelerators. In this paper, we present their parallelization on several current task- and data-parallel platforms, covering multi-core CPUs with vector units, GPUs, and hybrid systems. Furthermore, we analyze the suitability of parallel programming languages for the implementation.
With respect to hardware, we restrict ourselves to the x86 platform with the SSE and AVX vector extensions and to NVIDIA's Fermi architecture for GPUs. We consider multi-core CPU and GPU architectures both independently and in hybrid systems with up to 12 cores and 2 Fermi GPUs. With respect to parallel programming, we examine both the open standard OpenCL and Intel Array Building Blocks, a recently introduced high-level programming approach. As the baseline, we use the best results obtained with classically parallelized sparse grid algorithms and their OpenMP-parallelized intrinsics counterparts (SSE and AVX instructions), reporting both single- and double-precision measurements. The huge datasets we use are a real-life dataset stemming from astrophysics and an artificial one exhibiting challenging properties. In all settings we achieve excellent results, obtaining speedups of more than 60 with single precision on a hybrid system.
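To illustrate why the method scales only linearly in the number of data points, the following is a minimal C sketch of evaluating a sparse grid surplus expansion at N data points. All names (`hat_1d`, `eval_sparse_grid`) and the fixed dimensionality are illustrative assumptions, not code from the paper: the outer loop over data points is independent per point, which is what makes the algorithm amenable to OpenMP parallelization and SSE/AVX vectorization as described above.

```c
#include <assert.h>
#include <math.h>

/* 1-D hat basis function: phi_{l,i}(x) = max(0, 1 - |2^l * x - i|).
   This is the standard piecewise-linear sparse grid basis. */
static double hat_1d(int level, int index, double x) {
    double v = 1.0 - fabs(ldexp(x, level) - (double)index);
    return v > 0.0 ? v : 0.0;
}

#define DIM 2  /* illustrative fixed dimensionality */

/* Evaluate a sparse grid function with M grid points (per-dimension
   (level, index) pairs and surpluses alpha) at N data points in [0,1]^DIM.
   Cost is O(M * N): linear in the number of data points N.  In the paper's
   setting, the loop over n is data-parallel (independent iterations) and
   would be distributed via OpenMP and vectorized with SSE/AVX intrinsics. */
void eval_sparse_grid(int M, const int level[][DIM], const int index[][DIM],
                      const double alpha[], int N, const double data[][DIM],
                      double out[]) {
    for (int n = 0; n < N; ++n) {      /* independent -> parallelizable */
        double sum = 0.0;
        for (int m = 0; m < M; ++m) {
            double phi = alpha[m];
            for (int d = 0; d < DIM; ++d)
                phi *= hat_1d(level[m][d], index[m][d], data[n][d]);
            sum += phi;
        }
        out[n] = sum;
    }
}
```

Because each `out[n]` depends only on read-only grid data, no synchronization is needed across data points; the same structure maps directly onto OpenCL work-items on a GPU.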