Research article
DOI: 10.1145/2016604.2016640

Multi- and many-core data mining with adaptive sparse grids

Published: 3 May 2011

ABSTRACT

Extracting knowledge from vast datasets is a central challenge in data-driven applications today. Sparse grids provide a numerical method for both classification and regression in data mining that scales only linearly in the number of data points and is thus well-suited to huge amounts of data. Due to the recursive nature of sparse grid algorithms, however, they pose a challenge for parallelization on modern hardware architectures such as accelerators. In this paper, we present the parallelization on several current task- and data-parallel platforms, covering multi-core CPUs with vector units, GPUs, and hybrid systems. Furthermore, we analyze the suitability of parallel programming languages for the implementation.
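The linear scaling claimed above comes from the structure of sparse grid function evaluation: the learned function is a weighted sum of tensor-product hat basis functions, so classifying N data points costs O(N · grid size · dimension). A minimal sketch (illustrative only, not the authors' code; `GridPoint` and `evaluate` are hypothetical names):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// One hierarchical hat basis function: phi_{l,i}(x) = max(0, 1 - |2^l * x - i|).
double hat(int level, int index, double x) {
    return std::max(0.0, 1.0 - std::fabs(std::ldexp(x, level) - index));
}

// A sparse grid point: per-dimension level/index pair plus surplus alpha.
struct GridPoint {
    std::vector<int> level, index;
    double alpha;
};

// Evaluate f(x) = sum_g alpha_g * prod_d phi_{l_d,i_d}(x_d) at every data
// point. The cost is O(#data * #gridpoints * dim): linear in the data size.
std::vector<double> evaluate(const std::vector<GridPoint>& grid,
                             const std::vector<std::vector<double>>& data) {
    std::vector<double> result;
    result.reserve(data.size());
    for (const auto& x : data) {
        double sum = 0.0;
        for (const auto& g : grid) {
            double phi = g.alpha;
            for (std::size_t d = 0; d < x.size(); ++d)
                phi *= hat(g.level[d], g.index[d], x[d]);
            sum += phi;
        }
        result.push_back(sum);
    }
    return result;
}
```

The recursion mentioned above enters in the efficient variants, which traverse the hierarchical grid structure rather than looping over all grid points; this straightforward loop form is the one that parallelizes naturally over data points.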

Considering hardware, we restrict ourselves to the x86 platform with SSE and AVX vector extensions and to NVIDIA's Fermi architecture for GPUs. We consider both multi-core CPU and GPU architectures independently, as well as hybrid systems with up to 12 cores and 2 Fermi GPUs. With respect to parallel programming, we examine both the open standard OpenCL and Intel Array Building Blocks, a recently introduced high-level programming approach. As baselines, we use the best results obtained with classically parallelized sparse grid algorithms and their OpenMP-parallelized intrinsics counterparts (SSE and AVX instructions), reporting both single- and double-precision measurements. The huge datasets we use are a real-life dataset stemming from astrophysics and an artificial one that exhibits challenging properties. In all settings we achieve excellent results, obtaining speedups of more than 60 using single precision on a hybrid system.
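The intrinsics baselines mentioned above vectorize the hat-function evaluation across data points. As a hedged sketch of what such a kernel can look like (illustrative, not the authors' implementation), the following SSE2 snippet evaluates the 1-D hat basis function max(0, 1 - |2^l·x - i|) for two double-precision data points at once:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86-64)
#include <cassert>
#include <cmath>

// Evaluate phi_{l,i}(x) = max(0, 1 - |2^l * x - i|) for two data points
// packed into one 128-bit register. `scale` is the precomputed 2^l.
void hat_sse2(double scale, double index, const double* x, double* out) {
    __m128d vx   = _mm_loadu_pd(x);                      // load two x values
    __m128d t    = _mm_sub_pd(_mm_mul_pd(vx, _mm_set1_pd(scale)),
                              _mm_set1_pd(index));       // 2^l * x - i
    __m128d absT = _mm_andnot_pd(_mm_set1_pd(-0.0), t);  // clear sign bit: |t|
    __m128d phi  = _mm_max_pd(_mm_setzero_pd(),
                              _mm_sub_pd(_mm_set1_pd(1.0), absT));
    _mm_storeu_pd(out, phi);                             // store two results
}
```

With AVX the same pattern widens to four doubles (or eight floats) per register, which is one reason the single-precision speedups reported above exceed the double-precision ones.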


Published in:
CF '11: Proceedings of the 8th ACM International Conference on Computing Frontiers
May 2011, 268 pages
ISBN: 9781450306980
DOI: 10.1145/2016604
Copyright © 2011 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 240 of 680 submissions, 35%
