Fast multipole method on GPU: tackling 3-D capacitance extraction on massively parallel SIMD platforms

Authors:
Xueqian Zhao, Michigan Technological University, Houghton, MI
Zhuo Feng, Michigan Technological University, Houghton, MI

DAC '11: Proceedings of the 48th Design Automation Conference, June 2011, Pages 558–563
https://doi.org/10.1145/2024724.2024853

Published: 05 June 2011

ABSTRACT

To facilitate full-chip capacitance extraction, field solvers are typically deployed to characterize capacitance libraries for various interconnect structures and configurations. Over the past decades, various algorithms for accelerating boundary element methods (BEM) have been developed to improve the efficiency of field solvers for capacitance extraction. This paper presents FMMGpu, the first massively parallel capacitance extraction algorithm that accelerates the well-known fast multipole method (FMM) on modern graphics processing units (GPUs). We propose GPU-friendly data structures and SIMD parallel algorithm flows to facilitate FMM-based 3-D capacitance extraction on GPUs. Effective GPU performance modeling methods are also proposed to properly balance the workload of each critical kernel in our FMMGpu implementation by taking advantage of the concurrent kernel execution on streaming multiprocessors (SMs) supported by the latest Fermi GPUs. Our experimental results show that FMMGpu achieves 22X to 30X speedups in capacitance extraction for various test cases. We also show that, even for small test cases that may not fully utilize the GPU's hardware resources, the proposed cube clustering and workload balancing techniques can bring an extra 20% to 60% performance improvement.
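The abstract does not describe FMMGpu's internals, so the following is only a minimal sketch of the general idea that multipole acceleration relies on: the potential that a cluster of charges (for example, the panel charges inside one cube) produces at a distant point can be approximated by a short expansion about the cluster center. The sketch keeps only the lowest-order (monopole) term and compares it with the direct sum. The cluster geometry, the function names, and the dropped 1/(4πε) constant are illustrative assumptions, not taken from the paper.

```python
import math

def direct_potential(charges, target):
    """Exact potential at `target`: sum of q / r over all point charges.
    The constant factor 1/(4*pi*eps) is dropped for simplicity (assumption)."""
    return sum(q / math.dist(pos, target) for pos, q in charges)

def monopole_potential(charges, target):
    """Lowest-order multipole approximation: the total charge placed at the
    geometric center of the charge positions (which coincides with the
    center of charge when all charges are equal)."""
    total_q = sum(q for _, q in charges)
    n = len(charges)
    center = tuple(sum(pos[i] for pos, _ in charges) / n for i in range(3))
    return total_q / math.dist(center, target)

# Hypothetical cluster: 8 unit charges at the corners of a unit cube.
cluster = [((x, y, z), 1.0)
           for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]

# Evaluation point far away from the cube compared with its size.
far_target = (20.0, 0.5, 0.5)

exact = direct_potential(cluster, far_target)
approx = monopole_potential(cluster, far_target)
rel_error = abs(exact - approx) / exact

print(f"direct sum = {exact:.6f}")
print(f"monopole   = {approx:.6f}")
print(f"rel. error = {rel_error:.2e}")
```

A full FMM keeps higher-order terms and organizes clusters in a hierarchy of cubes, so that each far-field interaction costs one expansion evaluation instead of one evaluation per charge; the per-cube work is the kind of regular, data-parallel computation that maps well to SIMD hardware.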

Index Terms

1. Hardware
  1. Hardware validation
    1. Functional verification
      1. Simulation and emulation

Published in
DAC '11: Proceedings of the 48th Design Automation Conference
June 2011
1055 pages
ISBN: 9781450306362
DOI: 10.1145/2024724
General Chair: Leon Stok, IBM Corp., Hopewell Jct., NY
Program Chairs: Nikil Dutt, Univ. of California, Irvine, CA; Soha Hassoun, Tufts Univ., Medford, MA
Copyright © 2011 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
GPU
capacitance extraction
parallel fast multipole method
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,770 of 5,499 submissions, 32%
Article Metrics
- Total Citations: 16
- Total Downloads: 196
- Downloads (last 12 months): 1
- Downloads (last 6 weeks): 0
