GPU-based Acceleration of System-level Design Tasks

Bordoloi, Unmesh D.; Chakraborty, Samarjit

doi:10.1007/s10766-009-0125-6

Published: 16 January 2010

Volume 38, pages 225–253, (2010)

International Journal of Parallel Programming

Unmesh D. Bordoloi¹ &
Samarjit Chakraborty²

Abstract

Many system-level design tasks (e.g., high-level timing analysis, hardware/software partitioning and design space exploration) involve computational kernels that are intractable (usually NP-hard). As a result, they involve high running times even for mid-sized problems. In this paper we explore the possibility of using commodity graphics processing units (GPUs) to accelerate such tasks that commonly arise in the electronic design automation (EDA) domain. We demonstrate this idea via two detailed case studies. The first explores the possibility of using GPUs to speed up standard schedulability analysis problems. The second proposes a GPU-based engine for a general hardware/software design space exploration problem. Not only do these problems commonly arise in the embedded systems domain, but their computational kernels also turn out to be variants of a combinatorial optimization problem, namely the knapsack problem, that lies at the heart of several EDA applications. Experimental results show that our GPU-based implementations offer very attractive speedups for the computational kernels (up to 100×), and speedups of up to 17× for the full problem. In contrast to ASIC/FPGA-based accelerators, our solution involves no extra hardware cost, given that even low-end desktop and notebook computers are now equipped with GPUs. Although recent research has shown the benefits of using GPUs for a variety of non-graphics applications (e.g., in databases and bioinformatics), harnessing the parallelism of GPUs to accelerate problems from the EDA domain has not been sufficiently explored so far. We believe that our results and the generality of the core problem that we address will motivate researchers from this community to explore the possibility of using GPUs for a wider variety of problems from the EDA domain.
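
To illustrate why a knapsack-style kernel is a natural fit for data-parallel hardware, the following is a minimal sketch of the textbook dynamic program for the 0/1 knapsack problem with integer weights. It is not the authors' implementation, and the function names `knapsack_row_update` and `knapsack` are hypothetical. The point it shows is that every entry of a new table row depends only on the previous row, so all entries of a row could, in principle, be computed by independent GPU threads.

```python
from typing import List


def knapsack_row_update(prev: List[int], weight: int, value: int) -> List[int]:
    """Compute one DP row from the previous one (hypothetical helper).

    Every entry of the new row reads only from `prev`, so the entries are
    independent of each other; on a GPU each could be handled by one thread.
    """
    return [
        max(prev[c], prev[c - weight] + value) if c >= weight else prev[c]
        for c in range(len(prev))
    ]


def knapsack(weights: List[int], values: List[int], capacity: int) -> int:
    """Best total value for a 0/1 knapsack with positive integer weights."""
    row = [0] * (capacity + 1)
    # Items are processed one after another; only the work inside a row
    # is parallel in this formulation.
    for w, v in zip(weights, values):
        row = knapsack_row_update(row, w, v)
    return row[capacity]
```

In this sequential sketch the row update is an ordinary list comprehension; the parallel version would replace it with a kernel launch over the capacity index, which is an assumption about one possible mapping rather than a description of the paper's design.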

Author information

Authors and Affiliations

Centre Équation-2, VERIMAG, Avenue de Vignate, 38610, Gières, France
Unmesh D. Bordoloi
Institute for Real-Time Computer Systems, TU Munich, 80290, Munich, Germany
Samarjit Chakraborty

Corresponding author

Correspondence to Unmesh D. Bordoloi.

About this article

Cite this article

Bordoloi, U.D., Chakraborty, S. GPU-based Acceleration of System-level Design Tasks. Int J Parallel Prog 38, 225–253 (2010). https://doi.org/10.1007/s10766-009-0125-6

Received: 19 July 2009
Accepted: 15 December 2009
Published: 16 January 2010
Issue Date: June 2010
DOI: https://doi.org/10.1007/s10766-009-0125-6
