A Systematic Design Space Exploration Approach to Customising Multi-Processor Architectures: Exemplified Using Graphics Processors

Cope, Ben; Cheung, Peter Y. K.; Luk, Wayne; Howes, Lee

doi:10.1007/978-3-642-24568-8_4

Part of the book series: Lecture Notes in Computer Science (THIPEAC, volume 6760)

Abstract

A systematic approach to customising Homogeneous Multi-Processor (HoMP) architectures is described. The approach involves a novel design space exploration tool and a parameterisable system model. Post-fabrication customisation options for using reconfigurable logic with a HoMP are classified. The adoption of the approach in exploring pre- and post-fabrication customisation options to optimise an architecture’s critical paths is then described. The approach and steps are demonstrated using the architecture of a graphics processor. We also analyse on-chip and off-chip memory access for systems with one or more processing elements (PEs), and study the impact of the number of threads per PE on the amount of off-chip memory access and the number of cycles for each output. It is shown that post-fabrication customisation of a graphics processor can provide up to four times performance improvement for negligible area cost.



Author information

Authors and Affiliations

Department of Electrical & Electronic Engineering, Imperial College London, UK
Ben Cope & Peter Y. K. Cheung
Department of Computing, Imperial College London, UK
Wayne Luk & Lee Howes

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Chalmers University of Technology, 412 96, Gothenburg, Sweden
Per Stenström

About this chapter

Cite this chapter

Cope, B., Cheung, P.Y.K., Luk, W., Howes, L. (2011). A Systematic Design Space Exploration Approach to Customising Multi-Processor Architectures: Exemplified Using Graphics Processors. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers IV. Lecture Notes in Computer Science, vol 6760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24568-8_4

DOI: https://doi.org/10.1007/978-3-642-24568-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24567-1
Online ISBN: 978-3-642-24568-8
eBook Packages: Computer Science, Computer Science (R0)
