Abstract
A systematic approach to customising Homogeneous Multi-Processor (HoMP) architectures is described. The approach involves a novel design space exploration tool and a parameterisable system model. Post-fabrication customisation options for using reconfigurable logic with a HoMP are classified. The adoption of the approach in exploring pre- and post-fabrication customisation options to optimise an architecture’s critical paths is then described. The approach and steps are demonstrated using the architecture of a graphics processor. We also analyse on-chip and off-chip memory access for systems with one or more processing elements (PEs), and study the impact of the number of threads per PE on the amount of off-chip memory access and the number of cycles for each output. It is shown that post-fabrication customisation of a graphics processor can provide up to four times performance improvement for negligible area cost.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Cope, B., Cheung, P.Y.K., Luk, W., Howes, L. (2011). A Systematic Design Space Exploration Approach to Customising Multi-Processor Architectures: Exemplified Using Graphics Processors. In: Stenström, P. (ed.) Transactions on High-Performance Embedded Architectures and Compilers IV. Lecture Notes in Computer Science, vol 6760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24568-8_4
DOI: https://doi.org/10.1007/978-3-642-24568-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24567-1
Online ISBN: 978-3-642-24568-8
eBook Packages: Computer Science (R0)