A Systematic Design Space Exploration Approach to Customising Multi-Processor Architectures: Exemplified Using Graphics Processors

Part of the book series: Lecture Notes in Computer Science (THIPEAC, volume 6760)

Abstract

A systematic approach to customising Homogeneous Multi-Processor (HoMP) architectures is described. The approach involves a novel design space exploration tool and a parameterisable system model. Post-fabrication customisation options for using reconfigurable logic with a HoMP are classified. The approach is then applied to explore pre- and post-fabrication customisation options that optimise an architecture’s critical paths. The approach and its steps are demonstrated using the architecture of a graphics processor. We also analyse on-chip and off-chip memory access for systems with one or more processing elements (PEs), and study how the number of threads per PE affects the amount of off-chip memory access and the number of cycles per output. It is shown that post-fabrication customisation of a graphics processor can provide up to a fourfold performance improvement at negligible area cost.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Cope, B., Cheung, P.Y.K., Luk, W., Howes, L. (2011). A Systematic Design Space Exploration Approach to Customising Multi-Processor Architectures: Exemplified Using Graphics Processors. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers IV. Lecture Notes in Computer Science, vol 6760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24568-8_4

  • DOI: https://doi.org/10.1007/978-3-642-24568-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24567-1

  • Online ISBN: 978-3-642-24568-8

  • eBook Packages: Computer Science, Computer Science (R0)
