A High Efficient On-Chip Interconnection Network in SIMD CMPs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6081)

Abstract

To improve the performance of on-chip data communication in SIMD (Single Instruction Multiple Data) architectures, we propose an efficient and modular interconnection architecture called the Broadcast and Permutation Mesh network (BP-Mesh). The BP-Mesh architecture offers low complexity and high bandwidth as well as good flexibility and scalability. The hardware implementation is discussed in detail, and the proposed architecture is evaluated in terms of area cost and performance.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, D., Dai, K., Zou, X., Rao, J., Chen, P. (2010). A High Efficient On-Chip Interconnection Network in SIMD CMPs. In: Hsu, C.-H., Yang, L.T., Park, J.H., Yeo, S.-S. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2010. Lecture Notes in Computer Science, vol 6081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13119-6_13

  • DOI: https://doi.org/10.1007/978-3-642-13119-6_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13118-9

  • Online ISBN: 978-3-642-13119-6

  • eBook Packages: Computer Science, Computer Science (R0)
