Data Pipeline Optimization for Shared Memory Multiple-SIMD Architecture

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4382)

Abstract

The rapid growth of multimedia applications places heavy pressure on the processing capability of modern processors, and as a result more and more multimedia processors employ parallel single instruction multiple data (SIMD) units to achieve high performance. In embedded systems-on-chip (SoCs), shared memory multiple-SIMD architectures have become popular because of their lower power consumption and smaller chip size. To match the properties of some multimedia applications, interconnections are provided among the multiple SIMD units. In this paper, we present a novel program transformation technique that exploits the parallel and pipelined computing power of modern shared memory multiple-SIMD architectures. This optimization can greatly reduce contention on the shared data bus and improve the performance of applications with inherent data pipeline characteristics. Experimental results show that, on a shared memory multiple-SIMD architecture with 8 SIMD units, the method achieves more than 3.6X speedup for the multimedia programs.

This research was supported by the Specialized Research Fund for the Doctoral Program of Chinese Higher Education under Grant No. 20050246020 and by the Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences.

Editor information

George Almási, Călin Caşcaval, Peng Wu

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Zhang, W., Bao, T., Zang, B., Zhu, C. (2007). Data Pipeline Optimization for Shared Memory Multiple-SIMD Architecture. In: Almási, G., Caşcaval, C., Wu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2006. Lecture Notes in Computer Science, vol 4382. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72521-3_5

  • DOI: https://doi.org/10.1007/978-3-540-72521-3_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72520-6

  • Online ISBN: 978-3-540-72521-3

  • eBook Packages: Computer Science (R0)
