Data Pipeline Optimization for Shared Memory Multiple-SIMD Architecture

Zhang, Weihua; Bao, Tao; Zang, Binyu; Zhu, Chuanqi

doi:10.1007/978-3-540-72521-3_5

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4382)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

The rapid growth of multimedia applications has put heavy pressure on the processing capability of modern processors, leading more and more multimedia processors to employ parallel single instruction multiple data (SIMD) units to achieve high performance. In embedded systems-on-chip (SoCs), the shared memory multiple-SIMD architecture has become popular because of its lower power consumption and smaller chip size. To match the properties of some multimedia applications, the multiple SIMD units are connected to one another. In this paper, we present a novel program transformation technique that exploits the parallel and pipelined computing power of modern shared memory multiple-SIMD architectures. This optimization can greatly reduce contention on the shared data bus and improve the performance of applications with inherent data pipeline characteristics. Experimental results show that our method provides a substantial speedup: on a shared memory multiple-SIMD architecture with 8 SIMD units, it achieves a speedup of more than 3.6X for the multimedia programs.

This research was supported by the Specialized Research Fund for the Doctoral Program of Chinese Higher Education under Grant No. 20050246020 and by the Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences.

Author information

Authors and Affiliations

Parallel Processing Institute, Fudan University, Shanghai, China
Weihua Zhang, Tao Bao, Binyu Zang & Chuanqi Zhu
Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences
Weihua Zhang

Editor information

George Almási, Călin Caşcaval, Peng Wu

About this paper

Cite this paper

Zhang, W., Bao, T., Zang, B., Zhu, C. (2007). Data Pipeline Optimization for Shared Memory Multiple-SIMD Architecture. In: Almási, G., Caşcaval, C., Wu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2006. Lecture Notes in Computer Science, vol 4382. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72521-3_5

DOI: https://doi.org/10.1007/978-3-540-72521-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72520-6
Online ISBN: 978-3-540-72521-3
eBook Packages: Computer Science, Computer Science (R0)
