
Instruction Selection for Subword Level Parallelism Optimizations for Application Specific Instruction Processors

  • Conference paper
Parallel and Distributed Processing and Applications (ISPA 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4742)

Abstract

Application Specific Instruction Processors (ASIPs) have the potential to meet the high-performance demands of multimedia applications such as image processing, audio and video encoding, speech processing, and digital signal processing. To lower the cost and improve the energy efficiency of high-performance embedded systems built from ASIPs, subword parallelism optimization is becoming an important way to accelerate multimedia applications. One major problem, however, is how to exploit subword parallelism on ASIPs with limited resources. This paper shows that loop transformations such as loop unrolling and variable expansion can be used to create opportunities for subword parallelism, and presents a novel approach to recognizing and extracting subword parallelism based on the Cost Subgraph (CSG). The approach is evaluated on the Transport Triggered Architecture (TTA), a customizable processor architecture that is particularly suitable for tailoring hardware resources to the requirements of the application. In our experiments, 63.58% of loops and 85.64% of the instructions in these loops can exploit subword parallelism. The results indicate that our method can attain a significant share of the available subword parallelism.
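As a rough illustration of the idea in the abstract (not the paper's CSG-based method, which is not reproduced on this page), the sketch below shows how unrolling a byte-wise loop by four exposes four independent 8-bit additions that can be packed into one 32-bit word. The function names add_u8_scalar, add_4x8, and add_u8_packed are hypothetical, and the packed add is emulated with ordinary integer operations, assuming a target without a dedicated subword add instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference loop: one 8-bit add per iteration, wrapping modulo 256. */
static void add_u8_scalar(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}

/* Hypothetical helper: four independent 8-bit adds inside one 32-bit word.
 * The low 7 bits of each lane are added first, so no carry can cross a lane
 * boundary; the top bit of each lane is then fixed up with XOR. */
static uint32_t add_4x8(uint32_t x, uint32_t y) {
    uint32_t low7 = (x & 0x7F7F7F7Fu) + (y & 0x7F7F7F7Fu);
    return low7 ^ ((x ^ y) & 0x80808080u);
}

/* Loop unrolled by four: each iteration handles four bytes as one packed word. */
static void add_u8_packed(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint32_t x, y, r;
        memcpy(&x, a + i, 4);   /* memcpy avoids unaligned-access problems */
        memcpy(&y, b + i, 4);
        r = add_4x8(x, y);
        memcpy(dst + i, &r, 4);
    }
    for (; i < n; i++)          /* remainder loop for the leftover bytes */
        dst[i] = (uint8_t)(a[i] + b[i]);
}
```

Calling add_u8_packed(dst, a, b, n) should produce the same bytes as add_u8_scalar(dst, a, b, n) for any n. On a processor with a real subword add, the body of add_4x8 would presumably collapse to a single instruction, which is the kind of opportunity instruction selection would need to recognize.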




Editor information

Ivan Stojmenovic, Ruppa K. Thulasiram, Laurence T. Yang, Weijia Jia, Minyi Guo, Rodrigo Fernandes de Mello


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, M., Wu, G., Wang, Z. (2007). Instruction Selection for Subword Level Parallelism Optimizations for Application Specific Instruction Processors. In: Stojmenovic, I., Thulasiram, R.K., Yang, L.T., Jia, W., Guo, M., de Mello, R.F. (eds) Parallel and Distributed Processing and Applications. ISPA 2007. Lecture Notes in Computer Science, vol 4742. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74742-0_83


  • DOI: https://doi.org/10.1007/978-3-540-74742-0_83

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74741-3

  • Online ISBN: 978-3-540-74742-0

  • eBook Packages: Computer Science, Computer Science (R0)
