Performance Improvement of Multimedia Kernels by Alleviating Overhead Instructions on SIMD Devices

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5737)

Abstract

SIMD extensions are one of the most common and effective techniques for exploiting data-level parallelism in today’s processor designs. However, the performance of SIMD architectures is limited by constraints such as the mismatch between the storage and computational formats and the need for data permutation instructions during vectorization. In our previous work we proposed two architectural modifications, extended subwords and the Matrix Register File (MRF), to alleviate these limitations. The extended subwords technique uses four extra bits for every byte in a media register, which provides additional parallelism. The MRF allows flexible row-wise as well as column-wise access to the register file, which eliminates data permutation instructions. We previously validated the combination of the proposed techniques by studying the performance of some multimedia kernels. In this paper, we analyze each proposed technique separately. In other words, we answer the following questions: how much of the performance gain is a result of the additional parallelism, and how much is due to the elimination of data permutation instructions? The results show that employing the MRF and extended subwords separately yields speedups of less than 1 and 1.15, respectively. That is, our results indicate that using either extended subwords or the MRF alone is insufficient to eliminate most pack/unpack and rearrangement overhead instructions on SIMD processors. The combination of both techniques, on the other hand, yields much greater performance benefits than either technique on its own.
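
As a rough illustration of the two ideas described in the abstract, the Python sketch below models (a) lane-wise addition with 8-bit subwords versus subwords widened by four extra bits, and (b) a register file that can be read by row or by column. This is not the authors' implementation or simulator: the names `packed_add` and `MatrixRegisterFile`, the list-based register model, and the 4x4 block size are all assumptions made for illustration only.

```python
def packed_add(a, b, lane_bits):
    """Lane-wise add of two packed registers (modelled as lists).

    Each lane wraps around at lane_bits bits, as a plain
    (non-saturating) SIMD add would. Hypothetical helper.
    """
    mask = (1 << lane_bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


class MatrixRegisterFile:
    """Toy register file readable row-wise or column-wise (hypothetical)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def read_row(self, i):
        return list(self.rows[i])

    def read_col(self, j):
        # Column access replaces an explicit transpose built
        # from permutation (shuffle/unpack) instructions.
        return [r[j] for r in self.rows]


# Sums of 8-bit pixel values can exceed 255.
pixels_a = [200, 100, 255, 7]
pixels_b = [100, 100, 1, 8]

# With 8-bit lanes the sums wrap, so real code must first unpack to 16 bits.
wrapped = packed_add(pixels_a, pixels_b, 8)

# With 8 + 4 = 12-bit lanes the same sums fit without unpacking.
exact = packed_add(pixels_a, pixels_b, 12)

# A 4x4 block stored once, then read along either dimension.
mrf = MatrixRegisterFile([[1, 2, 3, 4],
                          [5, 6, 7, 8],
                          [9, 10, 11, 12],
                          [13, 14, 15, 16]])
row1 = mrf.read_row(1)
col1 = mrf.read_col(1)
```

In this model, the wider lanes remove the need for the pack/unpack step, and the column read removes the need for rearrangement; the abstract's finding is that kernels generally need both to see a substantial gain.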



Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shahbahrami, A., Juurlink, B. (2009). Performance Improvement of Multimedia Kernels by Alleviating Overhead Instructions on SIMD Devices. In: Dou, Y., Gruber, R., Joller, J.M. (eds) Advanced Parallel Processing Technologies. APPT 2009. Lecture Notes in Computer Science, vol 5737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03644-6_31

  • DOI: https://doi.org/10.1007/978-3-642-03644-6_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03643-9

  • Online ISBN: 978-3-642-03644-6

  • eBook Packages: Computer Science, Computer Science (R0)
