Implementation and Optimization of Dense LU Decomposition on the Stream Processor

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4967)

Abstract

Developing scientific computing applications on the stream processor has attracted considerable attention from researchers. In this paper, we implement and optimize dense LU decomposition on the stream processor. Unlike existing parallel algorithms for LU decomposition, the StreamLUD algorithm aims to exploit producer-consumer locality and to overlap off-chip memory access with kernel execution. Simulation results show that, for matrices of different sizes, the optimized StreamLUD ultimately achieves a speedup of 2.56 to 3.64 over the LUD of HPL on an Itanium 2 processor.
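
For readers unfamiliar with the underlying computation, the following is a minimal sketch of a blocked, right-looking LU decomposition in plain Python. It is not the StreamLUD algorithm, whose details are not given in the abstract; the function name `blocked_lu` and the `block` parameter are hypothetical, and pivoting (which HPL performs) is omitted for brevity, so the input is assumed to have nonzero leading pivots. The sketch only loosely illustrates the producer-consumer pattern the abstract mentions: each factored panel produces the data that the trailing-matrix update immediately consumes.

```python
def blocked_lu(a, block=2):
    """Return (L, U) with A = L * U, processing `block` columns at a time.

    Assumption: no pivoting is done, so A must have nonzero leading
    pivots (for example, a strictly diagonally dominant matrix).
    """
    n = len(a)
    m = [row[:] for row in a]  # work on a copy; L and U are stored in place
    for k0 in range(0, n, block):
        k1 = min(k0 + block, n)
        # 1. Factor the panel (columns k0..k1-1) by unblocked elimination.
        for k in range(k0, k1):
            for i in range(k + 1, n):
                m[i][k] /= m[k][k]  # multiplier, stored as an entry of L
                for j in range(k + 1, k1):
                    m[i][j] -= m[i][k] * m[k][j]
        # 2. Forward substitution for the block row of U right of the panel.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    m[i][j] -= m[i][k] * m[k][j]
        # 3. Trailing update: A22 -= L21 * U12 (consumes the panel's output).
        for i in range(k1, n):
            for j in range(k1, n):
                m[i][j] -= sum(m[i][k] * m[k][j] for k in range(k0, k1))
    # Split the in-place result into a unit lower and an upper triangle.
    lower = [[m[i][j] if j < i else (1.0 if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    upper = [[m[i][j] if j >= i else 0.0 for j in range(n)]
             for i in range(n)]
    return lower, upper
```

Calling `blocked_lu(a, block=2)` on a small diagonally dominant matrix returns factors whose product reproduces `a` up to rounding error. In a stream-style implementation, steps 1 to 3 would presumably map to separate kernels, with the block size chosen to fit on-chip storage; that mapping is an assumption here, not a description of the paper's design.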

Editor information

Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, Jerzy Wasniewski

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, Y., Tang, T., Li, G., Yang, X. (2008). Implementation and Optimization of Dense LU Decomposition on the Stream Processor. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2007. Lecture Notes in Computer Science, vol 4967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68111-3_9

  • DOI: https://doi.org/10.1007/978-3-540-68111-3_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68105-2

  • Online ISBN: 978-3-540-68111-3

  • eBook Packages: Computer Science, Computer Science (R0)
