Implementation and Optimization of Dense LU Decomposition on the Stream Processor

Zhang, Ying; Tang, Tao; Li, Gen; Yang, Xuejun

doi:10.1007/978-3-540-68111-3_9

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4967)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

Developing scientific computing applications on stream processors has attracted considerable research attention. In this paper, we implement and optimize dense LU decomposition on a stream processor. Unlike existing parallel algorithms for LU decomposition, the StreamLUD algorithm aims to exploit producer-consumer locality and to overlap off-chip memory access with kernel execution. Simulation results show that, across matrices of different sizes, our optimized StreamLUD achieves speedups of 2.56 to 3.64 over the LUD of HPL on an Itanium 2 processor.
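For readers unfamiliar with the operation being accelerated, dense LU decomposition can be sketched as below. This is a generic right-looking blocked factorization without pivoting, written in plain Python. It is not the StreamLUD algorithm, whose details are not given in this abstract. The names `blocked_lu` and `reconstruct` and the `block` parameter are illustrative assumptions.

```python
def blocked_lu(a, block=2):
    """Right-looking blocked LU without pivoting (illustrative sketch only).

    Returns a packed matrix: U on and above the diagonal, and the
    multipliers of the unit lower-triangular L below the diagonal.
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy, leave the input untouched
    for k0 in range(0, n, block):
        k1 = min(k0 + block, n)
        # Step 1: factor the panel (columns k0..k1-1, rows k0..n-1).
        for k in range(k0, k1):
            pivot = lu[k][k]
            if pivot == 0:
                raise ZeroDivisionError("zero pivot; this sketch does not pivot")
            for i in range(k + 1, n):
                lu[i][k] /= pivot
                for j in range(k + 1, k1):
                    lu[i][j] -= lu[i][k] * lu[k][j]
        # Step 2: forward substitution for the row block of U
        # (rows k0..k1-1, columns k1..n-1).
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    lu[i][j] -= lu[i][k] * lu[k][j]
        # Step 3: update the trailing submatrix with the product of the
        # L panel and the U row block just computed.
        for i in range(k1, n):
            for j in range(k1, n):
                s = 0.0
                for k in range(k0, k1):
                    s += lu[i][k] * lu[k][j]
                lu[i][j] -= s
    return lu


def reconstruct(lu):
    """Multiply the packed factors back together to check that L * U == A."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]  # unit diagonal of L
                s += l_ik * lu[k][j]
            out[i][j] = s
    return out


a = [[4.0, 3.0, 2.0], [8.0, 8.0, 7.0], [12.0, 13.0, 15.0]]
packed = blocked_lu(a, block=2)
```

In this sketch the trailing update in step 3 consumes the panel and row block produced in steps 1 and 2, which is the kind of producer-consumer relationship between computation stages that the abstract refers to. How StreamLUD actually maps these stages to kernels and streams is not stated here.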

Author information

Authors and Affiliations

School of Computer, National University of Defense Technology, 410073, Changsha, China
Ying Zhang, Tao Tang, Gen Li & Xuejun Yang

Editor information

Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, Jerzy Wasniewski

About this paper

Cite this paper

Zhang, Y., Tang, T., Li, G., Yang, X. (2008). Implementation and Optimization of Dense LU Decomposition on the Stream Processor. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2007. Lecture Notes in Computer Science, vol 4967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68111-3_9

DOI: https://doi.org/10.1007/978-3-540-68111-3_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68105-2
Online ISBN: 978-3-540-68111-3
eBook Packages: Computer Science, Computer Science (R0)
