Large Matrix Multiplication on a Novel Heterogeneous Parallel DSP Architecture

Sohl, Joar; Wang, Jian; Liu, Dake

doi:10.1007/978-3-642-03644-6_32

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5737)

Included in the following conference series:

International Workshop on Advanced Parallel Processing Technologies

Abstract

This paper introduces ePUMA, a novel master-multi-SIMD on-chip multi-core architecture for embedded signal processing, and describes the parallel architecture and its memory subsystem. We evaluate the performance of large matrix multiplication on this architecture and compare it with a SIMD-extended data-parallel architecture. We also examine how well the new architecture scales with the number of SIMD co-processors. The experimental results show that the ePUMA memory subsystem can effectively hide the data access overhead. With its 8-way SIMD data path and multi-SIMD parallel execution, the ePUMA architecture achieves a 45x speedup in matrix multiplication over the conventional SIMD extension.
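
As a rough illustration of the idea in the abstract, the following minimal sketch divides a matrix multiplication among several SIMD co-processors, each accumulating products in 8-wide chunks. The abstract does not describe how the work is actually partitioned, so the row-wise split, the function names (`simd_dot`, `split_rows`, `matmul_multi_simd`) and the `num_simd` parameter are all assumptions made for illustration. The code runs sequentially and models only the division of work, not the ePUMA memory subsystem or its timing.

```python
LANES = 8  # lane width taken from the abstract's "8-way SIMD data path"


def simd_dot(a_row, b_col):
    """Dot product accumulated in LANES-wide chunks, mimicking an 8-way MAC."""
    acc = [0] * LANES
    for start in range(0, len(a_row), LANES):
        a_chunk = a_row[start:start + LANES]
        b_chunk = b_col[start:start + LANES]
        for lane, (x, y) in enumerate(zip(a_chunk, b_chunk)):
            acc[lane] += x * y
    return sum(acc)  # final reduction across the lanes


def split_rows(n_rows, num_simd):
    """Assign contiguous result-row ranges to each (hypothetical) co-processor."""
    base, extra = divmod(n_rows, num_simd)
    ranges, start = [], 0
    for i in range(num_simd):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def matmul_multi_simd(a, b, num_simd=4):
    """Multiply a (n x k) by b (k x m); assumes len(a[0]) == len(b)."""
    n, k, m = len(a), len(b), len(b[0])
    # Transpose b so that each column is a contiguous list.
    b_cols = [[b[r][c] for r in range(k)] for c in range(m)]
    c = [[0] * m for _ in range(n)]
    # Each row range would run on its own co-processor; here they run in turn.
    for lo, hi in split_rows(n, num_simd):
        for i in range(lo, hi):
            for j in range(m):
                c[i][j] = simd_dot(a[i], b_cols[j])
    return c
```

Calling `matmul_multi_simd(a, b, num_simd=2)` on two nested-list matrices returns the product as a nested list. Changing `num_simd` alters only how the result rows are grouped, not the result itself.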

Author information

Authors and Affiliations

Department of Electrical Engineering, Linköping University, 581 83, Linköping, Sweden
Joar Sohl, Jian Wang & Dake Liu

Editor information

Editors and Affiliations

National University of Defense Technology, Department of Computer Science, 410073, Changsha, P.R. China
Yong Dou
Ecole Polytechnique Fédérale de Lausanne (EPFL), Dépt. Physique, 1015, Lausanne, Switzerland
Ralf Gruber
HSR - Hochschule für Technik Rapperswil, Oberseestr. 10, 8640, Rapperswil, Switzerland
Josef M. Joller

About this paper

Cite this paper

Sohl, J., Wang, J., Liu, D. (2009). Large Matrix Multiplication on a Novel Heterogeneous Parallel DSP Architecture. In: Dou, Y., Gruber, R., Joller, J.M. (eds) Advanced Parallel Processing Technologies. APPT 2009. Lecture Notes in Computer Science, vol 5737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03644-6_32

DOI: https://doi.org/10.1007/978-3-642-03644-6_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03643-9
Online ISBN: 978-3-642-03644-6
eBook Packages: Computer Science, Computer Science (R0)
