Applying Data Mapping Techniques to Vector DSPs

Westermann, Peter; Schwoerer, Ludwig; Kaufmann, Andre

doi:10.1007/s11265-008-0170-1

Published: 05 April 2008

Volume 57, pages 57–72 (2009)

Journal of Signal Processing Systems

Peter Westermann¹,
Ludwig Schwoerer² &
Andre Kaufmann³

Abstract

Vector digital signal processors (DSPs) offer a favorable performance-to-power ratio, which makes them well suited to software-defined radio applications on mobile devices. They require input algorithms expressed as vector operations, and the performance of a vectorized algorithm depends to a great extent on how its data are distributed across vector elements. Whereas traditional vectorization algorithms focus on extracting parallelism from a program, we propose an analysis tool that focuses on selecting an efficient dynamic data mapping for vector DSPs. We transferred Garcia’s communication parallelism graph (Garcia et al., IEEE Trans Parallel Distrib Syst 12: 416–431, 2001), originally developed for distributed-memory multiprocessor systems, to vector DSPs. By adapting the representation of two-dimensional data distributions and the cost models, we are able to determine a dynamic mapping of data onto vector elements for the Embedded Vector Processor (EVP) (van Berkel et al., Proceedings of the 2004 software-defined radio technical conference SDR’04, 2004). Additionally, we propose a new, efficient two-step algorithm for processing the graph representation. We demonstrate the capabilities of our tool by describing the vectorization of some MIMO OFDM algorithms.
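To make the idea of a dynamic data mapping concrete, the following toy sketch may help. It is not the communication parallelism graph or the two-step algorithm of the article, whose details are not given in the abstract. It is a simplified stand-in that assumes a program split into consecutive phases, two hypothetical layouts ("row" and "col") for a two-dimensional array on the vector elements, and invented cycle counts for executing each phase and for remapping between layouts. Under those assumptions, a shortest-path style dynamic program picks one layout per phase.

```python
from typing import Dict, List, Tuple

def choose_dynamic_mapping(
    phase_costs: List[Dict[str, int]],
    remap_cost: Dict[Tuple[str, str], int],
) -> Tuple[int, List[str]]:
    """Pick one layout per phase, minimizing execution plus remapping cost.

    Layout pairs missing from remap_cost (including staying in the same
    layout) are treated as free.
    """
    # best[layout] = (cheapest total cost ending in layout, layouts chosen so far)
    best = {lay: (cost, [lay]) for lay, cost in phase_costs[0].items()}
    for costs in phase_costs[1:]:
        nxt = {}
        for lay, exec_cost in costs.items():
            # Cheapest way to arrive in this layout, counting any remapping.
            prev_cost, prev_path = min(
                (c + remap_cost.get((p, lay), 0), path)
                for p, (c, path) in best.items()
            )
            nxt[lay] = (prev_cost + exec_cost, prev_path + [lay])
        best = nxt
    return min(best.values())

# Hypothetical cycle counts: phases 0 and 2 favor "row", phase 1 favors "col".
PHASE_COSTS = [
    {"row": 10, "col": 40},
    {"row": 35, "col": 8},
    {"row": 12, "col": 30},
]
# Hypothetical cost of redistributing the data between the two layouts.
REMAP_COST = {("row", "col"): 5, ("col", "row"): 5}

total, plan = choose_dynamic_mapping(PHASE_COSTS, REMAP_COST)
# Cost of the best mapping that never changes layout, for comparison.
best_static = min(sum(c[lay] for c in PHASE_COSTS) for lay in ("row", "col"))
print(total, plan, best_static)
```

With these invented numbers, switching layouts between phases costs less in total than keeping either layout fixed, which is the kind of trade-off a dynamic mapping is meant to capture.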

References

Garcia, J., Ayguade, E., & Labarta, J. (2001). A framework for integrating data alignment, distribution, and redistribution in distributed memory multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 12, 416–431 (April).
van Berkel, C. H., Heinle, F., Meuwissen, P. P. E., Moerman, K., & Weiss, M. (2004). Vector processing as an enabler for software-defined radio in handsets from 3G+WLAN onwards. In Proceedings of the 2004 software-defined radio technical conference SDR’04. Scottsdale, Arizona, U.S.A. (September).
Rajagopal, S., Rixner, S., & Cavallaro, J. R. (2002). A programmable baseband processor design for software defined radios. In Proceedings of the 45th IEEE midwest symposium on circuits and systems conference, MWSCAS 2002 (pp. 413–416) (August).
Schwoerer, L., & Moerman, K. (2006). Benchmarking MIMO OFDM algorithms on the EVP. In GSPx 2006. Santa Clara, CA, USA (October–November).
Lorenz, M., Marwedel, P., Dräger, T., Fettweis, G., & Leupers, R. (2004). Compiler based exploration of DSP energy savings by SIMD operations. In ASP-DAC ’04: Proceedings of the 2004 conference on Asia South Pacific design automation (pp. 838–841). Piscataway: IEEE.
Russell, R. M. (1978). The CRAY-1 computer system. Communications of the ACM, 21(1), 63–72.
Raman, S. K., Pentkovski, V., & Keshava, J. (2000). Implementing streaming SIMD extensions on the Pentium III processor. IEEE Micro, 20(4), 47–57.
Larsen, S., Rabbah, R., & Amarasinghe, S. (2005). Exploiting vector parallelism in software pipelined loops. In MICRO 38: Proceedings of the 38th annual IEEE/ACM international symposium on microarchitecture (pp. 119–129). Washington, DC: IEEE Computer Society.
Wilson, R. P., French, R. S., Wilson, C. S., Amarasinghe, S. P., Anderson, J. M., et al. (1994). SUIF: An infrastructure for research on parallelizing and optimizing compilers. SIGPLAN Notices, 29(12), 31–37.
Allen, R., & Kennedy, K. (1987). Automatic translation of FORTRAN programs to vector form. ACM Transactions on Programming Languages and Systems, 9(4), 491–542.
Tarjan, R. (1972). Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1(2), 146–160.
Darte, A., & Vivien, F. (1996). On the optimality of Allen and Kennedy’s algorithm for parallelism extraction in nested loops. In Euro-Par ’96: Proceedings of the second international euro-par conference on parallel processing (pp. 379–388). London: Springer.
Glossner, J., & Iancu, D. (2006). The Sandbridge SB3011 SDR platform. In Proceedings of the symposium on trends in communications (SympoTIC06). Bratislava, Slovakia.
Jinturkar, S., Glossner, J., Kotlyar, V., & Moudgill, M. (2004). The Sandblaster automatic multithreaded vectorizing compiler. In 2004 global signal processing expo (GSPx) and international signal processing conference (ISPC). Santa Clara, California.
Anderson, J. M., & Lam, M. S. (1993). Global optimizations for parallelism and locality on scalable parallel machines. In PLDI ’93: Proceedings of the ACM SIGPLAN 1993 conference on programming language design and implementation (pp. 112–125). New York: ACM.
Ramanujam, J., & Sadayappan, P. (1991). Compile-time techniques for data distribution in distributed memory machines. IEEE Transactions on Parallel and Distributed Systems, 2(4), 472–482.
Ozcan, E., & Onbasioglu, E. (2004). Genetic algorithms for parallel code optimization. In Proceedings of the 2004 IEEE congress on evolutionary computation (pp. 1375–1381). Portland: IEEE (June).
Kennedy, K., & Kremer, U. (1998). Automatic data layout for distributed-memory machines. ACM Transactions on Programming Languages and Systems, 20(4), 869–916.
Bixby, R. E., Kennedy, K., & Kremer, U. (1994). Automatic data layout using 0-1 integer programming. In PACT ’94: Proceedings of the IFIP WG10.3 working conference on parallel architectures and compilation techniques (pp. 111–122). Amsterdam: North-Holland.
Garcia, J., Ayguade, E., & Labarta, J. (1996). Dynamic data distribution with control flow analysis. In Supercomputing ’96: Proceedings of the 1996 ACM/IEEE conference on supercomputing (CDROM) (p. 11). Washington, DC: IEEE Computer Society.
Allen, J. R., Kennedy, K., Porterfield, C., & Warren, J. (1983). Conversion of control dependence to data dependence. In POPL ’83: Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on principles of programming languages (pp. 177–189). New York: ACM.
Li, J., & Chen, M. (1990). Index domain alignment: Minimizing cost of cross-reference between distributed arrays. In Proc. 3rd symp. frontiers of massively parallel computation (October).
Hart, P. E., Nilsson, N. J., & Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2), 100–107.
Pugh, W. (1991). The omega test: A fast and practical integer programming algorithm for dependence analysis. In Supercomputing ’91: Proceedings of the 1991 ACM/IEEE conference on supercomputing (pp. 4–13). New York: ACM.
Guo, Y., & McCain, D. (2005). Reduced QRD-M detector in MIMO-OFDM systems with partial and embedded sorting. In Global telecommunications conference (GLOBECOM ’05).

Acknowledgements

This work has been sponsored in part by the German Federal Ministry of Education and Research within the scope of the Wireless Gigabit With Advanced Multimedia Support (WIGWAM) project.

Author information

Authors and Affiliations

Circuits and Systems Lab, Dortmund University of Technology, Otto-Hahn-Str. 4, 44221, Dortmund, Germany
Peter Westermann
Institute for Communication & Electronics, Bochum University of Applied Sciences, Lennershofstr. 140, 44801, Bochum, Germany
Ludwig Schwoerer
Nokia GmbH, Meesmannstr. 103, 44807, Bochum, Germany
Andre Kaufmann

Corresponding author

Correspondence to Peter Westermann.

About this article

Cite this article

Westermann, P., Schwoerer, L. & Kaufmann, A. Applying Data Mapping Techniques to Vector DSPs. J Sign Process Syst Sign Image Video Technol 57, 57–72 (2009). https://doi.org/10.1007/s11265-008-0170-1

Received: 10 September 2007
Revised: 21 December 2007
Accepted: 04 March 2008
Published: 05 April 2008
Issue Date: October 2009
DOI: https://doi.org/10.1007/s11265-008-0170-1
