Abstract
Vector digital signal processors (DSPs) offer a favorable ratio of performance to power consumption, which makes them suitable for mobile devices in software-defined radio applications. These vector DSPs require algorithms expressed in terms of vector operations, and the performance of a vectorized algorithm depends to a great extent on how its data are distributed over the vector elements. Whereas traditional vectorization algorithms focus on extracting parallelism from a program, we propose an analysis tool that focuses on selecting an efficient dynamic data mapping for vector DSPs. We transferred Garcia’s communication parallelism graph (Garcia et al., IEEE Trans Parallel Distrib Syst 12: 416–431, 2001), originally developed for distributed-memory multiprocessor systems, to vector DSPs. By altering the representation of two-dimensional data distributions and the cost models, we are able to determine a dynamic mapping of data onto vector elements for the Embedded Vector Processor (EVP) (van Berkel et al., Proceedings of the 2004 software-defined radio technical conference SDR’04, 2004). Additionally, we propose a new, efficient algorithm that processes the graph representation in two steps. We demonstrate the capabilities of our tool by describing the vectorization of several MIMO OFDM algorithms.
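The abstract does not spell out the cost models or the two-step graph algorithm. To illustrate what selecting a *dynamic* mapping means, the following Python sketch assumes a much simpler model than the communication parallelism graph. A program is treated as a linear sequence of phases. Each phase has a hypothetical execution cost for every candidate mapping (for example `"row"` or `"col"` for a two-dimensional array), and a redistribution cost is charged whenever the mapping changes between consecutive phases. Under these assumptions, the cheapest mapping sequence is a shortest path through a layered graph, which dynamic programming can find. This is not the authors' algorithm. The function name, mapping labels, and costs are all illustrative.

```python
"""Toy selection of a dynamic data mapping over a linear sequence of phases.

Hypothetical model (NOT the paper's communication parallelism graph or its
two-step algorithm): each phase has a cost per candidate mapping, and
switching mappings between consecutive phases costs a redistribution penalty.
"""
from typing import Dict, List, Tuple


def select_mappings(
    phase_costs: List[Dict[str, int]],
    redist_cost: Dict[Tuple[str, str], int],
) -> Tuple[int, List[str]]:
    """Return (total cost, mapping per phase) minimizing execution + redistribution.

    phase_costs[i][m] is the assumed cost of running phase i under mapping m.
    redist_cost[(a, b)] is the assumed cost of remapping from a to b; missing
    pairs (including keeping the same mapping) cost 0.
    """
    # best[m] = (cost of cheapest schedule so far ending in mapping m, that schedule)
    best = {m: (c, [m]) for m, c in phase_costs[0].items()}
    for costs in phase_costs[1:]:
        new_best = {}
        for m, c in costs.items():
            # Cheapest predecessor mapping p, including the cost of switching p -> m.
            p = min(best, key=lambda q: best[q][0] + redist_cost.get((q, m), 0))
            total = best[p][0] + redist_cost.get((p, m), 0) + c
            new_best[m] = (total, best[p][1] + [m])
        best = new_best
    # The cheapest schedule over all mappings of the final phase.
    final = min(best, key=lambda q: best[q][0])
    return best[final]


if __name__ == "__main__":
    phases = [{"row": 4, "col": 10}, {"row": 12, "col": 3}, {"row": 5, "col": 9}]
    print(select_mappings(phases, {("row", "col"): 2, ("col", "row"): 2}))
    print(select_mappings(phases, {("row", "col"): 6, ("col", "row"): 6}))
```

With a switching cost of 2, the sketch returns `(16, ['row', 'col', 'row'])`, which remaps twice. If the switching cost rises to 6, the static mapping `['row', 'row', 'row']` (cost 21) becomes cheaper. This trade-off between per-phase efficiency and redistribution overhead is what a dynamic data mapping has to balance.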
References
Garcia, J., Ayguade, E., & Labarta, J. (2001). A framework for integrating data alignment, distribution, and redistribution in distributed memory multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 12, 416–431 (April).
van Berkel, C. H., Heinle, F., Meuwissen, P. P. E., Moerman, K., & Weiss, M. (2004). Vector processing as an enabler for software-defined radio in handsets from 3G+WLAN onwards. In Proceedings of the 2004 software-defined radio technical conference SDR’04. Scottsdale, Arizona, U.S.A. (September).
Rajagopal, S., Rixner, S., & Cavallaro, J. R. (2002). A programmable baseband processor design for software defined radios. In Proceedings of the 45th IEEE midwest symposium on circuits and systems conference, MWSCAS 2002 (pp. 413–416) (August).
Schwoerer, L., & Moerman, K. (2006). Benchmarking MIMO OFDM algorithms on the EVP. In GSPx 2006. Santa Clara, CA, USA (October–November).
Lorenz, M., Marwedel, P., Dräger, T., Fettweis, G., & Leupers, R. (2004). Compiler based exploration of DSP energy savings by SIMD operations. In ASP-DAC ’04: Proceedings of the 2004 conference on Asia South Pacific design automation (pp. 838–841). Piscataway: IEEE.
Russell, R. M. (1978). The CRAY-1 computer system. Communications of the ACM, 21(1), 63–72.
Raman, S. K., Pentkovski, V., & Keshava, J. (2000). Implementing streaming SIMD extensions on the Pentium III processor. IEEE Micro, 20(4), 47–57.
Larsen, S., Rabbah, R., & Amarasinghe, S. (2005). Exploiting vector parallelism in software pipelined loops. In MICRO 38: Proceedings of the 38th annual IEEE/ACM international symposium on microarchitecture (pp. 119–129). Washington, DC: IEEE Computer Society.
Wilson, R. P., French, R. S., Wilson, C. S., Amarasinghe, S. P., Anderson, J. M., et al. (1994). SUIF: An infrastructure for research on parallelizing and optimizing compilers. SIGPLAN Notices, 29(12), 31–37.
Allen, R., & Kennedy, K. (1987). Automatic translation of FORTRAN programs to vector form. ACM Transactions on Programming Languages and Systems, 9(4), 491–542.
Tarjan, R. (1972). Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1(2), 146–160.
Darte, A., & Vivien, F. (1996). On the optimality of Allen and Kennedy’s algorithm for parallelism extraction in nested loops. In Euro-Par ’96: Proceedings of the second international euro-par conference on parallel processing (pp. 379–388). London: Springer.
Glossner, J., & Iancu, D. (2006). The Sandbridge SB3011 SDR platform. In Proceedings of the symposium on trends in communications (SympoTIC06). Bratislava, Slovakia.
Jinturkar, S., Glossner, J., Kotlyar, V., & Moudgill, M. (2004). The Sandblaster automatic multithreaded vectorizing compiler. In 2004 global signal processing expo (GSPx) and international signal processing conference (ISPC). Santa Clara, California.
Anderson, J. M., & Lam, M. S. (1993). Global optimizations for parallelism and locality on scalable parallel machines. In PLDI ’93: Proceedings of the ACM SIGPLAN 1993 conference on programming language design and implementation (pp. 112–125). New York: ACM.
Ramanujam, J., & Sadayappan, P. (1991). Compile-time techniques for data distribution in distributed memory machines. IEEE Transactions on Parallel and Distributed Systems, 2(4), 472–482.
Ozcan, E., & Onbasioglu, E. (2004). Genetic algorithms for parallel code optimization. In Proceedings of the 2004 IEEE congress on evolutionary computation (pp. 1375–1381). Portland: IEEE (June).
Kennedy, K., & Kremer, U. (1998). Automatic data layout for distributed-memory machines. ACM Transactions on Programming Languages and Systems, 20(4), 869–916.
Bixby, R. E., Kennedy, K., & Kremer, U. (1994). Automatic data layout using 0-1 integer programming. In PACT ’94: Proceedings of the IFIP WG10.3 working conference on parallel architectures and compilation techniques (pp. 111–122). Amsterdam: North-Holland.
Garcia, J., Ayguade, E., & Labarta, J. (1996). Dynamic data distribution with control flow analysis. In Supercomputing ’96: Proceedings of the 1996 ACM/IEEE conference on supercomputing (CDROM) (p. 11). Washington, DC: IEEE Computer Society.
Allen, J. R., Kennedy, K., Porterfield, C., & Warren, J. (1983). Conversion of control dependence to data dependence. In POPL ’83: Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on principles of programming languages (pp. 177–189). New York: ACM.
Li, J., & Chen, M. (1990). Index domain alignment: Minimizing cost of cross-reference between distributed arrays. In Proceedings of the 3rd symposium on the frontiers of massively parallel computation (October).
Hart, P. E., Nilsson, N. J., & Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2), 100–107.
Pugh, W. (1991). The omega test: A fast and practical integer programming algorithm for dependence analysis. In Supercomputing ’91: Proceedings of the 1991 ACM/IEEE conference on supercomputing (pp. 4–13). New York: ACM.
Guo, Y., & McCain, D. (2005). Reduced QRD-M detector in MIMO-OFDM systems with partial and embedded sorting. In Global telecommunications conference (GLOBECOM ’05).
Acknowledgements
This work has been sponsored in part by the German Federal Ministry of Education and Research within the scope of the Wireless Gigabit With Advanced Multimedia Support (WIGWAM) project.
Cite this article
Westermann, P., Schwoerer, L. & Kaufmann, A. Applying Data Mapping Techniques to Vector DSPs. J Sign Process Syst Sign Image Video Technol 57, 57–72 (2009). https://doi.org/10.1007/s11265-008-0170-1
DOI: https://doi.org/10.1007/s11265-008-0170-1