Abstract
In this paper we propose a Language-Extension-based Vectorizing Compiling Scheme (LEVCS) for a newly developed DSP. The DSP, called SDR-DSP, is designed mainly for Software-Defined Radio (SDR). Its architecture combines the VLIW (Very Long Instruction Word) and SIMD (Single Instruction Multiple Data) styles. Vectorization is one of the essential techniques for exploiting the potential of SDR-DSP and achieving high performance. Because auto-vectorization techniques cannot meet the requirements of typical applications, LEVCS is used to direct the vectorization. The C-extended programming language used in LEVCS is called SDR-DSP-C. LEVCS uses flexible data reorganization to make vectorization on SDR-DSP more efficient. We use LEVCS to vectorize five benchmark kernels: Fast Fourier Transform (FFT), Finite Impulse Response filter (FIR), Infinite Impulse Response filter (IIR), dot product (Dotprod), and sum of vectors (vecsum). Experimental results show that LEVCS is functionally correct and achieves speedups of 2.883–8.074 compared with TI DSPs.
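To make the idea of vectorizing these kernels concrete, the following is a minimal plain-C sketch of what manual vectorization does for Dotprod and vecsum, plus one simple data reorganization step. It is not SDR-DSP-C: the abstract does not show that language's syntax, so the type `vec_t`, the helpers `vload`, `vstore`, `vadd`, `vmul`, `deinterleave`, and the lane count `VLEN = 8` are all hypothetical stand-ins. It also assumes that "vecsum" means element-wise addition of two vectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SIMD width; the abstract does not state SDR-DSP's lane count. */
#define VLEN 8

/* Plain-C stand-in for one SIMD register: VLEN independent 32-bit lanes. */
typedef struct { int32_t lane[VLEN]; } vec_t;

static vec_t vload(const int32_t *p) {
    vec_t v;
    for (int k = 0; k < VLEN; k++) v.lane[k] = p[k];
    return v;
}

static void vstore(int32_t *p, vec_t v) {
    for (int k = 0; k < VLEN; k++) p[k] = v.lane[k];
}

static vec_t vadd(vec_t a, vec_t b) {
    vec_t r;
    for (int k = 0; k < VLEN; k++) r.lane[k] = a.lane[k] + b.lane[k];
    return r;
}

static vec_t vmul(vec_t a, vec_t b) {
    vec_t r;
    for (int k = 0; k < VLEN; k++) r.lane[k] = a.lane[k] * b.lane[k];
    return r;
}

/* vecsum (assumed element-wise): c[i] = a[i] + b[i]. */
void vecsum(const int32_t *a, const int32_t *b, int32_t *c, size_t n) {
    size_t i = 0;
    for (; i + VLEN <= n; i += VLEN)           /* full vectors */
        vstore(c + i, vadd(vload(a + i), vload(b + i)));
    for (; i < n; i++)                          /* scalar tail */
        c[i] = a[i] + b[i];
}

/* dotprod: per-lane partial sums, then one horizontal reduction.
   Overflow is ignored here; a real DSP would use wider accumulators. */
int32_t dotprod(const int32_t *a, const int32_t *b, size_t n) {
    vec_t acc = {{0}};
    size_t i = 0;
    for (; i + VLEN <= n; i += VLEN)
        acc = vadd(acc, vmul(vload(a + i), vload(b + i)));
    int32_t sum = 0;
    for (int k = 0; k < VLEN; k++) sum += acc.lane[k];  /* reduce lanes */
    for (; i < n; i++) sum += a[i] * b[i];               /* scalar tail */
    return sum;
}

/* Data reorganization example: split interleaved (re, im) pairs into two
   contiguous arrays so each part can later be loaded as whole vectors. */
void deinterleave(const int32_t *iq, int32_t *re, int32_t *im, size_t n) {
    assert(re != im);                           /* outputs must not alias */
    for (size_t i = 0; i < n; i++) {
        re[i] = iq[2 * i];
        im[i] = iq[2 * i + 1];
    }
}
```

The loops are strip-mined into full `VLEN`-wide chunks followed by a scalar tail, so any length `n` is handled. `dotprod` keeps one partial sum per lane and reduces across lanes only once at the end. Both choices are common in hand-vectorized DSP code, though LEVCS's actual code generation may differ.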
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ni, X., Yang, L., Ma, C. (2016). Language-Extension-Based Vectorizing Compiling Scheme on SDR-DSP. In: Xu, W., Xiao, L., Li, J., Zhang, C., Zhu, Z. (eds) Computer Engineering and Technology. NCCET 2016. Communications in Computer and Information Science, vol 666. Springer, Singapore. https://doi.org/10.1007/978-981-10-3159-5_2
DOI: https://doi.org/10.1007/978-981-10-3159-5_2
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3158-8
Online ISBN: 978-981-10-3159-5
eBook Packages: Computer Science, Computer Science (R0)