Abstract
Although current SIMD processor architectures can improve processing performance by exploiting the data-level parallelism inherent in video applications, a significant performance penalty arises when the data are not formatted in an amenable way, e.g. on unaligned memory accesses. This paper presents an enhanced DMA controller that performs block-based data transfers and realigns the data when a word in external memory is not aligned with the natural data memory/bus width boundary. Moreover, the enhanced DMA controller performs a signal extension when accessing data outside a specific region, e.g. a video frame, which decreases the total number of processing cycles required for a typical video application. Performance improvements of up to 25% can be achieved when running a highly time-consuming video encoding task (motion estimation) on a generic VLIW architecture with the enhanced DMA controller, compared to a basic block-transfer DMA controller.
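The two transfer behaviours described above can be modelled in software with a minimal C sketch. It rests on several assumptions that do not come from the paper. The bus is a hypothetical 32-bit one (BUS_BYTES). Pixels are 8-bit values packed little-endian. "Signal extension" is taken to mean replicating the nearest frame-border pixel, i.e. edge replication as commonly used when motion vectors point outside the reference frame. The names ext_mem_t, bus_read_aligned, read_unaligned and dma_block_copy are hypothetical. The actual controller is a hardware unit, and this code only imitates its observable behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUS_BYTES 4u /* hypothetical 32-bit data bus width */

/* Simulated external memory holding one frame (row stride = width). */
typedef struct {
    const uint8_t *base;
    size_t size;
} ext_mem_t;

/* The bus only supports aligned word reads; bytes past the end read as 0. */
static uint32_t bus_read_aligned(const ext_mem_t *m, size_t addr) {
    assert(addr % BUS_BYTES == 0);
    uint32_t w = 0;
    for (unsigned i = 0; i < BUS_BYTES; i++) {
        size_t a = addr + i;
        uint32_t b = (a < m->size) ? m->base[a] : 0u;
        w |= b << (8u * i); /* little-endian packing */
    }
    return w;
}

/* Realignment: a word starting at an arbitrary byte address is built from
   at most two aligned bus reads followed by a shift-and-merge. */
static uint32_t read_unaligned(const ext_mem_t *m, size_t addr) {
    size_t lo_addr = addr & ~(size_t)(BUS_BYTES - 1u);
    unsigned off = (unsigned)(addr - lo_addr);
    uint32_t lo = bus_read_aligned(m, lo_addr);
    if (off == 0)
        return lo; /* already aligned: a single bus read suffices */
    uint32_t hi = bus_read_aligned(m, lo_addr + BUS_BYTES);
    return (lo >> (8u * off)) | (hi << (8u * (BUS_BYTES - off)));
}

static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* Block transfer of a bw x bh block with top-left (x0, y0) from a w x h frame
   into dst. Coordinates outside the frame are clamped to the border, so
   edge pixels are replicated (assumed meaning of "signal extension"). */
static void dma_block_copy(const ext_mem_t *m, int w, int h, int x0, int y0,
                           int bw, int bh, uint8_t *dst) {
    for (int r = 0; r < bh; r++) {
        int y = clampi(y0 + r, 0, h - 1);
        int c = 0;
        while (c < bw) {
            int x = x0 + c;
            if (x >= 0 && x + (int)BUS_BYTES <= w && c + (int)BUS_BYTES <= bw) {
                /* Whole word lies inside the frame row: realigned word copy. */
                uint32_t v = read_unaligned(m, (size_t)y * (size_t)w + (size_t)x);
                for (unsigned i = 0; i < BUS_BYTES; i++)
                    dst[r * bw + c + (int)i] = (uint8_t)(v >> (8u * i));
                c += (int)BUS_BYTES;
            } else {
                /* Border region: fetch the nearest in-frame pixel. */
                int xc = clampi(x, 0, w - 1);
                uint32_t v = read_unaligned(m, (size_t)y * (size_t)w + (size_t)xc);
                dst[r * bw + c] = (uint8_t)v;
                c++;
            }
        }
    }
}
```

For example, copying an 8-pixel-wide block that starts two pixels left of a 6-pixel-wide frame yields the first pixel of the row three times, followed by the remaining five pixels of that row. In this model, a word that straddles a bus boundary costs two aligned reads plus a shift-and-merge. The abstract's point is that this work, together with the border handling, is moved out of the SIMD core and into the DMA controller.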
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Payá-Vayá, G., Martín-Langerwerf, J., Moch, S., Pirsch, P. (2009). An Enhanced DMA Controller in SIMD Processors for Video Applications. In: Berekovic, M., Müller-Schloer, C., Hochberger, C., Wong, S. (eds) Architecture of Computing Systems – ARCS 2009. ARCS 2009. Lecture Notes in Computer Science, vol 5455. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00454-4_17
DOI: https://doi.org/10.1007/978-3-642-00454-4_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00453-7
Online ISBN: 978-3-642-00454-4
eBook Packages: Computer Science, Computer Science (R0)