A Novel Application-specific Instruction-set Processor Design Approach for Video Processing Acceleration

Mbaye, Mame Maria; Bélanger, Normand; Savaria, Yvon; Pierre, Samuel

doi:10.1007/s11265-007-0050-0

Published: 27 March 2007

Volume 47, pages 297–315, (2007)

The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology

Mame Maria Mbaye¹, Normand Bélanger¹, Yvon Savaria¹ & Samuel Pierre²


Abstract

Application-specific instruction-set processors (ASIPs) provide a good alternative for video processing acceleration, but the productivity gap implied by such a new technology may prevent leveraging it fully. Video processing SoCs need flexibility that is not available in pure hardware architectures, while pure software solutions do not meet video processing performance constraints. Thus, ASIP design could offer a good tradeoff between performance and flexibility. Video processing algorithms are often characterized by intrinsic parallelism that can be exploited by specialized ASIP instructions. In this paper, we propose a new approach for exploiting sequences of tightly coupled specialized instructions in ASIP design applicable to video processing. Our approach, which avoids costly data communications by applying data grouping and data reuse, consists of accelerating an algorithm's critical loops by transforming them according to a new intermediate representation. This representation is then optimized, and opportunities for loop parallelism are explored. The approach has been applied to video processing algorithms such as the ELA deinterlacer and the 2D-DCT. Experimental results show speedups of up to 18 on the considered applications, while the hardware overhead in terms of additional logic gates was found to be between 18% and 59%.
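
For readers unfamiliar with the ELA deinterlacer named in the abstract, the following is a minimal software sketch of a three-direction edge-based line average for one missing line. It is not the authors' implementation or their intermediate representation. The function name `ela_interpolate_line`, the border clamping, and the tie-breaking rule that prefers the vertical direction are all assumptions made for illustration. Note that neighbouring output pixels read overlapping input samples, which is the kind of overlap that data grouping and data reuse could plausibly exploit.

```python
def ela_interpolate_line(upper, lower):
    """Interpolate the missing line between two field lines.

    Illustrative 3-direction edge-based line average (ELA); the name,
    border handling and tie-breaking are assumptions of this sketch.
    """
    if len(upper) != len(lower):
        raise ValueError("lines must have equal length")
    n = len(upper)
    out = []
    for x in range(n):
        # Clamp neighbour indices at the borders (assumption).
        xl = max(x - 1, 0)
        xr = min(x + 1, n - 1)
        # Candidate directions as (upper sample, lower sample) pairs.
        candidates = [
            (upper[x], lower[x]),    # vertical, preferred on ties
            (upper[xl], lower[xr]),  # upper-left to lower-right
            (upper[xr], lower[xl]),  # upper-right to lower-left
        ]
        # Pick the direction with the smallest absolute difference;
        # min() returns the first minimum, so vertical wins ties.
        a, b = min(candidates, key=lambda p: abs(p[0] - p[1]))
        out.append((a + b) // 2)
    return out


if __name__ == "__main__":
    # A diagonal edge: the step sits at x=3 above and at x=1 below.
    upper = [0, 0, 0, 100, 100]
    lower = [0, 100, 100, 100, 100]
    print(ela_interpolate_line(upper, lower))
```

On this diagonal edge, a plain vertical average would blur the transition across two pixels, whereas the direction search follows the edge and keeps it sharp.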



Author information

Authors and Affiliations

Department of Electrical Engineering, École Polytechnique de Montréal, P. O. Box 6079, Station Centre-Ville, Montréal, QC, H3C 3A7, Canada
Mame Maria Mbaye, Normand Bélanger & Yvon Savaria
Department of Computer Engineering, École Polytechnique de Montréal, P. O. Box 6079, Station Centre-Ville, Montréal, QC, H3C 3A7, Canada
Samuel Pierre

Corresponding author

Correspondence to Mame Maria Mbaye.

About this article

Cite this article

Mbaye, M.M., Bélanger, N., Savaria, Y. et al. A Novel Application-specific Instruction-set Processor Design Approach for Video Processing Acceleration. J VLSI Sign Process Syst Sign Image Video Technol 47, 297–315 (2007). https://doi.org/10.1007/s11265-007-0050-0

Received: 19 September 2006
Accepted: 18 January 2007
Published: 27 March 2007
Issue Date: June 2007
DOI: https://doi.org/10.1007/s11265-007-0050-0
