A Novel Application-specific Instruction-set Processor Design Approach for Video Processing Acceleration

The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology

Abstract

Application-specific instruction-set processors (ASIPs) provide a good alternative for video processing acceleration, but the productivity gap implied by such a new technology may prevent leveraging it fully. Video processing SoCs need flexibility that is not available in pure hardware architectures, while pure software solutions do not meet video processing performance constraints. Thus, ASIP design could offer a good tradeoff between performance and flexibility. Video processing algorithms are often characterized by intrinsic parallelism that can be accelerated by ASIP specialized instructions. In this paper, we propose a new approach for exploiting sequences of tightly coupled specialized instructions in ASIP design applicable to video processing. Our approach, which avoids costly data communications by applying data grouping and data reuse, consists of accelerating an algorithm’s critical loops by transforming them according to a new intermediate representation. This representation is optimized and loop parallelism possibilities are also explored. This approach has been applied to video processing algorithms such as the ELA deinterlacer and the 2D-DCT. Experimental results show speedups of up to 18 on the considered applications, while the hardware overhead in terms of additional logic gates was found to be between 18% and 59%.
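For readers unfamiliar with the ELA (edge-based line averaging) deinterlacer named above, the following is a minimal software sketch of its critical loop in the common three-direction textbook form. It is not the paper's intermediate representation or its specialized instructions. The function name `ela_interpolate_line`, the border clamping, and the tie-breaking rule (vertical direction preferred) are assumptions made for illustration.

```python
def ela_interpolate_line(upper, lower):
    """Interpolate one missing scan line between two known lines.

    Three-direction ELA: for each pixel, average the pair of known
    pixels (one above, one below) whose values differ the least.
    Border handling and tie-breaking are illustrative assumptions.
    """
    if len(upper) != len(lower):
        raise ValueError("upper and lower lines must have equal length")
    n = len(upper)
    out = []
    for j in range(n):
        # Clamp neighbour indices at the line borders (assumption).
        jl = max(j - 1, 0)
        jr = min(j + 1, n - 1)
        # Candidate (upper, lower) pixel pairs along three directions.
        # Vertical is listed first so that it wins ties.
        candidates = [
            (upper[j], lower[j]),    # vertical
            (upper[jl], lower[jr]),  # upper-left to lower-right
            (upper[jr], lower[jl]),  # upper-right to lower-left
        ]
        a, b = min(candidates, key=lambda p: abs(p[0] - p[1]))
        out.append((a + b) // 2)
    return out


# A slanted edge: plain vertical averaging would blur it,
# while ELA follows the edge direction.
upper = [0, 0, 0, 100, 100]
lower = [0, 100, 100, 100, 100]
print(ela_interpolate_line(upper, lower))  # [0, 0, 100, 100, 100]
```

Each output pixel reads three pixels from the line above and three from the line below, and neighbouring output pixels share four of those six inputs. That overlap is the kind of regular, loop-carried data reuse that a sequence of coupled custom instructions could plausibly exploit, although the sketch itself makes no attempt to model it.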


Correspondence to Mame Maria Mbaye.

About this article

Cite this article

Mbaye, M.M., Bélanger, N., Savaria, Y. et al. A Novel Application-specific Instruction-set Processor Design Approach for Video Processing Acceleration. J VLSI Sign Process Syst Sign Image Video Technol 47, 297–315 (2007). https://doi.org/10.1007/s11265-007-0050-0

  • DOI: https://doi.org/10.1007/s11265-007-0050-0
