Abstract
Video processing applications often demand high computing capacity under tight performance and power constraints, especially in portable devices, and general-purpose processors can no longer meet these requirements. This paper presents a parallel reconfigurable computing architecture consisting of reconfigurable processing units connected by an area-efficient routing network. Hierarchical configuration contexts reduce the implementation overhead and the energy dissipated by fast reconfiguration. The proposed architecture targets multiple-standard video processing. Through deep pipelining and a large degree of computing parallelism, the design delivers performance comparable to that of a fixed-function ASIC. Experimental results show that the proposed architecture is both high-performing and practical.
Funding
This work was supported by the Ministry of Science and Technology, Taiwan (MOST 106-2221-E-024-005), and the National University of Tainan, Taiwan (AB108-207).
Cite this article
Kao, CC. Performance-driven parallel reconfigurable computing architecture for multi-standard video decoding. Multimed Tools Appl 79, 30583–30599 (2020). https://doi.org/10.1007/s11042-020-09505-1
DOI: https://doi.org/10.1007/s11042-020-09505-1