Abstract
The challenges posed by complex real-time digital image processing at high resolutions cannot be met by current state-of-the-art general-purpose or DSP processors, which lack the necessary processing power. Conversely, large arrays of FPGA-based accelerators are too inefficient to meet the needs of cost-sensitive professional markets. We present a new architecture, the Flexible Weakly programmable Advanced Film Engine (FlexWAFE), composed of a network of configurable, flexible, weakly programmable processing elements. When implemented on FPGAs, this architecture delivers both programmability and high efficiency. We demonstrate these claims with a professional next-generation noise reducer that performs more than 170G image operations/s at 80% FPGA area utilization on four Virtex-II Pro FPGAs. This article focuses on the FlexWAFE architecture principle and its implementation on a PCI Express board.
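To put the headline figure in perspective, the short script below derives a per-FPGA and per-pixel operation budget from the numbers in the abstract. It is a back-of-envelope sketch only. The frame format is an assumption, not something this abstract states: it takes 2K material as 2048 x 1536 pixels per frame at 30 bit/pixel and 24 frames/s. The function names `pixel_rate` and `budget` are hypothetical helpers introduced for illustration.

```python
"""Back-of-envelope throughput budget for the FlexWAFE noise reducer.

Assumption (not stated in the abstract): the noise reducer processes 2K
material, taken here as 2048 x 1536 pixels/frame, 30 bit/pixel, 24 frames/s.
"""

TOTAL_OPS_PER_S = 170e9   # "more than 170G image operations/s" (a lower bound)
NUM_FPGAS = 4             # four Virtex-II Pro FPGAs

# Hypothetical 2K format parameters (assumption, see module docstring).
WIDTH, HEIGHT = 2048, 1536
BITS_PER_PIXEL = 30
FRAMES_PER_S = 24


def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels that must be processed per second to keep up in real time."""
    return width * height * fps


def budget() -> dict:
    """Derive per-FPGA, per-pixel, and raw data-rate figures."""
    px_per_s = pixel_rate(WIDTH, HEIGHT, FRAMES_PER_S)
    return {
        "ops_per_fpga": TOTAL_OPS_PER_S / NUM_FPGAS,       # operations/s per FPGA
        "pixels_per_s": px_per_s,                           # real-time pixel rate
        "ops_per_pixel": TOTAL_OPS_PER_S / px_per_s,        # operations spent per pixel
        "gbit_per_s": px_per_s * BITS_PER_PIXEL / 1e9,      # uncompressed video stream
    }


if __name__ == "__main__":
    for name, value in budget().items():
        print(f"{name:>14}: {value:,.2f}")
```

Under these assumptions each FPGA sustains roughly 42.5G operations/s, and the design spends on the order of 2,250 operations on every pixel of a stream of about 2.26 Gbit/s. Because 170G is quoted as a lower bound, these figures are lower bounds as well.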
Index Terms
- Application development with the FlexWAFE real-time stream processing architecture for FPGAs