Application development with the FlexWAFE real-time stream processing architecture for FPGAs

Authors:
Amilcar Do Carmo Lucas, Technical University of Braunschweig, Germany
Henning Sahlbach, Technical University of Braunschweig, Germany
Sean Whitty, Technical University of Braunschweig, Germany
Sven Heithecker, Technical University of Braunschweig, Germany
Rolf Ernst, Technical University of Braunschweig, Germany

ACM Transactions on Embedded Computing Systems, Volume 9, Issue 1, Article No. 4, pp. 1–23
https://doi.org/10.1145/1596532.1596536

Published: 29 October 2009

Abstract

The challenges posed by complex real-time digital image processing at high resolutions cannot be met by current state-of-the-art general-purpose or DSP processors because they lack the necessary processing power. On the other hand, large arrays of FPGA-based accelerators are too inefficient to meet the needs of cost-sensitive professional markets. We present a new architecture composed of a network of configurable, flexible, weakly programmable processing elements: the Flexible Weakly programmable Advanced Film Engine (FlexWAFE). Implemented on FPGAs, this architecture delivers both programmability and high efficiency. We demonstrate these claims using a professional next-generation noise reducer that achieves more than 170G image operations/s at 80% FPGA area utilization on four Virtex II-Pro FPGAs. This article focuses on the FlexWAFE architecture principle and its implementation on a PCI-Express board.

Published in

ACM Transactions on Embedded Computing Systems Volume 9, Issue 1
October 2009
184 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/1596532

Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States

Journal Family
ACM Journals for the Design of Smart and Connected Systems
Publication History
- Published: 29 October 2009
- Accepted: 1 February 2009
- Revised: 1 January 2009
- Received: 1 June 2008
Author Tags
Communication centric
FPGA
PCI-Express
QoS
SDRAM-controller
communication scheduling
digital film
real-time
reconfigurable
stream-based architecture
weakly-programmable
Qualifiers
- research-article
- Research
- Refereed