
Application development with the FlexWAFE real-time stream processing architecture for FPGAs

Published: 29 October 2009

Abstract

Current state-of-the-art general-purpose and DSP processors lack the processing power to meet the challenges posed by complex real-time digital image processing at high resolutions. Large arrays of FPGA-based accelerators, on the other hand, are too inefficient to cover the needs of cost-sensitive professional markets. We present a new architecture composed of a network of configurable, flexible, weakly programmable processing elements: the Flexible Weakly programmable Advanced Film Engine (FlexWAFE). Implemented on FPGAs, this architecture delivers both programmability and high efficiency. We demonstrate these claims using a professional next-generation noise reducer that performs more than 170G image operations/s at 80% FPGA area utilization on four Virtex II-Pro FPGAs. This article focuses on the FlexWAFE architecture principle and its implementation on a PCI-Express board.

