skip to main content
research-article

Rigel: flexible multi-rate image processing hardware

Published: 11 July 2016 Publication History

Abstract

Image processing algorithms implemented using custom hardware or FPGAs of can be orders-of-magnitude more energy efficient and performant than software. Unfortunately, converting an algorithm by hand to a hardware description language suitable for compilation on these platforms is frequently too time consuming to be practical. Recent work on hardware synthesis of high-level image processing languages demonstrated that a single-rate pipeline of stencil kernels can be synthesized into hardware with provably minimal buffering. Unfortunately, few advanced image processing or vision algorithms fit into this highly-restricted programming model.
In this paper, we present Rigel, which takes pipelines specified in our new multi-rate architecture and lowers them to FPGA implementations. Our flexible multi-rate architecture supports pyramid image processing, sparse computations, and space-time implementation tradeoffs. We demonstrate depth from stereo, Lucas-Kanade, the SIFT descriptor, and a Gaussian pyramid running on two FPGA boards. Our system can synthesize hardware for FPGAs with up to 436 Megapixels/second throughput, and up to 297x faster runtime than a tablet-class ARM CPU.

Supplementary Material

ZIP File (a85-hegarty-aux.zip)
Supplemental files.
MP4 File (a85.mp4)

References

[1]
Adams, A., Talvala, E.-V., Park, S. H., Jacobs, D. E., Ajdin, B., Gelfand, N., Dolson, J., Vaquero, D., Baek, J., Tico, M., Lensch, H. P. A., Matusik, W., Pulli, K., Horowitz, M., and Levoy, M. 2010. The Frankencamera: An experimental platform for computational photography. ACM Transactions on Graphics 29, 4 (July), 29:1--29:12.
[2]
Adelson, E. H., Anderson, C. H., Bergen, J. R., Burt, P. J., and Ogden, J. M. 1984. Pyramid methods in image processing. RCA engineer 29, 6, 33--41.
[3]
Bilsen, G., Engels, M., Lauwereins, R., and Peper-straete, J. 1995. Cyclo-static data flow. In 1995 International Conference on Acoustics, Speech, and Signal Processing, vol. 5, 3255--3258.
[4]
Bouguet, J.-Y. 2001. Pyramidal implementation of the affine Lucas Kanade feature tracker description of the algorithm. Tech. rep., Intel Corporation.
[5]
Brunhaver, J. 2015. Design and Optimization of a Stencil Engine. PhD thesis, Stanford University.
[6]
DeVito, Z., Hegarty, J., Aiken, A., Hanrahan, P., and Vitek, J. 2013. Terra: A multi-stage language for high-performance computing. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, 105--116.
[7]
Elinux, 2015. Jetson computer vision performance. http://elinux.org/Jetson/Computer_Vision_Performance. {Online; accessed 12-April-2016}.
[8]
Hameed, R., Qadeer, W., Wachs, M., Azizi, O., Solomatnikov, A., Lee, B. C., Richardson, S., Kozyrakis, C., and Horowitz, M. 2010. Understanding sources of inefficiency in general-purpose chips. In Proceedings of the 37th Annual International Symposium on Computer Architecture, ACM, 37--47.
[9]
Harris, C., and Stephens, M. 1988. A combined corner and edge detector. In Proceedings of the 4th Alvey Vision Conference, 147--151.
[10]
Hegarty, J., Brunhaver, J., DeVito, Z., Ragan-Kelley, J., Cohen, N., Bell, S., Vasilyev, A., Horowitz, M., and Hanrahan, P. 2014. Darkroom: Compiling high-level image processing code into hardware pipelines. ACM Transactions on Graphics 33, 4 (July), 144:1--144:11.
[11]
Horstmannshoff, J., Grotker, T., and Meyr, H. 1997. Mapping multirate dataflow to complex rt level hardware models. In Application-Specific Systems, Architectures and Processors, 1997. Proceedings., IEEE International Conference on, 283--292.
[12]
Huang, J., Qian, F., Gerber, A., Mao, Z. M., Sen, S., and Spatscheck, O. 2012. A close examination of performance and power characteristics of 4g lte networks. In Proceedings of the 10th international conference on Mobile systems, applications, and services, ACM, 225--238.
[13]
Lee, E. A., and Messerschmitt, D. G. 1987. Static scheduling of synchronous data flow programs for digital signal processing. IEEE Transactions on Computers 100, 1, 24--35.
[14]
Leiserson, C. E., and Saxe, J. B. 1991. Retiming synchronous circuitry. Algorithmica 6, 1-6, 5--35.
[15]
Lowe, D. 1999. Object recognition from local scale-invariant features. In The Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, 1150--1157 vol. 2.
[16]
Lucas, B. D., Kanade, T., et al. 1981. An iterative image registration technique with an application to stereo vision. In International Joint Conference on Artificial Intelligence, vol. 81, 674--679.
[17]
Mullapudi, R. T., Adams, A., Sharlet, D., Ragan-Kelley, J., and Fatahalian, K. 2016. Automatically scheduling halide image processing pipelines. ACM Transactions on Graphics 35, 4 (July).
[18]
Murthy, P. K., and Lee, E. 2002. Multidimensional synchronous dataflow. IEEE Transactions on Signal Processing 50, 8, 2064--2079.
[19]
Murthy, P., Bhattacharyya, S., and Lee, E. 1997. Joint minimization of code and data for synchronous dataflow programs. Formal Methods in System Design 11, 1, 41--70.
[20]
Ragan-Kelley, J., Adams, A., Paris, S., Levoy, M., Amarasinghe, S., and Durand, F. 2012. Decoupling algorithms from schedules for easy optimization of image processing pipelines. ACM Transactions on Graphics 31, 4, 32.
[21]
Scharstein, D., and Szeliski, R. 2002. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision 47, 1-3, 7--42.
[22]
Sugerman, J., Fatahalian, K., Boulos, S., Akeley, K., and Hanrahan, P. 2009. Gramps: A programming model for graphics pipelines. ACM Transactions on Graphics 28, 1 (Feb.), 4:1--4:11.
[23]
Vivado, 2016. Vivado high-level synthesis. http://www.xilinx.com/products/design-tools/vivado/integration/esl-design/. {Online; accessed 12-April-2016}.
[24]
Xilinx. 2016. Zynq-7000 All Programmable SoC Overview. DS190 Rev. 1.9.
[25]
Xilinx, 2016. Power efficiency. http://www.xilinx.com/products/technology/power.html. {Online; accessed 12-April-2016}.

Cited By

View all
  • (2024)Allo: A Programming Model for Composable Accelerator DesignProceedings of the ACM on Programming Languages10.1145/36564018:PLDI(593-620)Online publication date: 20-Jun-2024
  • (2024)Cement: Streamlining FPGA Hardware Design with Cycle-Deterministic eHDL and SynthesisProceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays10.1145/3626202.3637561(211-222)Online publication date: 1-Apr-2024
  • (2024)SlidingConv: Domain-Specific Description of Sliding Discrete Cosine Transform Convolution for HalideIEEE Access10.1109/ACCESS.2023.334566012(7563-7583)Online publication date: 2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Graphics
ACM Transactions on Graphics  Volume 35, Issue 4
July 2016
1396 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/2897824
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 July 2016
Published in TOG Volume 35, Issue 4

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. FPGAs
  2. domain-specific languages
  3. hardware synthesis
  4. image processing
  5. video processing

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)18
  • Downloads (Last 6 weeks)4
Reflects downloads up to 01 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Allo: A Programming Model for Composable Accelerator DesignProceedings of the ACM on Programming Languages10.1145/36564018:PLDI(593-620)Online publication date: 20-Jun-2024
  • (2024)Cement: Streamlining FPGA Hardware Design with Cycle-Deterministic eHDL and SynthesisProceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays10.1145/3626202.3637561(211-222)Online publication date: 1-Apr-2024
  • (2024)SlidingConv: Domain-Specific Description of Sliding Discrete Cosine Transform Convolution for HalideIEEE Access10.1109/ACCESS.2023.334566012(7563-7583)Online publication date: 2024
  • (2023)FlowPix: Accelerating Image Processing Pipelines on an FPGA Overlay using a Domain Specific CompilerACM Transactions on Architecture and Code Optimization10.1145/362952320:4(1-25)Online publication date: 25-Oct-2023
  • (2023)HIR: An MLIR-based Intermediate Representation for Hardware Accelerator DescriptionProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 410.1145/3623278.3624767(189-201)Online publication date: 25-Mar-2023
  • (2023)Modular Hardware Design with Timeline TypesProceedings of the ACM on Programming Languages10.1145/35912347:PLDI(343-367)Online publication date: 6-Jun-2023
  • (2023)ImaGen: A General Framework for Generating Memory- and Power-Efficient Image Processing AcceleratorsProceedings of the 50th Annual International Symposium on Computer Architecture10.1145/3579371.3589076(1-13)Online publication date: 17-Jun-2023
  • (2023)Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory AcceleratorsACM Transactions on Architecture and Code Optimization10.1145/357290820:2(1-26)Online publication date: 1-Mar-2023
  • (2023)AHA: An Agile Approach to the Design of Coarse-Grained Reconfigurable Accelerators and CompilersACM Transactions on Embedded Computing Systems10.1145/353493322:2(1-34)Online publication date: 24-Jan-2023
  • (2023)Software-Defined Imaging: A SurveyProceedings of the IEEE10.1109/JPROC.2023.3266736111:5(445-464)Online publication date: May-2023
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media