Accelerating OpenVX Application Kernels Using Halide Scheduling

Zhao, Bo-Yu; Peng, Ming-Yi; Wang, Xiang-Yu; Liao, Shih-Wei

doi:10.1007/s11265-023-01851-1

Published: 28 February 2023

Volume 95, pages 623–642 (2023)
Journal of Signal Processing Systems

Bo-Yu Zhao¹, Ming-Yi Peng¹, Xiang-Yu Wang² & Shih-Wei Liao¹

Abstract

In this study, we investigate how to use Halide, a domain-specific language, to accelerate and optimize OpenVX graphs. Halide is a new high-level language for image processing pipelines. It lets developers separate a program into an algorithm and a schedule, which makes programs easier to write and tune. Halide has also proven to be an effective system for authoring high-performance image processing code. We present a prototype that uses Halide to optimize OpenVX image processing modules. Because OpenVX lacks scheduling primitives and Halide provides them, we integrated Halide into OpenVX graphs. This method improves developer productivity and achieves relatively high performance.
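The separation of algorithm and schedule mentioned in the abstract can be illustrated with a small sketch. This is plain Python, not the Halide API and not the authors' prototype. The function names (`blur_x`, `schedule_row_major`, `schedule_tiled`) and the 3-tap blur are hypothetical, chosen only for illustration. The point is that the per-pixel definition is written once, while the traversal order can change without affecting the result.

```python
# Conceptual sketch only: plain Python, not the Halide API.
# All names here are hypothetical and chosen for illustration.

def blur_x(img, x, y):
    """Algorithm: 3-tap horizontal average, clamped at the image borders."""
    w = len(img[0])
    left = img[y][max(x - 1, 0)]
    right = img[y][min(x + 1, w - 1)]
    return (left + img[y][x] + right) / 3.0


def schedule_row_major(img):
    """Schedule A: visit pixels row by row."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = blur_x(img, x, y)
    return out


def schedule_tiled(img, tile=2):
    """Schedule B: visit pixels tile by tile (better locality on real hardware)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            # Clamp the tile to the image so partial tiles are handled.
            for y in range(ty, min(ty + tile, h)):
                for x in range(tx, min(tx + tile, w)):
                    out[y][x] = blur_x(img, x, y)
    return out


image = [[float(x + 4 * y) for x in range(4)] for y in range(3)]
# Both schedules compute the same pixels; only the loop order differs.
assert schedule_row_major(image) == schedule_tiled(image)
```

In Halide itself the schedule is expressed declaratively on the pipeline rather than by rewriting loops by hand; the sketch only shows why changing the schedule leaves the output unchanged.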

Data Availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.


Author information

Bo-Yu Zhao, Ming-Yi Peng, Xiang-Yu Wang and Shih-Wei Liao contributed equally to this work.

Authors and Affiliations

Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Bo-Yu Zhao, Ming-Yi Peng & Shih-Wei Liao
Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan
Xiang-Yu Wang

Corresponding author

Correspondence to Shih-Wei Liao.

Ethics declarations

Conflicts of Interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, BY., Peng, MY., Wang, XY. et al. Accelerating OpenVX Application Kernels Using Halide Scheduling. J Sign Process Syst 95, 623–642 (2023). https://doi.org/10.1007/s11265-023-01851-1

Received: 10 July 2022
Revised: 31 January 2023
Accepted: 01 February 2023
Published: 28 February 2023
Issue Date: May 2023
DOI: https://doi.org/10.1007/s11265-023-01851-1
