
Accelerating OpenVX through Halide and MLIR

Journal of Signal Processing Systems

Abstract

In recent years, as social media and AI-enabled applications have become ubiquitous, camera-centric applications have emerged as the most popular category of apps on mobile phones. Using the APIs provided by computer vision frameworks, a programmer can develop a camera application in an hour or less without any domain-specific knowledge, which allows this technology to spread rapidly. OpenVX is a computer vision framework designed with performance and portability as vital considerations. This paper proposes a new framework that effectively accelerates OpenVX with Halide and MLIR. Our framework inherits Halide’s decoupling of algorithms from schedules, along with scheduling facilities such as the auto-scheduler, as well as MLIR’s multi-level dialects that structure operations and data accesses. To generate more efficient programs, we propose a bridge that transforms programs written in OpenVX into Halide and then translates them from Halide to MLIR. In this process, our framework combines Halide’s scheduling with MLIR’s dialects to generate binary code with faster execution.
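The decoupling of algorithm from schedule that the abstract attributes to Halide can be sketched as follows. This is a minimal illustration in plain Python, not OpenVX or Halide code: the same 1D box blur (the algorithm) is computed under two different loop structures (two schedules, one default and one tiled), and both produce identical results, which is the invariant Halide's scheduling relies on.

```python
# Halide's core idea: the *algorithm* (what to compute) is separate from
# the *schedule* (loop order, tiling, vectorization, ...). A schedule
# change must never alter the computed values. Illustrative sketch only.

def blur_default(img):
    """Algorithm: out[x] = (img[x-1] + img[x] + img[x+1]) // 3, default loop order."""
    return [(img[x - 1] + img[x] + img[x + 1]) // 3
            for x in range(1, len(img) - 1)]

def blur_tiled(img, tile=4):
    """Same algorithm, but the x loop is split into tiles (a schedule change)."""
    n = len(img) - 2                 # number of output points
    out = [0] * n
    for xo in range(0, n, tile):                     # outer loop over tiles
        for xi in range(xo, min(xo + tile, n)):      # inner loop within a tile
            x = xi + 1
            out[xi] = (img[x - 1] + img[x] + img[x + 1]) // 3
    return out

img = list(range(16))
assert blur_default(img) == blur_tiled(img)   # schedules never change results
```

In Halide itself this split-loop structure would be expressed with scheduling directives on an unchanged algorithm definition, and the proposed bridge can then lower such structured loop nests into MLIR's dialects.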


Data Availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

We are grateful for the helpful comments and suggestions about English writing from Tony Siu at Tamkang University.

Author information


Corresponding author

Correspondence to Shih-Wei Liao.

Ethics declarations

Conflicts of Interest

The authors have no competing interests to declare relevant to this article’s content.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wu, SL., Wang, XY., Peng, MY. et al. Accelerating OpenVX through Halide and MLIR. J Sign Process Syst 95, 571–584 (2023). https://doi.org/10.1007/s11265-022-01826-8

