
EVE: A Flexible SIMD Coprocessor for Embedded Vision Applications

Published in: Journal of Signal Processing Systems

Abstract

In this paper we introduce EVE (embedded vision/vector engine), with a FlexSIMD (flexible SIMD) architecture highly optimized for embedded vision. We show how EVE can be used to meet the growing requirements of embedded vision applications in a power- and area-efficient manner. EVE’s SIMD features allow it to accelerate low-level vision functions (such as image filtering, color-space conversion, pyramids, and gradients). With the added flexibility of its data accesses, EVE can also accelerate many mid-level vision tasks (such as connected components, integral image, histogram, and Hough transform). Our experiments with a silicon implementation of EVE show that it performs many low- and mid-level vision functions with a 3–12× speed advantage over a C64x+ DSP, while consuming less power and area. EVE also achieves code-size savings of 4–6× over a C64x+ DSP for regular loops. Thanks to its flexibility and programmability, we were able to implement two end-to-end vision applications on EVE and achieve more than a 5× application-level speedup over a C64x+. With EVE as a coprocessor next to a DSP or a general-purpose processor, algorithm developers have the option of accelerating low- and mid-level vision functions on EVE, giving them more room to innovate and to use the DSP for new, more complex, high-level vision algorithms.

Figures 1–3 (not reproduced in this preview)


Notes

  1. T. S. Huang, in the introduction to [11].

  2. EVE can load 16 8-bit or 16 16-bit values in one cycle. With 32-bit data, EVE can load 16 32-bit values in 1.5 cycles on average, using two regular vector loads and intelligent load buffering. EVE has load and store pre-fetch units, which exploit data re-use and allow us to deal with memory contention.

  3. For cases in which the algorithm can guarantee that the offsets do indeed point to different memory banks, we can use p_scatter (parallel scatter) and perform the writes at a rate of 8 values per cycle, as opposed to the general case of sequential scatter.

References

  1. Bertozzi, M., et al. (2002). Artificial vision in road vehicles. Proceedings of the IEEE, 90(7), 1258–1271.

  2. Chai, S. (2008). Mobile challenges for embedded computer vision. In B. Kisačanin, S.S. Bhattacharyya, S. Chai (Eds.), Embedded computer vision. London: Springer.

  3. Chiricescu, S., et al. (2005). RSVP II: a next generation automotive processor. In Proceedings of the intelligent vehicles symposium.

  4. Crnojevic, V.S., Schubert, P.J., Kisačanin, B. (2006). Method of developing a classifier using adaboost-over-genetic programming. US Patent Application 20080126275.

  5. Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of the IEEE CVPR.

  6. Gagvani, N. (2008). Challenges in video analytics. In B. Kisačanin, S.S. Bhattacharyya, S. Chai (Eds.), Embedded computer vision. London: Springer.

  7. Goodacre, J., & Sloss, A.N. (2005). Parallelism and the ARM instruction set architecture. Computer, 38(7), 42–50.

  8. He, B., et al. (2007). Efficient gather and scatter operations on graphics processors. In Proceedings of SC07.

  9. Horn, B.K.P. (2003). Determining constant optical flow. http://people.csail.mit.edu/bkph/articles/Fixed_Flow.pdf (retrieved 20 January 2009).

  10. Iwata, N., Kagami, S., Hashimoto, K. (2007). A dynamically reconfigurable architecture combining pixel-level SIMD and operation-pipeline modes for high frame rate visual processing. In Proceedings of the ICFPT.

  11. Kisačanin, B., Pavlović, V., Huang, T.S. (Eds.) (2005). Real-time computer vision for human-computer interaction. New York: Springer.

  12. Kisačanin, B., & Nikolić, Z. (2010). Algorithmic and software techniques for embedded vision on programmable processors. Signal Processing: Image Communication, 25(5), 352–362.

  13. Kisačanin, B. (2011). Automotive vision for advanced driver assistance systems. In Proceedings of international symposium on VLSI design, automation and test (VLSI-DAT).

  14. Komuro, T., Kagami, S., Ishikawa, M. (2004). A dynamically reconfigurable SIMD processor for a vision chip. IEEE Journal of Solid-State Circuits, 39(1), 265–268.

  15. Kyo, S., & Okazaki, S. (2008). In-vehicle vision processors for driver assistance systems. In Proceedings of the design automation conference.

  16. Lee, V.W., et al. (2010). Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. In Proceedings of ISCA.

  17. Lipton, A.J. (2008). We can watch it for you wholesale. In B. Kisačanin, S.S. Bhattacharyya, S. Chai (Eds.), Embedded computer vision. London: Springer.

  18. Owens, J.D., et al. (2008). GPU computing. Proceedings of the IEEE.

  19. Porikli, F. (2005). Integral histogram: a fast way to extract histograms in Cartesian spaces. In Proceedings of the IEEE CVPR.

  20. Shotton, J., et al. (2011). Real-time human pose recognition in parts from single depth images. In Proceedings of the IEEE CVPR.

  21. Simar, R., & Tatge, R. (2009). How TI adopted VLIW in digital signal processors. IEEE Solid-State Circuits Magazine.

  22. Stein, G.P., et al. (2005). A computer vision system on a chip: a case study from the automotive domain. In Proceedings of the IEEE workshop on embedded computer vision.

  23. Trivedi, M.M., Gandhi, T., McCall, J. (2007). Looking-in and looking-out of a vehicle: computer-vision-based enhanced vehicle safety. IEEE Transactions on Intelligent Transportation Systems, 8(1), 108–120.

  24. Van Der Wal, G.S. (2010). Technical overview of Sarnoff Acadia II vision processor. In Proceedings of SPIE 7710, multisensor, multisource information fusion: architectures, algorithms, and applications.

Acknowledgments

The authors would like to thank Jeremiah Golston and Peter Barnum of Texas Instruments for their valuable comments on an early draft of this paper. We are also grateful to the anonymous reviewers of this paper for their constructive suggestions and comments. We would also like to express our gratitude to our management and colleagues for their support throughout this project.

Author information

Corresponding author

Correspondence to Branislav Kisačanin.


Cite this article

Sankaran, J., Hung, C.-Y., & Kisačanin, B. EVE: A Flexible SIMD Coprocessor for Embedded Vision Applications. J Sign Process Syst 75, 95–107 (2014). https://doi.org/10.1007/s11265-013-0770-2
