Abstract
Through continued miniaturization of electronic devices, embedded smart cameras are steadily becoming more important. The reduction in camera size widens the spectrum of applications. In industrial applications, smart cameras are used for tasks ranging from quality monitoring and position tracking to the calibration of production machines. In non-professional applications, a distinct boom in action cameras combined with fused sensor information can be observed. However, all of these applications have a common bottleneck: the memory architecture. Most image processing applications are memory-bound tasks. Thus, the time spent transferring data decisively affects the application’s entire processing time. Different memory access patterns require different memory configurations and hierarchies. An insufficient match between the image processing application and the memory architecture leads to poor performance of the image processing system, which shows up as longer processing times and higher energy consumption. This work introduces new methods for classifying image processing applications by their memory access patterns in order to map them onto memory architectures. Our work combines a simulation framework, the heterogeneous memory simulator, with an analytical framework, the memory analyzer, to find bottlenecks inside the image processing application and to aid in finding a suitable, application-specific memory configuration in terms of processing time and energy consumption.
Acknowledgements
This work is supported by the Bavarian Research Foundation (BFS) as part of their research project “FORMUS3IC”.
Cite this article
Hartmann, C., Fey, D. An extended analysis of memory hierarchies for efficient implementations of image processing applications. J Real-Time Image Proc 14, 713–728 (2018). https://doi.org/10.1007/s11554-017-0723-2
DOI: https://doi.org/10.1007/s11554-017-0723-2