Abstract
In this paper, we present window memoization, a new performance improvement technique for software implementations of local image processing algorithms. Window memoization combines the memoization techniques proposed in software and hardware with the data redundancy found in images. It minimizes the number of redundant computations performed on an image by identifying similar neighborhoods of pixels and skipping the computations that are not necessary. We have developed an optimized architecture for window memoization in software and applied it to six image processing algorithms. We have also developed a performance model to predict the speedups obtained by window memoization in software. The typical (average) speedups range from 1.2x to 7.9x, while the overall average speedup across the different algorithms, input images, and processors is 3.95x.
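The idea of reusing results for similar pixel neighborhoods can be illustrated with a minimal sketch. It is not the optimized architecture described in the paper: the 3x3 window size, the Python dictionary used as the reuse table, the `drop_bits` similarity criterion (discarding low-order bits of each pixel), and all function names are assumptions made for illustration only.

```python
import numpy as np

def morph_gradient(window):
    # Example local operation (hypothetical helper): max minus min of the window.
    return float(window.max()) - float(window.min())

def memoized_local_op(img, kernel_fn, drop_bits=0):
    """Apply kernel_fn to every 3x3 window of an unsigned-integer image,
    reusing the stored result when a similar window was seen before.

    drop_bits is an assumed similarity criterion: windows whose pixels
    agree after discarding that many low-order bits share one result.
    With drop_bits=0 only identical windows are reused (exact results).
    """
    cache = {}   # reuse table: window key -> previously computed result
    hits = 0     # number of windows whose computation was skipped
    rows, cols = img.shape
    out = np.zeros((rows - 2, cols - 2), dtype=np.float64)
    for i in range(rows - 2):
        for j in range(cols - 2):
            window = img[i:i + 3, j:j + 3]
            key = (window >> drop_bits).tobytes()
            if key in cache:
                out[i, j] = cache[key]
                hits += 1
            else:
                result = kernel_fn(window)
                cache[key] = result
                out[i, j] = result
    return out, hits
```

With `drop_bits` greater than zero, more windows match and more computations are skipped, at the cost of results that may differ from the exact output.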
Notes
This method of fast symbol generation, which benefits from overlapping windows in the image, is similar to Huang’s [10] method for fast median filter.
The error in an image (\(Img\)) with respect to a reference image (\(R_{Img}\)) is usually measured by signal-to-noise ratio (SNR) as [8]: \(\hbox{SNR} = 20\log_{10}(\frac{A_{\hbox{signal}}}{A_{\hbox{noise}}})\) where \(A\) is the RMS (root mean squared) amplitude. \(A^2_{\hbox{noise}}\) is defined as: \(A^{2}_{\hbox{noise}} =\frac{1}{rc} \sum\nolimits_{i=0}^{r-1}\sum\nolimits_{j=0}^{c-1}(Img(i,j)-R_{Img}(i,j))^2\) where \(r \times c\) is the size of \(Img\) and \(R_{Img}\).
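The SNR definition in this note can be sketched in a few lines. The note defines only the noise amplitude; taking the signal amplitude to be the RMS of the reference image is an assumption here, as is the function name `snr_db`.

```python
import numpy as np

def snr_db(img, ref):
    """SNR in dB of img with respect to the reference image ref."""
    img = np.asarray(img, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    # A_noise^2: mean squared difference over all r x c pixels.
    noise_sq = np.mean((img - ref) ** 2)
    # A_signal^2: assumed to be the mean squared value of the reference image.
    signal_sq = np.mean(ref ** 2)
    if noise_sq == 0:
        return float("inf")  # identical images: no error
    # 20*log10(A_signal / A_noise) equals 10*log10(A_signal^2 / A_noise^2).
    return 10.0 * np.log10(signal_sq / noise_sq)
```

For example, a constant reference image of value 10 compared against the same image offset by 1 gives an amplitude ratio of 10, that is, 20 dB.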
\(c_a\): average number of CPU cycles for arithmetic operations
\(c_l\): average number of CPU cycles for logical operations
\(c_{mul}\): average number of CPU cycles for multiplication operations
\(c_m\): average number of CPU cycles for memory operations
Area overlap for two sets \(A\) and \(B\) is calculated as \(\frac{|A \cap B|}{|A \cup B|}\).
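As a small illustration of the area overlap measure, the sketch below represents each set as a collection of pixel coordinates. The function name `area_overlap` and the value returned for two empty sets are assumptions, since the note does not cover that case.

```python
def area_overlap(a, b):
    """Area overlap |A ∩ B| / |A ∪ B| of two sets of pixel coordinates."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty sets are treated as fully overlapping
    return len(a & b) / len(union)
```

Two regions of two pixels each that share exactly one pixel have an intersection of size 1 and a union of size 3, so their area overlap is 1/3.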
References
Alvarez, C., Corbal, J., Salami, E., Valero, M.: On the potential of tolerant region reuse for multimedia applications. In: International Conference on Supercomputing, pp. 218–228 (2001)
Alvarez, C., Corbal, J., Valero, M.: Fuzzy memoization for floating-point multimedia applications. IEEE Trans. Comput. 54(7), 922–927 (2005)
Bird, R.S.: Tabulation techniques for recursive programs. ACM Comput. Surv. 12(4), 403–417 (1980)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. The MIT Press, Cambridge, MA (2001)
Semacode Corporation: http://www.semacode.com. Accessed 29 Feb 2012
Ding, Y., Li, Z.: Operation reuse on handheld devices. In: Languages and Compilers for Parallel Computing (LCPC-03), vol. 2958/2004, pp. 273–287. Springer, Berlin (2003)
Egmont-Petersen, M., de Ridder, D., Handels, H.: Image processing with neural networks, a review. Pattern Recognit. 35, 2279–2301 (2002)
Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Prentice Hall, Upper Saddle River (2008)
Hennessy, J.L., Patterson, D.A.: Computer Architecture: A Quantitative Approach. Morgan Kaufmann, Boston, MA (2003)
Huang, T.S., Yang, G.J., Tang, G.Y.: A fast two-dimensional median filtering algorithm. IEEE Trans. Acoust. Speech Signal Process. ASSP 27(1), 13–18 (1979)
Hughes, J.: Lazy memo-functions. In: A Conference on Functional Programming Languages and Computer Architecture, pp. 129–146. Springer, New York (1985)
Philips Breast Images: http://www.medical.philips.com/main/products/ultrasound. Accessed 29 Feb 2012
Intel: IA-32 Intel Architecture Optimization (2004)
Jain, A.K.: Image data compression: a review. Proc. IEEE 69, 349–389 (1981)
Khalvati, F., Aagaard, M.D.: Window memoization: an efficient hardware architecture for high-performance image processing. J. Real-Time Image Process. doi:10.1007/s11554-009-0128-y (2009)
Khalvati, F., Aagaard, M.D., Tizhoosh, H.R.: Accelerating image processing algorithms based on the reuse of spatial patterns. In: Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 172–175 (2007)
Khalvati, F., Tizhoosh, H.R., Aagaard, M.D.: Opposition-based window memoization for morphological algorithms. In: IEEE Symposium on Computational Intelligence in Image and Signal Processing (CIISP), pp. 425–430 (2007)
Kirsch, R.A.: Computer determination of the constituent structure of biological images. Comput. Biomed. Res. 4, 315–328 (1971)
Robarts Imaging Research Laboratories: http://www.imaging.robarts.ca. Accessed 29 Feb 2012
Lipasti, M.H., Wilkerson, C.B., Shen, J.P.: Value locality and load value prediction. In: International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII), pp. 138–147 (1996)
Mayfield, J., Finin, T., Hall, M.: Using automatic memoization as a software engineering tool in real-world AI systems. In: The 11th Conference on Artificial Intelligence for Applications (CAIA-95), pp. 87–93 (1995)
Michie, D.: Memo functions and machine learning. Nature 218, 19–22 (1968)
Pugh, W.: An improved replacement strategy for function caching. In: The 1988 ACM Conference on LISP and Functional Programming (LFP-88), pp. 269–276. ACM (1988)
Pugh, W., Teitelbaum, T.: Incremental computation via function caching. In: The 16th Annual ACM Symposium on Principles of Programming Languages, pp. 315–328 (1989)
Richardson, S.E.: Exploiting trivial and redundant computation. In: IEEE Symposium on Computer Arithmetics, pp. 220–227 (1993)
Sezgin, M., Sankur, B.: Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 13(1), 146 (2004)
Shen, J.P., Lipasti, M.H.: Modern Processor Design. McGraw-Hill, New York (2004)
Sonka, M., Hlavac, V., Boyle, R.: Image Processing, Analysis, and Machine Vision. PWS, Pacific Grove, CA (1999)
Trajković, M., Hedley, M.: Fast corner detection. Image Vis. Comput. 16, 75–87 (1998)
Tuytelaars, T., Mikolajczyk, K.: Survey on local invariant features. FnT Comput. Graph. Vis. 1(1), 1–94 (2008)
Wang, W., Raghunathan, A., Jha, N.K.: Profiling driven computation reuse: An embedded software synthesis technique for energy and performance optimization. In: IEEE International Conference on VLSI Design (VLSID-04 Design), p. 267 (2004)
Appendix
In this section, we present the numerical values of the speedups and the accuracy of the results for window memoization in software. For natural images, we also present the original results for a sample image along with the results obtained with window memoization for all six case study algorithms used in this paper. The algorithms are the Canny edge detector (Canny), morphological gradient (Morpho), Kirsch edge detector (Kirsch), corner detector (Corner), median filter (Median), and local variance calculator (Variance) (Tables 11, 12, 13, 14; Figs. 11, 12).
Cite this article
Khalvati, F., Aagaard, M.D. & Tizhoosh, H.R. Window memoization: toward high-performance image processing software. J Real-Time Image Proc 10, 5–25 (2015). https://doi.org/10.1007/s11554-012-0247-8
DOI: https://doi.org/10.1007/s11554-012-0247-8