Abstract
Convolution is used extensively in image processing and computer vision, in tasks such as image enhancement, smoothing, and structure extraction. However, the convolution operation typically requires substantial computing resources. This study implements a novel one-dimensional (1D) convolution processor with a reconfigurable architecture. The processor combines a line buffer, controller units, and a reconfigurable, separable convolution module. The reconfigurable architecture and the separable convolution approach improve the flexibility and performance of the processor. The reconfigurable and separable convolution array, which is the main component of the processor, can simultaneously execute convolution operations with different kernels and supports kernel sizes of up to 24 × 24. Experimental results show that the maximum frame rate of the processor is approximately 194 frames per second (fps), which exceeds the real-time requirement. Synthesis results in SMIC 0.18 μm CMOS technology show that the processor occupies 13.39 mm² at a 204 MHz system clock and consumes 419 mW at the maximum kernel size with a 120 MHz system clock. Verification experiments on field-programmable gate arrays (FPGAs) demonstrate that the processor is suitable for real-time image processing applications, even for high-resolution images.
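The separable approach relies on the fact that a rank-1 2D kernel (the outer product of a column vector and a row vector) can be applied as two 1D passes: one along rows, then one along columns. Per output pixel, a K × K kernel then costs 2K multiply-accumulates instead of K², for example 48 instead of 576 at K = 24. The following is a minimal software sketch of that equivalence, assuming "valid"-region output and a kernel that is exactly separable; it is not the processor's hardware design, and the function names `conv1d_valid`, `conv2d_direct`, and `conv2d_separable` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv1d_valid(signal: List[float], kernel: List[float]) -> List[float]:
    """True 1D convolution (kernel flipped), keeping only the 'valid' region."""
    k = len(kernel)
    flipped = kernel[::-1]
    return [
        sum(signal[i + j] * flipped[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def conv2d_direct(image: Matrix, kernel: Matrix) -> Matrix:
    """Reference 2D convolution ('valid' region): kh * kw MACs per output pixel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for y in range(h - kh + 1):
        row = []
        for x in range(w - kw + 1):
            acc = 0.0
            for m in range(kh):
                for n in range(kw):
                    acc += image[y + kh - 1 - m][x + kw - 1 - n] * kernel[m][n]
            row.append(acc)
        out.append(row)
    return out


def conv2d_separable(image: Matrix, col: List[float], row: List[float]) -> Matrix:
    """Separable 2D convolution: a row pass, then a column pass (kh + kw MACs per output)."""
    # Horizontal pass: convolve every image row with the row kernel.
    tmp = [conv1d_valid(r, row) for r in image]
    # Vertical pass: convolve every column of the intermediate result with the column kernel.
    cols = [
        conv1d_valid([tmp[y][x] for y in range(len(tmp))], col)
        for x in range(len(tmp[0]))
    ]
    # Transpose the list of columns back to row-major order.
    return [list(r) for r in zip(*cols)]


if __name__ == "__main__":
    image = [[float(3 * y + x) for x in range(6)] for y in range(5)]
    col = [1.0, 2.0, 1.0]
    row = [1.0, 0.0, -1.0]
    outer = [[c * r for r in row] for c in col]  # rank-1 (separable) 3x3 kernel
    assert conv2d_separable(image, col, row) == conv2d_direct(image, outer)
    print(conv2d_separable(image, col, row)[0])  # [8.0, 8.0, 8.0, 8.0]
```

Only kernels that factor into a column and a row vector (for example, box and Gaussian smoothers) can be split this way; a general K × K kernel would need a low-rank decomposition or the direct 2D form.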
Acknowledgements
This work is supported by the China Postdoctoral Science Foundation (2014M550492), the National Natural Science Foundation of China (61231018), and the Natural Science Basic Research Plan in Shaanxi Province of China (2013JQ8025).
Cite this article
Rao, L., Zhang, B. & Zhao, J. Hardware Implementation of Reconfigurable 1D Convolution. J Sign Process Syst 82, 1–16 (2016). https://doi.org/10.1007/s11265-015-0969-5