Abstract
In this paper, we present a faster-than-real-time implementation of a class of dense stereo vision algorithms on a low-power, massively parallel SIMD architecture, the CSX700. With two cores, each containing 96 Processing Elements, this SIMD architecture provides a peak computation power of 96 GFLOPS while consuming only 9 Watts, making it an excellent candidate for embedded computing applications. Exploiting the full features of this architecture, we have developed schemes for an efficient parallel implementation with minimal overhead. For the sum of squared differences (SSD) algorithm on VGA (640 × 480) images with disparity ranges of 16 and 32, we achieve 179 and 94 frames per second (fps), respectively. For HDTV (1,280 × 720) images with disparity ranges of 16 and 32, we achieve 67 and 35 fps, respectively. We have also implemented more accurate, and hence more computationally expensive, variants of SSD; in most cases, particularly for VGA images, these also run faster than real time. Our results demonstrate that, with careful parallelization schemes, the CSX architecture can provide excellent performance and flexibility for various embedded vision applications.
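For readers unfamiliar with SSD matching, the following is a minimal sequential sketch of the basic technique named in the abstract: for every pixel, compare a small window in the left image against windows in the right image at each candidate disparity, and keep the disparity with the smallest sum of squared differences. It is not the parallel CSX700 implementation described in the paper. The function names `ssd_disparity` and `make_shifted_pair`, the window radius, the border clamping, and the synthetic test pair are all assumptions made for illustration.

```python
import random


def ssd_disparity(left, right, max_disp, radius=1):
    """Winner-take-all SSD block matching on rectified grayscale images.

    left, right: equally sized lists of rows of integer intensities.
    Returns a disparity map with one integer in [0, max_disp) per pixel.
    Illustrative reference only; names and border handling are assumptions.
    """
    height, width = len(left), len(left[0])

    def clamp(v, size):
        # Replicate border pixels instead of reading out of bounds.
        return max(0, min(v, size - 1))

    disparity = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best_cost, best_d = None, 0
            for d in range(max_disp):
                cost = 0
                # Accumulate squared differences over the matching window.
                for dy in range(-radius, radius + 1):
                    yy = clamp(y + dy, height)
                    for dx in range(-radius, radius + 1):
                        xl = clamp(x + dx, width)
                        xr = clamp(x + dx - d, width)
                        diff = left[yy][xl] - right[yy][xr]
                        cost += diff * diff
                # Strict comparison: ties keep the smaller disparity.
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y][x] = best_d
    return disparity


def make_shifted_pair(height, width, shift, seed=0):
    """Hypothetical test data: the left image is the right image shifted by `shift` pixels."""
    rng = random.Random(seed)
    right = [[rng.randint(0, 255) for _ in range(width)] for _ in range(height)]
    left = [[row[max(x - shift, 0)] for x in range(width)] for row in right]
    return left, right
```

Calling `ssd_disparity(left, right, max_disp=4)` on a pair from `make_shifted_pair(8, 24, 2)` should recover the shift of 2 away from the image borders. The inner loops make the cost of this naive form proportional to image size × disparity range × window area. For scale, the VGA result reported above (640 × 480 pixels, 16 disparities, 179 fps) corresponds to roughly 8.8 × 10^8 pixel-disparity evaluations per second.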
Cite this article
Hosseini, F., Fijany, A., Safari, S. et al. Fast implementation of dense stereo vision algorithms on a highly parallel SIMD architecture. J Real-Time Image Proc 8, 421–435 (2013). https://doi.org/10.1007/s11554-011-0211-z