Highly efficient image registration for embedded systems using a distributed multicore DSP architecture

  • Original Research paper
Journal of Real-Time Image Processing

Abstract

We present a complete approach to highly efficient image registration for embedded systems, covering all steps from theory to practice. An optimization-based image registration algorithm using a least-squares data term is implemented on an embedded distributed multicore digital signal processor (DSP) architecture. All relevant parts are optimized, ranging from mathematics, algorithmics, and data transfer to hardware architecture and electronic components. The optimization for the rigid alignment of two-dimensional images is performed in a multilevel Gauss–Newton minimization framework. We propose a reformulation of the necessary derivative computations that eliminates all sparse matrix operations and allows for parallel, memory-efficient computation. The pixelwise parallelism forms an ideal starting point for our implementation on a multicore, multichip DSP architecture. Reducing the data transfer to the individual DSP chips is key to an efficient calculation. By determining worst cases for the subimages needed on each DSP, we can substantially reduce data transfer and memory requirements. This is accompanied by a sophisticated padding mechanism that eliminates pipeline hazards and speeds up the generation of the multilevel pyramid. Finally, we present a reference hardware architecture consisting of four TI C6678 DSPs with eight cores each. We show that it is possible to register high-resolution images within milliseconds on an embedded device. In our example, we register two images with 4096 × 4096 pixels within 93 ms, while off-loading the CPU by a factor of 20 and requiring 3.12 times less electrical energy.
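To make the idea of a matrix-free Gauss–Newton step concrete, the following is a minimal single-level sketch in plain Python. It is not the authors' implementation and is not taken from the fimreg repository: the function names (`bilinear`, `warp_point`, `accumulate`, `gauss_newton`, `demo`), the rotation about the image center, the finite-difference image gradient, the halving line search, and the synthetic test image are all assumptions chosen for illustration. What it shows is that for a rigid transformation with parameters \(w = (\theta, t_x, t_y)\) and a least-squares data term, the gradient \(J^\top r\) and the approximated Hessian \(J^\top J\) can be accumulated one pixel at a time as a 3-vector and a 3 × 3 matrix, so that no sparse matrix ever has to be stored.

```python
import math

def bilinear(img, x, y):
    """Sample img (a list of rows) at real-valued (x, y); zero outside the image."""
    h, w = len(img), len(img[0])
    if not (0.0 <= x <= w - 1 and 0.0 <= y <= h - 1):
        return 0.0
    x0, y0 = min(int(x), w - 2), min(int(y), h - 2)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x0 + 1] * fx
    bot = img[y0 + 1][x0] * (1 - fx) + img[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bot * fy

def warp_point(w, i, j, cx, cy):
    """Rigid map for w = (theta, tx, ty); rotating about (cx, cy) is an assumption."""
    c, s = math.cos(w[0]), math.sin(w[0])
    px, py = i - cx, j - cy
    return c * px - s * py + cx + w[1], s * px + c * py + cy + w[2]

def accumulate(ref, tmpl, w):
    """Return D(w), J^T r and J^T J, summed pixel by pixel (no sparse matrices)."""
    rows, cols = len(ref), len(ref[0])
    cx, cy = (cols - 1) / 2.0, (rows - 1) / 2.0
    c, s = math.cos(w[0]), math.sin(w[0])
    D, g, H = 0.0, [0.0] * 3, [[0.0] * 3 for _ in range(3)]
    for j in range(rows):
        for i in range(cols):
            yx, yy = warp_point(w, i, j, cx, cy)
            r = bilinear(tmpl, yx, yy) - ref[j][i]
            # Image gradient of the template: central differences with half-pixel
            # offsets, so the divisor 2 * 0.5 equals 1 and is omitted.
            gx = bilinear(tmpl, yx + 0.5, yy) - bilinear(tmpl, yx - 0.5, yy)
            gy = bilinear(tmpl, yx, yy + 0.5) - bilinear(tmpl, yx, yy - 0.5)
            px, py = i - cx, j - cy
            # One Jacobian row: derivative of the residual w.r.t. (theta, tx, ty).
            J = (gx * (-s * px - c * py) + gy * (c * px - s * py), gx, gy)
            D += 0.5 * r * r
            for a in range(3):
                g[a] += J[a] * r
                for b in range(3):
                    H[a][b] += J[a] * J[b]
    return D, g, H

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(H, g):
    """Solve the 3 x 3 system H s = g by Cramer's rule."""
    d = det3(H)
    return [det3([[g[r] if c == k else H[r][c] for c in range(3)]
                  for r in range(3)]) / d for k in range(3)]

def gauss_newton(ref, tmpl, w0, iters=25):
    """Single-level Gauss-Newton with a simple step-halving line search."""
    w = list(w0)
    D, g, H = accumulate(ref, tmpl, w)
    for _ in range(iters):
        step = solve3(H, g)
        t = 1.0
        while t > 1e-4:
            trial = [w[k] - t * step[k] for k in range(3)]
            D_new, g_new, H_new = accumulate(ref, tmpl, trial)
            if D_new < D:
                break
            t *= 0.5
        else:
            return w, D  # no decreasing step found: treat as converged
        w, D, g, H = trial, D_new, g_new, H_new
    return w, D

def demo(n=32):
    """Hypothetical test case: an elliptical blob and a rigidly moved copy of it."""
    tmpl = [[math.exp(-((i - 15.0) ** 2 / 32.0 + (j - 17.0) ** 2 / 12.5))
             for i in range(n)] for j in range(n)]
    w_true, c = [0.05, 1.3, -0.7], (n - 1) / 2.0
    ref = [[bilinear(tmpl, *warp_point(w_true, i, j, c, c)) for i in range(n)]
           for j in range(n)]
    D0, _, _ = accumulate(ref, tmpl, [0.0, 0.0, 0.0])
    w, D = gauss_newton(ref, tmpl, [0.0, 0.0, 0.0])
    return w_true, w, D0, D
```

Calling `demo()` returns the known parameters, the estimated parameters, and the data term before and after optimization. The body of the double loop in `accumulate` touches only one pixel and adds into nine small sums, which is the property that makes this kind of computation straightforward to split across cores; a multilevel variant would run `gauss_newton` on a coarse-to-fine image pyramid and pass each level's result on as the next starting guess.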

Notes

  1. The Jacobian is the derivative with respect to the transformation parameters \(w\) and should not be confused with the image gradient obtained, e.g., by the Sobel operator. The same applies to the (approximated) Hessian.

  2. e.g., 4096 × 4096 px on 1 DSP: Ethernet measurement 1058 ms, PCIe prediction 211 ms, PCIe measurement 202 ms.

Acknowledgments

The software created during this work is open source and can be accessed at http://www.github.com/RoelofBerg/fimreg.

In deep sorrow, we commemorate Prof. Dr. rer. nat. Bernd Fischer, who passed away during the creation of this paper. Our thoughts are with his family.

Corresponding author

Correspondence to Roelof Berg.

Additional information

B. Fischer deceased during the creation of this paper.

Cite this article

Berg, R., König, L., Rühaak, J. et al. Highly efficient image registration for embedded systems using a distributed multicore DSP architecture. J Real-Time Image Proc 14, 341–361 (2018). https://doi.org/10.1007/s11554-014-0457-3

  • DOI: https://doi.org/10.1007/s11554-014-0457-3
