Abstract
We implement image correlation, a fundamental component of many real-time imaging and tracking systems, on a graphics processing unit (GPU) using NVIDIA’s CUDA platform. We use our code to analyze images of liquid-gas phase separation in a model colloid-polymer system, photographed in the absence of gravity aboard the International Space Station (ISS). Our GPU code is 4,000 times faster than simple MATLAB code performing the same calculation on a central processing unit (CPU), 130 times faster than simple C code, and 30 times faster than optimized C++ code using single-instruction, multiple-data (SIMD) extensions. The speed increases from these parallel algorithms enable us to analyze images downlinked from the ISS in a rapid fashion and send feedback to astronauts on orbit while the experiments are still being run.










Similar content being viewed by others
References
Alerstam, E., Svensson T., Andersson-Engels, S.: Parallel computing with graphics processing units for high-speed Monte Carlo simulation of photon migration. JBO Lett. 13, 060504 (2008). doi:10.1117/1.3041496
Anderson, J.A., Lorenz, C.D., Travesset, A.: General purpose molecular dynamics simulations fully implemented on graphics processing units. J. Comput. Phys. 227, 5342–5359 (2008). doi:10.1016/j.jcp.2008.01.047
Bailey, A.E., Poon, W.C.K., Christianson, R.J., Schofield, A.B., Gasser, U., Prasad, V., Manley, S., Segre, P.N., Cipelletti, L., Meyer, W.V., Doherty, M.P., Sankaran, S., Jankovsky, A.L., Shiley, W.L., Bowen, J.P., Eggers, J.C., Kurta, C., Lorik, Jr., T., Pusey, P.N., Weitz, D.A.: Spinodal decomposition in a model colloid–polymer mixture in microgravity. Phys. Rev. Lett 99, 205701 (2007). doi:10.1103/PhysRevLett.99.205701
Belleman, R.G., Bédorf, J., Portegies Zwart, S.F.: High performance direct gravitational N-body simulations on graphics processing units II: an implementation in CUDA. New Astron. 13, 103–112 (2008). doi:10.1016/j.newast.2007.07.004
Bik, A.J.C.: The Software Vectorization Handbook. Intel, Hillsboro (2004)
Bodnár, I., Dhont J.K.G., Lekkerkerker, H.N.W.: Pretransitional phenomena of a colloid polymer mixture studied with static and dynamic light scattering. J. Chem. Phys. 100, 19614–19619 (1996)
Bodnár, I., Oosterbaan, W.D.: Indirect determination of the composition of the coexisting phases in a demixed colloid polymer mixture. J. Chem. Phys. 106, 7777–7780 (1997)
Castaño-Díez, D., Mozer, D., Schoenegger, A., Pruggnaller S., Frangakis, A.S.: Performance evaluation of image processing algorithms on the GPU. J. Struct. Biol. 164, 153–160 (2008). doi:10.1016/j.jsb.2008.07.006
Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J.W., Skadron, K.: A performance study of general-purpose applications on graphics processors using CUDA. J. Parallel Distrib. Comput. 68, 1370–1380 (2008). doi:10.1016/j.jpdc.2008.05.014
Christiansen, M.: Adobe After Effects 7.0 Studio Techniques. Peachpit, Berkeley (2006)
Fernando, R., Kilgard, M.J.: The Cg Tutorial: The Definitive Guide to Programmable Real-Time Graphics. Addison-Wesley, Boston (2003)
Fraser F., Schewe, J.: Real World Camera Raw with Adobe Photoshop CS3. Peachpit, Berkeley (2008)
Furukawa, H.: A dynamic scaling assumption for phase separation. Adv. Phys. 34, 703–750 (1985)
Garland, M., Le Grand, S., Nickolls, J., Anderson, J., Hardwick, J., Morton, S., Phillips, E., Zhang, Y., Volkov, V.: Parallel Computing Experiences with CUDA. IEEE Micro 28, 13–27 (2008)
Gumerov, N.A., Duraiswami, R.: Fast multipole methods on graphics processors. J. Comput. Phys. 227, 8290–8313 (2008). doi:10.1016/j.jcp.2008.05.023
Harris, C., Haines K., Staveley-Smith, L.: GPU accelerated radio astronomy signal convolution. Exp. Astron. 22, 129–141 (2008). doi:10.1007/s10686-008-9114-9
Ibrahim, K.Z., Bodin, F., Pène, O.: Fine-grained parallelization of lattice QCD kernel routine on GPUs. J. Parallel Distrib. Comput. 68, 1350–1359 (2008). doi:10.1016/j.jpdc.2008.06.009
Li, H., Kolpas, A., Petzold, L., Moehlis, J.: Parallel simulation for a fish schooling model on a general-purpose graphics processing unit. Concurr. Comput. Pract. Exp. (2008). doi:10.1002/cpe.1330
Liu, S., Li, P., Luo, Q.: Fast blood flow visualization of high-resolution laser speckle imaging data using graphics processing unit. Opt. Express 16, 14321–14329 (2008). doi:10.1364/OE.16.014321
Liu, W., Schmidt, B., Voss, G., Müller-Wittig, W.: Accelerating molecular dynamics simulation using Graphics Processing Units with CUDA. Comp. Phys. Comm. 179, 634–641 (2008). doi:10.1016/j.cpc.2008.05.008
Lozano, O.M., Otsuka, K.: Real-time Visual Tracker by Stream Processing. J. Signal Process. Syst. (2008). doi:10.1007/s11265-008-0250-2
Lu, P.J., Conrad, J.C., Wyss, H.M., Schofield, A.B., Weitz, D.A.: Fluids of Clusters in Attractive Colloids. Phys. Rev. Lett. 96, 028306 (2006). doi:10.1103/PhysRevLett.96.028306
Lu, P.J., Sims, P.A., Oki, H., Macarthur, J.B., Weitz, D.A.: Target-locking acquisition with real-time confocal (TARC) microscopy. Opt. Express 15, 8702–8712 (2007). doi:10.1364/OE.15.008702
Lu, P.J., Zaccarelli, E., Ciulla, F., Schofield, A.B., Sciortino, F., Weitz, D.A.: Gelation of particles with short-range attraction. Nature 453, 499–503 (2008). doi:10.1038/nature06931
Lu, P.J.: Gelation and Phase Separation of Attractive Colloids. Harvard University Ph.D. Thesis (2008)
Manavski, S.A., Valle, G.: CUDA compatible GPU cards as efficient hardware accelerators for Smith–Waterman sequence alignment. BCM Bioinf. 9(Suppl 2), S10 (2008). doi:10.1186/1471-2105-9-S2-S10
Marziale, L., Richard III, G.C., Roussev, V.: Massive threading: Using GPUs to increase the performance of digital forensics tools. Digital Investigation 4S, S73–S81 (2007). doi:10.1016/j.diin.2007.06.014
McCool, M., Du Toit, S.: Metaprogramming GPUs with Sh. Peters, Wellesley (2004)
Nguyen, H. (ed.): GPU Gems 3. Addison-Wesley, Upper Saddle River (2007)
Owens, J.D., Luebke, D., Govindaraju, N., Harris, M., Krüger, J., Lefohn, A.E., Purcell, T.: A survey of general-purpose computation on graphics hardware. Comput. Graph. Forum 26, 80–113 (2007)
Pharr, M. (ed.): GPU Gems 2. Addison-Wesley, Upper Saddle River (2005)
Roeh, D.W., Kindratenko V.V., Brunner, R.J.: Accelerating cosmological data analysis with graphics processors. In Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units. ACM, Washington (2009)
Ruiz, A., Ujaldon, M., Cooper, L., Huang, K.: Non-rigid Registration for Large Sets of Microscopic Images on Graphics Processors, J. Sign. Process. Syst. (2008) doi:10.1007/s11265-008-0208-4
Samant, S.S., Xia, J., Muyan-Özçelik, P., Owens, J.D.: High performance computing for deformable image registration: Towards a new paradigm in adaptive radiotherapy. Med. Phys. 35, 3546–3553 (2008). doi:10.1118/1.2948318
Schatz, M.C., Trapnell, C., Delcher, A.L., Varshney, A.: High-throughput sequence alignment using Graphics Processing Units. BCM Bioinformatics 8, 474 (2007). doi:10.1186/1471-2105-8-474
Schenk, O., Christen, M., Burkhart, H.: Algorithmic perfomance studies on graphics processing units. J. Parallel Distrib. Comput. 68, 1360–1369 (2008). doi:10.1016/j.jpdc.2008.05.008
Shimobaba, T., Ito, T., Masuda, N., Abe, Y., Ichihashi, Y., Nakayama, H., Takada, N., Shiraki, A., Sugie, T.: Numerical calculation library for diffraction integrals using the graphic processing unit: the GPU-based wave optics library. J. Opt. A: Pure Appl. Opt. 10, 075308 (2008). doi:10.1088/1464-4258/10/7/075308
Shimobaba, T., Sato, Y., Miura, J., Takenouchi, M., Ito, T.: Real-time digital holographic microscopy using the graphics processing unit. Opt. Express 16, 11776–11781 (2008). doi:10.1364/OE.16.011776
Sintorn, E., Assarsson, U.: Fast parallel GPU-sorting using a hybrid algorithm. J. Parallel Distrib. Comput. 68, 1381–1388 (2008). doi:10.1016/j.jpdc.2008.05.012
Stantchev, G., Dorland W., Gumerov, N.: Fast parallel Particle-To-Grid interpolation for plasma PIC simulations on the GPU. J. Parallel Distrib. Comput. 68, 1339–1349 (2008). doi:10.1016/j.jpdc.2008.05.009
Stone, J.E., Phillips, J.C., Freddolino, P.L., Hardy, D.J., Trabuco, L.G., Schulten, K.: Accelerating Molecular Modeling Applications with Graphics Processors. J. Comput. Chem. 28, 2618–2640 (2007). doi:10.1002/jcc.20829
Stone, S.S., Haldar, J.P., Tsao, S.C., Hwu, W.-m.W., Sutton, B.P., Liang, Z.-P.: Accelerating advanced MRI reconstructions on GPUs. J. Parallel Distrib. Comput. 68, 1307–1317 (2008). doi:10.1016/j.jpdc.2008.05.013
Taylor, S.: Intel Integrated Performance Primitives. Intel, Hillsboro, OR (2004)
Thibault, J.C., Senocak, I.: CUDA Implementation of a Navier–Stokes solver in multi-GPU desktop platforms for incompressible flows. In 47th AIAA Aerospace Sciences Meeting and Exhibit (2009)
Van Meel, J.A., Arnold, A., Frenkel, D., Portegies Zwart, S.F., Belleman, R.G.: Harvesting graphics power for MD simulations. Mol. Simulation 34, 259–266 (2008). doi:10.1080/08927020701744295
Wirawan, A., Kwoh, C.K., Hieu, N.T., Schmidt, B.: CBESW: sequence alignment on the Playstation 3. BCM Bioinf. 9 377 (2008). doi:10.1186/1471-2105-9-377
Zaccarelli, E., Lu, P.J., Ciulla, F., Weitz, D.A., Sciortino, F.: Gelation as arrested phase separation in short-ranged attractive colloid-polymer mixtures. J. Phys. Condens. Matter 20, 494242 (2008). doi:10.1088/0953-8984/20/49/494242
Acknowledgments
This work was supported by NASA grant NNX08AE09G and the NVIDIA professor partnership program. We thank A. Bik, J. Curley, A. Ghuloum, D. Luebke, E. Phillips, B. Saar, H. Saito, M. Schnubbel-Stutz, L. Vogt, and many helpful individuals throughout NASA and its contractors.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Lu, P.J., Oki, H., Frey, C.A. et al. Orders-of-magnitude performance increases in GPU-accelerated correlation of images from the International Space Station. J Real-Time Image Proc 5, 179–193 (2010). https://doi.org/10.1007/s11554-009-0133-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11554-009-0133-1