GPU-Based Iterative Medical CT Image Reconstructions

Yu, Xiaodong; Wang, Hao; Feng, Wu-chun; Gong, Hao; Cao, Guohua

doi:10.1007/s11265-018-1352-0

Published: 08 March 2018

Journal of Signal Processing Systems, volume 91, pages 321–338 (2019)

Abstract

The algebraic reconstruction technique (ART) is an iterative algorithm for computed tomography (CT) image reconstruction that delivers better image quality at a lower radiation dose than the industry-standard filtered back projection (FBP). However, the high computational cost of ART requires researchers to turn to high-performance computing to accelerate the algorithm. Existing GPU approaches for ART suffer from inefficient designs of both their compressed data structures and their computational kernels. This paper therefore presents cuART, our CUDA-based CT image reconstruction tool built on ART. It delivers a compression and parallelization solution for ART-based image reconstruction on GPUs. We address the under-performance of popular GPU libraries, e.g., cuSPARSE, BRC, and CSR5, on the ART algorithm and propose a symmetry-based CSR format (SCSR) that further compresses the CSR data structure and optimizes data access for both sparse matrix-vector multiplication (SpMV) and its transposed counterpart (SpMV_T) via a column-indices permutation. We also propose sorting-based global-level and sorting-free view-level blocking techniques that optimize the kernel computation by leveraging different sparsity patterns of the system matrix. As a result, cuART significantly reduces the memory footprint and enables practical CT datasets to fit into a single GPU. Experimental results on an NVIDIA Tesla K80 GPU show that our approach achieves up to 6.8x, 7.2x, and 5.4x speedups over counterparts that use cuSPARSE, BRC, and CSR5, respectively.
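To make the role of the sparse system matrix concrete, the following is a minimal sequential sketch of one textbook ART (Kaczmarz) sweep over a matrix stored in plain CSR. It is not the cuART kernel and does not use the SCSR format; the function name `art_sweep`, the relaxation parameter `relax`, and the tiny 2x2 system are assumptions chosen for illustration. The forward step reads one row as in SpMV, and the correction step scatters along the same row as in SpMV_T.

```python
def art_sweep(row_ptr, col_idx, vals, b, x, relax=1.0):
    """One textbook ART (Kaczmarz) sweep over a CSR matrix (illustrative only).

    row_ptr, col_idx, vals: standard CSR arrays of the system matrix A.
    b: measured projections; x: current image estimate, updated in place.
    """
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        # Forward projection: dot product of row i with the current image.
        dot = sum(vals[k] * x[col_idx[k]] for k in range(start, end))
        norm_sq = sum(vals[k] * vals[k] for k in range(start, end))
        if norm_sq == 0.0:
            continue  # an empty ray contributes no correction
        step = relax * (b[i] - dot) / norm_sq
        # Back projection: scatter the correction along the same row.
        for k in range(start, end):
            x[col_idx[k]] += step * vals[k]
    return x


# Hypothetical system A = [[1, 1], [1, -1]], b = [3, 1]; its solution is [2, 1].
x = art_sweep([0, 2, 4], [0, 1, 0, 1], [1.0, 1.0, 1.0, -1.0], [3.0, 1.0], [0.0, 0.0])
```

Because the two rows of this toy matrix are orthogonal, a single sweep already reaches the exact solution; real CT system matrices need many sweeps, which is where the cost described in the abstract comes from.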

Notes

The sorting-based global-level blocking is easy to implement, while the sorting-free view-level blocking has a shorter preprocessing time, requires less data padding, and can also enable the adapted algorithm to converge faster.
This kernel leverages the merits of our SCSR format and blocking techniques to provide significant performance improvements.
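As a generic illustration of why row ordering affects padding in blocked sparse formats, the sketch below counts the padding needed when rows are grouped into fixed-size blocks and each block is padded to its longest row. This is a common idea in blocked SpMV formats and is not the paper's blocking scheme; the function `padded_slots`, the block size, and the row lengths are all hypothetical.

```python
def padded_slots(row_lengths, block_size, sort_rows):
    """Count padding entries when rows are grouped into fixed-size blocks.

    Every row in a block is padded to that block's longest row. With
    sort_rows=True the rows are first sorted by length (longest first).
    """
    if sort_rows:
        lengths = sorted(row_lengths, reverse=True)
    else:
        lengths = list(row_lengths)
    padding = 0
    for start in range(0, len(lengths), block_size):
        block = lengths[start:start + block_size]
        width = max(block)  # padded width of this block
        padding += sum(width - n for n in block)
    return padding


# Hypothetical numbers of nonzeros per row, alternating short and long rows.
lengths = [1, 8, 2, 7, 1, 8, 2, 7]
unsorted_padding = padded_slots(lengths, block_size=2, sort_rows=False)
sorted_padding = padded_slots(lengths, block_size=2, sort_rows=True)
```

In this toy case sorting groups rows of equal length together, so the padding drops from 24 entries to 0. Sorting itself costs preprocessing time, which is the trade-off the first note contrasts with the sorting-free variant.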

Author information

Authors and Affiliations

Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
Xiaodong Yu, Hao Wang & Wu-chun Feng
Department of Biomedical Engineering and Mechanics, Virginia Tech, Blacksburg, VA, USA
Hao Gong & Guohua Cao

Corresponding author

Correspondence to Xiaodong Yu.

About this article

Cite this article

Yu, X., Wang, H., Feng, Wc. et al. GPU-Based Iterative Medical CT Image Reconstructions. J Sign Process Syst 91, 321–338 (2019). https://doi.org/10.1007/s11265-018-1352-0

Received: 17 August 2017
Revised: 08 December 2017
Accepted: 13 February 2018
Published: 08 March 2018
Issue Date: March 2019
DOI: https://doi.org/10.1007/s11265-018-1352-0
