DOI: 10.1145/2488551.2488589

Parallel implementation of a X-ray tomography reconstruction algorithm based on MPI and CUDA

Published: 15 September 2013


Most small-animal X-ray computed tomography (CT) scanners use a cone-beam geometry with a flat-panel detector orbiting in a circular trajectory. Image reconstruction in these systems is usually performed with approximate methods based on the algorithm proposed by Feldkamp, Davis and Kress (FDK). There is currently a strong need to speed up the reconstruction of X-ray CT data in order to extend its clinical applications. Advances in semiconductor detector panels have increased the density of detector elements, which in turn increases the amount of data to process. This work focuses on future high-resolution studies (densities of up to 4096 pixels), in which multiple levels of parallelism will be needed for reconstruction. In addition, this paper addresses the future challenges of processing high-resolution images on many-core and distributed architectures. Our evaluation shows that our solution is 17% faster than recent related works.
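A little arithmetic helps explain why higher detector densities call for multiple levels of parallelism. The sketch below is illustrative only and is not the partitioning described in the paper. It assumes a cubic 4096³ volume of 32-bit float voxels and a hypothetical two-level split: the volume's z-slices are first divided into contiguous slabs, one per MPI rank, and each slab is then divided among the GPUs attached to that rank. The helper names `split_range`, `decompose`, and `slab_bytes`, and the cluster shape of 8 ranks with 4 GPUs each, are assumptions chosen for the example. Splitting along z is a common choice for circular cone-beam FDK because a z-slab projects onto only a bounded band of detector rows, so each rank needs only that band from every projection.

```python
"""Hypothetical two-level (MPI rank -> GPU) slab decomposition of a CT volume.

Illustrative sketch under stated assumptions; not the paper's implementation.
"""

BYTES_PER_VOXEL = 4  # assumption: voxels stored as 32-bit floats


def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(stop - start, parts)
    chunks = []
    lo = start
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more slice
        chunks.append((lo, lo + size))
        lo += size
    return chunks


def decompose(nz, ranks, gpus_per_rank):
    """Return {(rank, gpu): (z_begin, z_end)}: slabs per rank, then sub-slabs per GPU."""
    plan = {}
    for rank, (z0, z1) in enumerate(split_range(0, nz, ranks)):
        for gpu, sub in enumerate(split_range(z0, z1, gpus_per_rank)):
            plan[(rank, gpu)] = sub
    return plan


def slab_bytes(nx, ny, z_range):
    """Memory needed to hold the voxels of one z-slab of an nx-by-ny volume."""
    z0, z1 = z_range
    return nx * ny * (z1 - z0) * BYTES_PER_VOXEL


if __name__ == "__main__":
    n = 4096  # assumption: cubic 4096^3 reconstruction volume
    total = n * n * n * BYTES_PER_VOXEL
    print(f"full volume: {total / 2**30:.0f} GiB")  # prints: full volume: 256 GiB
    plan = decompose(n, ranks=8, gpus_per_rank=4)  # hypothetical cluster shape
    worst = max(slab_bytes(n, n, r) for r in plan.values())
    print(f"largest per-GPU slab: {worst / 2**30:.0f} GiB")  # prints: largest per-GPU slab: 8 GiB
```

Under these assumptions the full volume needs 256 GiB, far more than a single GPU holds, while each of the 32 GPUs receives a 128-slice slab of 8 GiB. A real partitioning would also have to budget memory for the band of projection rows each slab needs.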


References

C. T. Badea, M. Drangova, D. W. Holdsworth, and G. A. Johnson. In vivo small-animal imaging using micro-CT and digital subtraction angiography. Physics in Medicine and Biology, 53(19):R319, 2008.
L. A. Feldkamp, L. C. Davis, and J. W. Kress. Practical cone-beam algorithm. J. Opt. Soc. Am. A, 1(6):612--619, June 1984.
A. C. Kak and Malcolm Slaney. Principles of Computerized Tomographic Imaging. IEEE Press, 1998. Available online at
Daren Lee, Ivo Dinov, Bin Dong, Boris Gutman, Igor Yanovsky, and Arthur W. Toga. CUDA optimization strategies for compute- and memory-bound neuroimaging algorithms. Computer Methods and Programs in Biomedicine, 106(3):175--187, 2012.
W. B. Ligon and R. B. Ross. An Overview of the Parallel Virtual File System. In Proceedings of the Extreme Linux Workshop, June 1999.
Message Passing Interface Forum. MPI-2: Extensions to the Message-Passing Interface, 1997.
S. Mukherjee, N. Moore, J. Brock, and M. Leeser. CUDA and OpenCL implementations of 3D CT reconstruction for biomedical imaging. In IEEE Conference on High Performance Extreme Computing (HPEC), 2012, pages 1--6, 2012.
NVIDIA Corporation. NVIDIA CUDA Compute Unified Device Architecture Programming Guide. NVIDIA Corporation, 2007.
E. Papenhausen, Z. Zheng, and K. Mueller. GPU-accelerated back-projection revisited: Squeezing performance by careful tuning. In Workshop on High Performance Image Reconstruction (HPIR), pages 19--22, 2011.
Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi, Sam S. Stone, David B. Kirk, and Wen-mei W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '08, pages 73--82, New York, NY, USA, 2008. ACM.
Dana Schaa and David Kaeli. Exploring the multiple-GPU design space. In IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing, pages 1--12, Washington, DC, USA, 2009. IEEE Computer Society.
Holger Scherl, Markus Kowarschik, Hannes G. Hofmann, Benjamin Keck, and Joachim Hornegger. Evaluation of state-of-the-art hardware architectures for fast cone-beam CT reconstruction. Parallel Computing, 38(3):111--124, 2012.
J. J. Vaquero, S. Redondo, E. Lage, M. Abella, A. Sisniega, G. Tapias, M. L. S. Montenegro, and M. Desco. Assessment of a New High-Performance Small-Animal X-Ray Tomograph. IEEE Transactions on Nuclear Science, 55(3):898--905, June 2008.
Fang Xu and Klaus Mueller. Real-time 3D computed tomographic reconstruction using commodity graphics hardware. Physics in Medicine and Biology, 52(12):3405, 2007.
Hanming Zhang, Bin Yan, Lizhong Lu, Lei Li, and Yongjun Liu. High performance parallel backprojection on multi-GPU. In 9th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), 2012, pages 2693--2696, 2012.
Xing Zhao, Jing-Jing Hu, and Peng Zhang. GPU-based 3D cone-beam CT image reconstruction for large data volume. Journal of Biomedical Imaging, 2009:8:1--8:8, January 2009.
Yining Zhu, Yunsong Zhao, and Xing Zhao. A multi-thread scheduling method for 3D CT image reconstruction using multi-GPU. Journal of X-Ray Science and Technology, 20(2):187--197, January 2012.

Cited By

  • (2019) iFDK. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-24. DOI: 10.1145/3295500.3356163. Online publication date: 17-Nov-2019.
  • (2015) A comparative study of an X-ray tomography reconstruction algorithm in accelerated and cloud computing systems. Concurrency and Computation: Practice & Experience, 27(18):5538-5556. DOI: 10.1002/cpe.3599. Online publication date: 25-Dec-2015.
  • (2014) Three-Level Parallelism for FDK Algorithm Using Multi-GPU Based Cluster System. Proceedings of the 2014 IEEE 13th International Symposium on Parallel and Distributed Computing, pp. 184-188. DOI: 10.1109/ISPDC.2014.28. Online publication date: 24-Jun-2014.
  • (2014) High-performance X-ray tomography reconstruction algorithm based on heterogeneous accelerated computing systems. 2014 IEEE International Conference on Cluster Computing (CLUSTER), pp. 331-338. DOI: 10.1109/CLUSTER.2014.6968781. Online publication date: Sep-2014.



Information & Contributors


Published In

EuroMPI '13: Proceedings of the 20th European MPI Users' Group Meeting
September 2013
289 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


  • ARCOS: Computer Architecture and Technology Area, Universidad Carlos III de Madrid



Association for Computing Machinery

New York, NY, United States


Author Tags

  1. CUDA
  2. MPI
  3. image processing
  4. parallel architectures


  • Research-article



EuroMPI '13
EuroMPI '13: 20th European MPI Users' Group Meeting
September 15 - 18, 2013
Madrid, Spain

Acceptance Rates

EuroMPI '13 Paper Acceptance Rate 22 of 47 submissions, 47%;
Overall Acceptance Rate 66 of 139 submissions, 47%


