Distributed and hardware accelerated computing for clinical medical imaging using proton computed tomography (pCT)

https://doi.org/10.1016/j.jpdc.2013.07.016

Highlights

  • Accelerated proton computed tomography (pCT).

  • Parallelized pCT on 60-node cluster, with MPI and CUDA.

  • Reconstruct 2 billion protons in 6 min.

  • Speed-up of about 180X over pCT with one CPU + one GPU.

Abstract

Proton computed tomography (pCT) is an imaging modality that has been in development to support targeted dose delivery in proton therapy. It aims to accurately map the distribution of relative stopping power. Because protons traverse material media along non-linear paths, pCT requires individual proton processing, and image reconstruction becomes a time-consuming process. Clinical-use scenarios that require images from billions of protons within ten to fifteen minutes have motivated us to use distributed and hardware-accelerated computing methods to achieve fast image reconstruction. Combined use of MPI and GPUs demonstrates that clinically viable image reconstruction is possible. On a 60-node CPU/GPU computer cluster, we achieved efficient strong and weak scaling when reconstructing images from two billion histories in under seven minutes. This represents a significant improvement over the previous state of the art in pCT, which took almost seventy minutes to reconstruct an image from 131 million histories on a single-CPU, single-GPU computer.

Introduction

Proton therapy is a form of radiation treatment that uses protons as agents for delivering dose. Because of the proton’s physical properties, dose delivery can be highly directed, or localized, to organs of interest. To fully utilize this potential for targeted dose delivery, it is necessary to have a priori knowledge of how a proton loses (or deposits) its energy as it traverses material media. The characteristics of the energy deposition depend on the size and the generally non-homogeneous electron density of the medium. Proton energy deposition is quantified by the relative stopping power (RSP), the ratio of the proton’s stopping power in the medium to its stopping power in water.
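In symbols (a standard definition, consistent with the description above, not a formula taken from this paper), the RSP at a point r in the object is

    \mathrm{RSP}(\mathbf{r}) \;=\; \frac{S_{\mathrm{medium}}(\mathbf{r}, E)}{S_{\mathrm{water}}(E)},

where S denotes the stopping power, i.e. the energy lost per unit path length (-dE/dx), at proton energy E. Over therapeutic proton energies this ratio is nearly independent of E, which is what makes a single RSP value per voxel a meaningful quantity to reconstruct.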

Currently, the RSPs used for proton treatment planning are obtained from measurements with X-ray computed tomography (xCT). A calibration step is required to convert the xCT measurements (in Hounsfield units) into RSPs. Because the energies involved in xCT and proton therapy differ by orders of magnitude, the calibration of xCT-derived RSPs is not very accurate, with margins of error of about 3%. Even such small uncertainties in RSPs can prove critical to delivering dose to the correct location(s).

Proton computed tomography (pCT) is an imaging modality developed for the purpose of directly obtaining the RSPs. The concept for pCT was first proposed by Cormack in 1976 [7], but recently there has been growing interest in the development of clinical pCT devices [27], [6], [30]. The omission of the calibration step that depends on a completely different imaging modality makes pCT-measured RSPs potentially more accurate.

In order for pCT to be a useful tool for proton treatment planning, it is necessary to provide images within clinically meaningful times of ten to fifteen minutes [22]. There are two major obstacles to achieving this. The first is that a large number of protons must be collected in order to obtain statistically reliable images. The second is that the physics of a proton’s interactions with material media results in non-straight paths. Therefore, data-reduction techniques used in other imaging modalities whose probes travel in straight lines (such as photons in emission tomography and x-ray computed tomography) cannot be used in pCT. Reconstruction in pCT must take into account the path of each proton on an event-by-event basis. Performing this event-by-event analysis on a large number of protons, and coalescing the results into a coherent tomographic image, requires considerable computation and memory. As a real-world example, image reconstruction from 131 million protons on a single-CPU, single-GPU computer (i.e., sequential code) takes almost seventy minutes, which extrapolates to almost nine hours for one billion protons.
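The extrapolation is simply proportional scaling of the measured runtime:

    t(10^9) \approx 70\ \mathrm{min} \times \frac{10^9}{1.31 \times 10^8} \approx 534\ \mathrm{min} \approx 8.9\ \mathrm{h}.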

In this paper, we present our investigations on reducing the time to reconstruct pCT images to within clinically meaningful time scales, while preserving image quality, via a suitable combination of the Message Passing Interface (MPI) and GPUs (i.e., parallel code). There has been growing interest in the use of GPUs in medical physics, where they have been shown to improve image reconstruction [24]. GPUs have been used to accelerate tomographic reconstruction in many fields, including examples from positron emission tomography [2], magnetic induction tomography [19], optoacoustic tomography [32], and transmission tomography [31]. These imaging methods rely on a different physical model from pCT, producing fundamentally different approaches to reconstruction. Like xCT, all of these other imaging methods are based on probes that travel in straight lines. One study in electron tomography [13] used parallelized implementations of an iterative algorithm, using MPI on a CPU cluster. The performance of implementations of iterative solvers on GPUs has been widely studied [10], [34], [35], [18], demonstrating dramatic speedups with one or more GPUs versus the CPU in a single-node system. Of particular relevance to our study are recent reports on the advantages of GPUs in processing list-mode (event-by-event) data in emission computed tomography [3], [8].


Theory

Protons in pCT are typically detected with a configuration [27] such as the one shown in Fig. 1. The object to be imaged (the target) is positioned in a rotating assembly between two “telescopes”, each comprising a pair of position-sensing planes. A mono-energetic beam of protons is directed from the left toward the leftmost plane, as depicted in Fig. 1. The sensor planes in each telescope measure two positions through which a proton has passed. The positions measured at the left and right telescopes thus define the proton’s entry and exit points and directions, from which its most likely path through the object can be estimated.
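Written in the standard discretized form that later sections refer to as solving the linear equation (stated here for context; the notation is ours), the reconstruction problem is

    A\mathbf{x} = \mathbf{b},

where x_j is the RSP of voxel j, b_i is the water-equivalent path length inferred from the energy lost by proton i, and A_{ij} is the length of proton i’s estimated path through voxel j. Because each proton follows a different, non-straight path, the matrix A is built row by row from the per-proton path estimates.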

Faster image reconstruction through parallelization and hardware acceleration

The basic parallelization strategy is to partition the proton history data across a set of distributed-memory compute nodes, to coordinate the algorithm through message passing, and to utilize the GPUs on each node.
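As a rough illustration of this strategy (a minimal sketch under our own assumptions, not the authors’ implementation), the host-side skeleton below partitions a global pool of proton histories into contiguous per-rank ranges and binds each MPI rank to one GPU on its node; the history count and the elided processing steps are placeholders.

    // hybrid_sketch.cu -- illustrative only (assumes MPI and the CUDA runtime)
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <algorithm>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, size = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        // Bind this rank to one of the GPUs on its node (assumes the MPI launcher
        // places no more ranks per node than there are GPUs per node).
        int num_devices = 0;
        cudaGetDeviceCount(&num_devices);
        if (num_devices > 0) cudaSetDevice(rank % num_devices);

        // Partition the global history list into contiguous, near-equal chunks.
        const long long total_histories = 2000000000LL;        // hypothetical total
        const long long chunk = (total_histories + size - 1) / size;
        const long long first = std::min(chunk * rank, total_histories);
        const long long last  = std::min(first + chunk, total_histories);
        std::printf("rank %d owns histories [%lld, %lld)\n", rank, first, last);

        // ... read this rank's histories, preprocess them on the GPU, then run the
        //     iterative solver, combining per-rank results with MPI collectives ...

        MPI_Finalize();
        return 0;
    }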

The reconstruction begins by reading all binary input histories into memory. The parallel code has each MPI process read its own subset of histories. Preliminary calculations on the input data are done before the actual reconstruction (i.e., solving the linear equation). These calculations are applied to each process’s locally held subset of histories.
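One plausible way for each process to read only its own subset of a shared binary history file is collective MPI-IO, sketched below; the fixed-size History record is a hypothetical layout, not the actual pCT file format.

    // read_sketch.cpp -- illustrative rank-local read via collective MPI-IO
    #include <mpi.h>
    #include <vector>

    struct History {                      // hypothetical fixed-size binary record
        float entry_pos[2], exit_pos[2];  // positions measured on the sensor planes
        float wepl;                       // measured water-equivalent path length
    };

    std::vector<History> read_my_histories(const char* path,
                                           long long first, int count) {
        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, path, MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

        std::vector<History> buf(count);
        MPI_Offset offset = static_cast<MPI_Offset>(first) * sizeof(History);
        // Collective read: every rank reads a disjoint byte range of the same file.
        // (For very large slices, a derived datatype avoids the int count limit.)
        MPI_File_read_at_all(fh, offset, buf.data(),
                             count * static_cast<int>(sizeof(History)),
                             MPI_BYTE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
        return buf;
    }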

Material and methods

The baseline for this study is the code from the dissertation of Penfold [22]. The baseline code (i.e., the sequential code) implements pCT reconstruction on a single-CPU, single-GPU computer. The code groups proton histories into blocks and uses NVIDIA’s CUDA (Compute Unified Device Architecture) version 5.0 [21] to process the histories in each block simultaneously. The sequential code solves the linear equation with a version of a block-iterative algorithm called diagonally-relaxed orthogonal projections (DROP).
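For context, the general form of the DROP iteration (Censor et al., see the reference list; the baseline code may use a specialized variant) is

    x^{(k+1)} = x^{(k)} + \lambda_k\, U_k^{-1} \sum_{i \in B_k} \frac{b_i - \langle a_i, x^{(k)} \rangle}{\lVert a_i \rVert^{2}}\, a_i,

where B_k is the block of proton histories used at iteration k, a_i is the i-th row of the system matrix A, b_i is the corresponding WEPL measurement, λ_k is a relaxation parameter, and U_k is the diagonal matrix whose j-th entry is the number of rows in B_k with a nonzero j-th component (taken as 1 when that count is zero).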

Results and discussion

Because the primary goal of pCT is to map the distribution of RSP values, it is important to demonstrate that the results from the parallelized version are consistent with those from the sequential code. This comparison was performed via a quantitative analysis of the computed RSP values from both codes. Table 1 summarizes the mean RSP values for the four materials in the LUCY phantom. These mean RSP values were obtained from an ROI analysis of these four materials, using identical regions of interest in the images produced by the two codes.
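A minimal sketch of such an ROI comparison is shown below, assuming the two reconstructions are available as flat voxel arrays and that a binary ROI mask has been defined for each material; all names are hypothetical.

    // roi_compare.cpp -- illustrative consistency check between two reconstructions
    #include <cmath>
    #include <cstddef>

    // Mean RSP over the voxels selected by a binary ROI mask of length n.
    double mean_rsp(const float* image, const unsigned char* mask, std::size_t n) {
        double sum = 0.0;
        std::size_t count = 0;
        for (std::size_t i = 0; i < n; ++i)
            if (mask[i]) { sum += image[i]; ++count; }
        return count ? sum / count : 0.0;
    }

    // Percent difference of the parallel result relative to the sequential one.
    double relative_difference_pct(double rsp_seq, double rsp_par) {
        return 100.0 * std::fabs(rsp_par - rsp_seq) / rsp_seq;
    }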

Conclusions

We have demonstrated that image reconstruction in pCT can be accelerated through a hybrid approach that uses both MPI and GPUs. Our implementation exhibits nearly linear strong and weak scaling, that is, it scales well with both problem size and the number of nodes. This has significant implications for proton therapy and treatment planning, where it is important to obtain pCT images in clinically reasonable times. Through our hybrid MPI–GPU approach we have reconstructed pCT images, while preserving state-of-the-art image quality, from proton data sets of two billion histories in under seven minutes on a 60-node CPU/GPU cluster.

Future work

We will continue our investigation into reducing image reconstruction time by addressing the load imbalance in the linear solver step, porting the linear solver to GPUs, and investigating MIC-based processors. We will also continue to work with our collaborators at Loma Linda University Medical Center (LLUMC) to improve the state of the art in reconstructed image quality.

Acknowledgments

US Department of Defense contract no. W81XWH-10-1-0170 and US Department of Energy contract no. DE-SC0005135 sponsored this work. Access to the University of Chicago’s Petascale Active Data Store (PADS) system was supported in part by the National Science Foundation under grant OCI-0821678 and by the University of Chicago. Access to the Magellan system at Argonne National Laboratory was provided by the Argonne Leadership Computing Facility and supported with American Recovery and Reinvestment Act (ARRA) funds.

References (35)

  • Y. Censor et al., On diagonally-relaxed orthogonal projection methods, SIAM J. Sci. Comput. (2008)

  • G.A.P. Cirrone et al., Monte Carlo studies of a proton computed tomography system, IEEE Trans. Nucl. Sci. (2007)

  • A.M. Cormack et al., Quantitative proton tomography: preliminary experiments, Phys. Med. Biol. (1976)

  • J.Y. Cui et al., Fully 3D list-mode time-of-flight PET image reconstruction on GPUs using CUDA, Med. Phys. (2011)

  • B. Erdelyi, A comprehensive study of the most likely path formalism for proton-computed tomography, Phys. Med. Biol. (2009)

  • L.A. Feldkamp et al., Practical cone-beam algorithm, J. Opt. Soc. Amer. A (1984)

  • D. Gordon et al., Component-averaged row projections: a robust block-parallel scheme for sparse linear systems, SIAM J. Sci. Comput. (2005)

    Nicholas T. Karonis, Ph.D., is a professor of computer science at Northern Illinois University. His research interest is high-performance computing applications and tools, particularly as they apply to computational grids and cloud computing. Karonis received a Ph.D. in computer science from Syracuse University. Contact him at [email protected].

    Kirk L. Duffin, Ph.D., is an associate professor of computer science at Northern Illinois University. In addition to his work in high performance computing, his research interests include scientific visualization, virtual reality, image based modeling and rendering, and spatio-temporal data representations. Duffin received a Ph.D. in computer science from Brigham Young University. Contact him at [email protected].

    Caesar E. Ordoñez, Ph.D., has been working in the field of medical imaging for over twenty years. His experience is in the simulation and analysis of novel medical imaging devices for positron emission tomography (PET) and single-photon emission computed tomography. Ordoñez received his Ph.D. in nuclear physics from the Massachusetts Institute of Technology.

    Bela Erdelyi, Ph.D., is a member of the Beam Physics group at NIU and the Physics Division at ANL. He has worked on a variety of problems involving the creation, acceleration, transport and manipulation of charged particle beams, with an array of applications to nuclear, particle and medical physics. He has also been researching the connections among beam physics, modern mathematics and advanced numerical methods in physics research.

    Thomas D. Uram is a member of the research staff at Argonne National Laboratory and the Computation Institute at the University of Chicago. His research interests include large-scale data intensive computing, GPGPU computing, heterogeneous computing architectures, and hybrid programming models.

    Eric C. Olson is a software developer specializing in large scale parallel computing, visualization, and interactive applications. Eric worked on interactive scientific data visualization at Argonne National Labs and the University of Chicago for ten years. He is now working on large scale problems as a senior software developer at Webfilings LLC.

    George Coutrakon, Ph.D., earned his Ph.D. in physics from State University of New York in 1983. He is currently a medical physicist and associate professor of physics at Northern Illinois University (NIU). Before coming to NIU in Oct. 2008, Dr. Coutrakon was director of proton accelerator operations at Loma Linda University Medical Center (LLUMC) in Southern California since 1990. His current research is in the field of proton computed tomography, or proton CT. He has written proton therapy papers on microdosimetry, beam optics, proton accelerators, and proton CT. Dr. Coutrakon currently teaches medical physics courses and works with NIU, Fermilab, and Loma Linda researchers in building a new proton CT scanner designed to image human anatomy using proton beams instead of conventional X-rays. He is the principal investigator for a DOD grant with oversight for the new proton CT scanner to be completed to start beam testing by mid-2013.

    Michael E. Papka, Ph.D., is a senior scientist at Argonne National Laboratory, where he also fulfills two senior leadership roles in the Computing, Environment and Life Sciences Directorate: deputy associate laboratory director, and director of the Argonne Leadership Computing Facility. He is a senior fellow of the University of Chicago/Argonne Computation Institute and an associate professor of computer science at Northern Illinois University. He conducts interdisciplinary research in large-scale scientific visualization and data analysis. His interests include technology in support of science and its use in educational settings. He is a senior member of IEEE and ACM.
