1 Introduction and Background

In medical imaging, CT (computerized tomography) [18, 24] is one of the most important diagnostic tests. It is therefore imperative to develop reconstruction algorithms that provide high-quality images with high computational efficiency. Fast algorithms are essential to make the results of a CT scan available to medical professionals promptly.

Reconstruction methods can be divided into two types according to their nature. On the one hand, there are the analytical algorithms, based on applying the Fourier transform to the data obtained from projecting X-rays through an object, called the sinogram. On the other hand, there are the algebraic methods, whether direct or iterative, which treat the reconstruction problem mathematically as a system of equations.

In clinical practice, the most widespread methods are the analytical ones, owing to their low computational cost, which can be vital in emergency diagnoses. However, algebraic methods make it possible to reduce the number of projections, and therefore the radiation dose to which the patient is exposed.

Since hardware evolves rapidly and its cost decreases steadily, algebraic approaches to these problems have become viable. Although they involve a high computational cost, this is nowadays less of an obstacle thanks to parallel computing (multiple CPUs, GPU computing, clusters, etc.) [12]. Together with the larger main memory of new equipment, it is now feasible to apply these methods to large problems such as ours.

The aim of this work is to solve the CT image reconstruction problem by means of two direct algebraic factorization methods. The first one, multifrontal sparse QR (SPQR), implemented in the library ‘SuiteSparseQR, a multifrontal multithreaded sparse QR factorization package’ [7], solves the reconstruction equation more directly than the iterative methods we studied previously (such as ART [1], SART [2] or LSQR [22]). This is achieved by computing the QR factorization of the sparse system matrix A to simulate its pseudo-inverse.

The second method is the singular value decomposition (SVD) of the system matrix A, carried out through the parallel implementation included in SLEPc [14] to simulate, as in the previous case, the pseudo-inverse of the matrix.

Since these methods are not widespread in clinical practice, the validity of both approaches will be tested. We will check the quality of the image reconstructed from a smaller number of projections, in order to reduce the radiation dose delivered to the patient, and analyze the computational resources needed to perform the decomposition of the matrix A. This point is vital for the use of these algebraic methods, since the matrices are large and can impose a high demand on RAM.

2 Materials and Methods

2.1 CT Image Reconstruction Problem

In a CT reconstruction problem, the algebraic approach requires solving the system of equations defined by Eq. (1), where A (2) is the system matrix, a sparse matrix whose entries measure the influence of each traced ray on each pixel of the reconstructed image [5, 16]. The size of A is M \(\times \) N, where M is the number of traced rays and N is the size in pixels of the image to be reconstructed. The vector g (3) is the sinogram, or projection vector, and u (4) is the solution image. Both the system matrix and the sinogram have been calculated with Joseph’s forward projection method [17].

$$\begin{aligned} A*u= g \end{aligned}$$
(1)
$$\begin{aligned} A = (a_{i,j}) \in \mathbb {R}^{M\times N} \end{aligned}$$
(2)
$$\begin{aligned} g =[g_{1}, g_{2},\ldots , g_{M}]^{T} \in \mathbb {R}^{M} \end{aligned}$$
(3)
$$\begin{aligned} u =[u_{1}, u_{2},\ldots , u_{N}]^{T} \in \mathbb {R}^{N} \end{aligned}$$
(4)

The projections have been generated for a phantom corresponding to the mathematical representation of the human head developed by the Forbild Phantom Group [10], using the medical imaging program CONRAD [20] to generate the phantom reference images. The phantom is defined by simple geometric objects that represent head structures of different densities, such as bone, tissue, gray matter, etc.

When reconstructing a CT image for a given physical configuration of the scanner, the associated matrix A is always the same; what changes is the sinogram g, which represents the studied object. If it were possible to calculate a pseudo-inverse \(A^{+}\) of the matrix A [13, 19], the image could be obtained directly, without solving a large system of equations: the computational cost would be reduced to a matrix-vector product. However, this idea is not viable, since the explicit calculation of \(A^{+}\) is prohibitive for high resolutions and many views, because \(A^{+}\) can be dense and contaminated by rounding errors.
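At toy scale, the idea is easy to illustrate. The following minimal NumPy sketch uses a small random matrix as a hypothetical stand-in for A (the real system matrix is large, sparse, and built with Joseph’s method); it only shows why reconstruction collapses to a matrix-vector product once \(A^{+}\) is available:

```python
import numpy as np

# Hypothetical toy stand-in: 6 "rays" through a 4-pixel image.
rng = np.random.default_rng(0)
A = rng.random((6, 4))         # placeholder system matrix (M=6, N=4)
u_true = rng.random(4)         # placeholder image
g = A @ u_true                 # simulated, noise-free sinogram

A_pinv = np.linalg.pinv(A)     # explicit pseudo-inverse: feasible only at toy scale
u = A_pinv @ g                 # reconstruction is now a single matrix-vector product

print(np.allclose(u, u_true))  # True: consistent, full-rank toy case
```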

Another possibility is to compute a factorization of A implicitly and use it to simulate the pseudo-inverse. The two methods that we have considered for this approach are explained in the following subsections.

2.2 Projections Reduction

In the data acquisition, two important factors influence the reconstruction of the image: the number of samples per projection (detectors) and the number of projections. Reducing either of these two variables causes artifacts in the image reconstructed with analytical methods: rings due to the Gibbs phenomenon, streaks and lines, and Moiré patterns, among others [3].

To prevent such problems, we must take into account the Nyquist theorem [21], which states that the sampling rate must be greater than twice the bandwidth of the sampled signal; undersampling produces the aforementioned artifacts. The classical analytical procedure of image reconstruction, filtered back projection (FBP), is fast but needs a complete set of projections to obtain high-quality images. The exact number depends on the physical characteristics of the scanner, but typically the minimum number of projections is 360.

If the aim is to reduce the dose absorbed by the patient, we should take as few projections as possible, which requires methods that can work with fewer projections, such as iterative or direct methods. Our previous works [6, 8, 9, 23] showed that it is possible to reconstruct high-quality images from few views, between 30 and 90, using iterative methods. However, the computational cost of these algorithms is high.

In order to take a non-iterative approach, we propose the SPQR and SVD methods. In this way, the computational cost of the reconstruction could be reduced, allowing images to be reconstructed faster, closer to real time. We will check whether their application to CT image reconstruction allows reducing the number of views to the same extent as the methods we used previously.

2.3 Sparse Pivoting QR Factorization

The first direct method used to solve the system of equations is the multifrontal sparse QR (SPQR) [7], which performs a QR factorization [13] of a large sparse matrix through a sequence of dense frontal matrices. In this way, we can simulate the pseudo-inverse of the system matrix.

The software used to perform this factorization is implemented in the SuiteSparseQR library, which uses BLAS and LAPACK, and Intel Threading Building Blocks to exploit parallelism. This makes it possible to process very large matrices, which is key in CT image reconstruction since A is large at high resolutions.

The QR factorization of the matrix A consists of decomposing it as the product of two matrices (5), where Q is orthogonal and R is upper triangular.

$$\begin{aligned} A = QR \end{aligned}$$
(5)
$$\begin{aligned} AP = QR \end{aligned}$$
(6)

However, when A is considerably large and sparse, this decomposition can be prohibitive since Q may not be sparse. In this case, the decomposition with column pivoting (6), where P is a permutation matrix, must be used. The solution is then obtained as follows (a small dense sketch is given after Eq. (8)):

  • If m \(\ge \) n we have to solve (7). That is, a matrix-vector product of \(Q^{T}\) with the vector g, the solution of an upper triangular system with the matrix R, and a reordering with the permutation matrix P.

  • If \(m<n\) we have to solve (8); that is, reorder the vector g, then solve a triangular system, and finally multiply the resulting vector by the matrix Q.

$$\begin{aligned} u = P*(R^{-1}(Q^{T} *g)) \end{aligned}$$
(7)
$$\begin{aligned} u = Q*(R^{-T}(P^{T}*g)) \end{aligned}$$
(8)
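As an illustration of the m \(\ge \) n branch (Eq. (7)), the following sketch uses SciPy’s dense column-pivoted QR as a small-scale stand-in for SPQR; the matrix and sinogram here are random placeholders, not a real system matrix:

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

# Column-pivoted QR, A[:, piv] = Q @ R, a dense analogue of Eq. (6).
rng = np.random.default_rng(1)
m, n = 200, 100                 # m >= n case
A = rng.random((m, n))          # placeholder system matrix
g = rng.random(m)               # placeholder sinogram

Q, R, piv = qr(A, mode='economic', pivoting=True)

# Eq. (7): u = P * (R^{-1} * (Q^T * g))
y = solve_triangular(R, Q.T @ g)   # back-substitution on the triangular factor
u = np.empty(n)
u[piv] = y                         # undo the column permutation P

print(np.allclose(A.T @ (A @ u), A.T @ g))  # u satisfies the normal equations
```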

2.4 Q-Less Sparse QR Factorization

If we want to save memory, we can compute the QR decomposition without storing Q explicitly. In this case we have to transform the equations to obtain the solution vector u, as follows:

  • If m \(\ge \) n we have to solve the system \(Q*R*u = g\). By multiplying both sides of the equation by \(A^{T}\), we obtain \((Q*R)^{T}*Q*R*u=A^{T}*g\). Since Q is orthogonal, this reduces to \(R^{T}*R*u=A^{T}*g\), and solving for u we reach Eq. (9).

    $$\begin{aligned} u = R^{-1} * R^{-T} (A^{T} * g) \end{aligned}$$
    (9)
  • If m < n we work with the QR decomposition of \(A^{T}\). By similar reasoning to the previous case, the solution of the system corresponds to calculating (10), which requires the same operations in a different order.

    $$\begin{aligned} u = A^{T}*R^{-1}*(R^{-T}*g) \end{aligned}$$
    (10)

Note that the matrix Q is not used in the calculation of u, so this process requires less memory.
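A minimal dense sketch of the Q-less variant for the m \(\ge \) n case (Eq. (9)), again with placeholder data; NumPy’s qr in mode 'r' returns only the triangular factor:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(2)
m, n = 200, 100
A = rng.random((m, n))          # placeholder system matrix
g = rng.random(m)               # placeholder sinogram

R = np.linalg.qr(A, mode='r')   # Q is never formed or stored

# Eq. (9): u = R^{-1} * (R^{-T} * (A^T * g))
w = A.T @ g
z = solve_triangular(R, w, trans='T')   # solve R^T z = w
u = solve_triangular(R, z)              # solve R u = z

print(np.allclose(A.T @ (A @ u), A.T @ g))  # normal-equations check
```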

2.5 Singular Value Decomposition

The singular value decomposition performs the factorization of order k (11) [4, 11, 19]. In this factorization, \(U \in \mathbb {R}^{M\times k} \) and \(V \in \mathbb {R}^{N\times k}\) are matrices with orthonormal columns, called the left-hand and right-hand singular vectors, where k is the number of singular values computed. \(\varSigma \in \mathbb {R}^{k\times k}\) is a diagonal matrix containing the singular values in decreasing order. The software we use to apply this algorithm is the SVD solver included in the parallel eigenvalue library SLEPc. Applying this factorization, we can transform the system of equations into the product (13), taking the pseudo-inverse as (12).

$$\begin{aligned} A = U\varSigma V^{T} \end{aligned}$$
(11)
$$\begin{aligned} A^{+}= V\varSigma ^{-1}U^{T} \end{aligned}$$
(12)
$$\begin{aligned} u = A^{+}g \end{aligned}$$
(13)

Note that the storage cost of this decomposition is of order (M * k) + (k * N) elements; as k grows, the memory resources needed to carry out the decomposition increase. Once the decomposition of the system matrix has been calculated, solving the problem consists of the product of two dense matrices by a vector; thus the computational cost is of order ((M + N) * k) flops.

Since this decomposition can be done ‘offline’, it can be calculated and stored in advance, ready for when a CT image has to be reconstructed. Therefore, we can replace an iterative method with a much simpler and less computationally expensive matrix multiplication problem. In our application we are interested in the case k = N, in which we have enough information to reconstruct the image with great precision.
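The offline/online split can be sketched as follows. The paper computes the SVD with SLEPc; here SciPy’s svds is used purely as a small-scale illustration, with random placeholder data and a truncation k < N:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

rng = np.random.default_rng(3)
M, N, k = 300, 100, 60
A = csr_matrix(rng.random((M, N)))  # placeholder sparse system matrix
g = rng.random(M)                   # placeholder sinogram

# --- offline: factorize once, store U, s, Vt (~ M*k + k*N elements) ---
U, s, Vt = svds(A, k=k)             # A ~= U @ diag(s) @ Vt, as in Eq. (11)

# --- online: Eqs. (12)-(13), u = V * Sigma^{-1} * U^T * g ---
u = Vt.T @ ((U.T @ g) / s)          # three cheap products, O((M + N) * k) flops

print(u.shape)                      # (100,)
```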

3 Results and Discussion

In order to check the validity of both methods, we used a cluster belonging to the Universitat Politècnica de València to perform the factorizations. The cluster consists of 4 RX500S7 servers, each with four Intel Xeon E5-4620 8-core processors (32 cores per node) and 256 GB of DDR3 RAM per node (8 GB/core).

3.1 Memory Requirements

For the SPQR factorization, we have used the code developed for Matlab, reserving a single node of the cluster, in which we can use a maximum of 250 GB of main memory. With these resources, it has been possible to compute the QR decomposition of the system matrix for image resolutions from 32 \(\times \) 32 up to 256 \(\times \) 256 with 30 views. Moreover, using the Q-less QR factorization we have been able to reach 90 views at the 256 \(\times \) 256 resolution, which is important as we will explain in Sect. 3.3. For the analogous 512 \(\times \) 512 case with 30 views, the matrix could not be factorized due to lack of RAM.

Regarding the SVD, all the 30-view cases have been computed except the 512 \(\times \) 512 resolution. For this process, we reserved up to 32 processors with 15 GB of RAM each, that is, up to 240 GB. Despite making use of distributed memory, we cannot reserve more than one server node at a time, as in the sequential case. In addition, the SVD process requires more memory than the SPQR, so with the same resources it has not been possible to reach the case of 90 views at the 256 \(\times \) 256 resolution.

3.2 Computational Time Efficiency

Although the factorization time is not critical in this scenario, it is convenient to have an approximation of the computation time required for each matrix size. In addition, we want to compare the two proposed methods. Table 1 shows the computational time, in seconds, required to carry out the decompositions.

Table 1. Factorization time

As we can observe, the factorization time is satisfactory in both cases. Although the SVD time may seem too high, we must bear in mind that this calculation is performed only once and ‘offline’, which makes it non-critical. As shown in the table, the SPQR method is more time-efficient, which can be explained by the fact that it runs on a single node and therefore does not require distributed memory, as the SVD does. Since the SVD implementation communicates through MPI messages, it incurs extra time for interprocess communication, which worsens its time efficiency.

3.3 Image Quality

Regarding the quality of the images reconstructed with both methods, the results are given in Table 2. To measure the quality, we used the PSNR metric [15], which quantifies the noise level of an image with respect to a reference image, and the SSIM metric [15], which indicates the structural similarity between the two images, based on the shape of the elements they contain.
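Both metrics are available in common imaging libraries; one possible way to compute them (here with scikit-image, on synthetic stand-ins for the phantom and the reconstruction) is:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(4)
reference = rng.random((128, 128))   # stands in for the phantom reference image
reconstruction = reference + 0.01 * rng.standard_normal((128, 128))

psnr = peak_signal_noise_ratio(reference, reconstruction, data_range=1.0)
ssim = structural_similarity(reference, reconstruction, data_range=1.0)
print(f"PSNR = {psnr:.1f} dB, SSIM = {ssim:.3f}")
```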

Table 2. Reconstructed images quality

In both cases we can see that up to the 128 \(\times \) 128 resolution, the reconstruction results are very good, with a PSNR greater than 200 in all cases (considering that a PSNR close to 100 can already be considered a perfect reconstruction to the perception of the human eye). In addition, all cases obtain an SSIM equal to 1, which means that the image is structurally accurate. The result of the SVD reconstruction at 128 \(\times \) 128 resolution can be seen in Fig. 1b. If we compare it with the reference image at the same resolution (Fig. 1a), we observe that they are almost identical and the reconstruction has no artifacts or noise. However, for the 256 \(\times \) 256 resolution and 30 views, the result is of very poor quality for both methods, with a PSNR around 30 in both cases. The reconstructed images, shown in Fig. 2, are very noisy; although elements of the phantom can be perceived, they are not seen clearly enough.

Fig. 1. Reference and reconstructed 128 \(\times \) 128 images

Fig. 2. Reconstructions, 256 \(\times \) 256, 30 views

The poor quality of this reconstruction is due to the fact that for this particular case (256 \(\times \) 256 and 30 views), the sub-matrix extracted from the system matrix is rank deficient, as shown in Table 3. Therefore, the system of equations is ill-conditioned, which means that information necessary for the reconstruction is missing. In the rest of the cases, with smaller resolutions, the sub-matrix retains full rank, so these algebraic methods reconstruct without problems.

If we analyze the rank of the sub-matrices at the largest resolutions, we observe that more views are needed to reach full rank. For 256 \(\times \) 256 reconstructions, we should use at least 90 views, which still reduces the number of views with respect to traditional methods. For the 512 \(\times \) 512 resolution, we need well over 180 views, approximately 260, to keep the sub-matrix at full rank. This means that, even if we could still save a few views compared with other methods, the resulting reconstruction problem would have larger dimensions than we can compute.

Table 3. Rank study
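A small-scale analogue of this rank study can be sketched as follows (the paper’s sub-matrices are large and sparse; here a dense placeholder with an artificially duplicated column plays the role of an undersampled system):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.random((400, 300))       # placeholder sub-matrix
A[:, -1] = A[:, 0]               # force a rank deficiency, as with too few views

# matrix_rank counts the singular values above a tolerance
print(np.linalg.matrix_rank(A))  # 299 < 300: the system has lost information
```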

We have verified that the rank of the matrix used has a direct effect on the reconstructed image. In Fig. 3 we present the singular values of a full-rank sub-matrix. As we can see, there are a few dominant values, approximately the first 1000, after which they start to decrease. Around the 8000th value they seem to stabilize close to 0. However, the detail window in the plot shows that the last 1600 values, although close to 0, vary between 0.03 and 0.01, which is still significant in this case. Therefore, we cannot disregard any singular value.

Fig. 3. Singular values, 128 \(\times \) 128

In Fig. 4 we can observe how varying the number of singular values yields very different reconstructions. As explained before, optimal quality is not reached until the full rank is used. We observe the same effect with SPQR when varying the number of views.

Fig. 4. Reconstruction varying singular values, 128 \(\times \) 128

In Fig. 5 we see the difference between taking 30, 60 and 90 views for the 256 \(\times \) 256 resolution matrix. With 60 projections the rank increases, so the image is slightly better than with 30, as can be observed at the edges of the phantom. But it is not until we reach 90 views that the image is of good quality.

Fig. 5. SPQR 256 \(\times \) 256 reconstructions

4 Conclusion

In the present work, we have proposed two sparse matrix factorization methods that can be used for CT image reconstruction. In this way, we simulate the use of the pseudo-inverse of the system matrix to transform the initial problem into a simpler one. The main challenge of these methods is their high memory consumption, which forced us to use a high performance computing cluster.

In this cluster it has been possible to calculate the SPQR factorization up to a resolution of 256 \(\times \) 256 pixels and 90 views, and the SVD up to 256 \(\times \) 256 and 30 views. Although with the SVD method we have not obtained satisfactory results for the higher resolutions, through SPQR we have managed to perform a reconstruction of very high quality with 90 views and full rank. This translates into a very significant difference in the X-ray dose to which patients must be exposed. Taking into account that the analytical methods are bound by the Nyquist theorem, for our scanner configuration they would use significantly more views, whereas we reduce them to 90 in the case of the 256 \(\times \) 256 resolution and to 30 in the lower ones.

Since the calculation of the factorizations for this type of reconstruction problem can be done in advance and stored, the computation times required for all cases are acceptable, with the SPQR method being faster because it does not need distributed memory.

In addition, we have transformed the reconstruction problem into a matrix-vector multiplication plus the solution of a triangular system in the case of the QR, and a matrix-vector product in the case of the SVD. These operations are highly parallelizable on both CPU and GPU, which means that, with the matrices stored, we could perform the reconstructions faster than with the previous methods.

Regarding the image quality obtained, it is very satisfactory in all the cases where the generated sub-matrix has full rank, yielding images of higher quality than those obtained by the iterative methods we presented in previous works, such as LSQR [4]. However, as soon as the sub-matrix becomes rank deficient, the reconstructed image is of very poor quality due to the lack of information. As future work, we propose to introduce regularization algorithms to increase the rank of the sub-matrices, or image filters which, as with the LSQR method, can be combined so as to obtain a better approximation despite the ill-conditioned problem.

In addition, it is necessary to analyze the implementation of the factorizations in order to improve RAM usage, and thus factorize the matrix corresponding to 512 \(\times \) 512 pixels, which is the target resolution. For this problem, we intend to use out-of-core computing techniques that make use of the disk to avoid main memory limitations.

In conclusion, we can say that we have tested the viability of the SPQR and SVD direct algebraic methods applied to CT image reconstruction. In spite of not having reconstructed images at the highest resolution, we have verified that with these methods it is possible to greatly reduce the radiation dose, provided that the matrix can be factorized.

At this point of our work we have two options for solving the problem: for rapid reconstructions with low dose and medium resolutions, the factorization methods; for high-resolution reconstructions with slightly worse quality, the LSQR + FISTA + STF iterative method, which is slower but guarantees good results.