
1 Introduction

Diffusion Weighted Magnetic Resonance Imaging (DWMRI) is a powerful tool for creating extremely detailed 3D representations of White Matter (WM) architecture. It maps the structure of neural connections by measuring the local orientation of water molecule diffusion [6]. With the development of DWMRI, models have been built to delineate the underlying anatomy of the WM tissue based on the diffusion data. Since the orientation of water diffusion correlates with the trajectories of the main neuronal bundles, a natural approach is to track consistent diffusion pathways. This approach is called tractography (Fig. 1), and it creates a set of fibers (termed a tractogram) representing the brain's main neuronal bundles [9]. Each fiber is represented by an array of 3D points sampled with a fixed step size.

Fig. 1. DWMRI image (source: https://mrimaster.com/characteriseimagedwi.html) and a tractogram modeling the WM architecture (colors for visualization only).

Analyzing tractograms poses several challenges. First, neural pathways are continuous objects whose lengths vary significantly (between 10 mm and 100 mm). When representing the fibers as sets of 3D points, one can use a fixed sampling frequency, ending up with signals of different dimensions; alternatively, a fixed number of points can be used, leading to an inadequate representation of long fibers. To overcome this, alternative representation schemes have been proposed, e.g., in [8] fibers and fiber bundles are represented as Gaussian Processes, a representation that enables the use of an inner-product based metric for fiber similarity.
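To make the fixed-number-of-points option concrete, the following is a minimal sketch (not part of the paper; the function name and the use of SciPy are our own) of resampling a fiber to a prescribed number of points along a spline fit of its path:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def resample_fiber(points, m):
    """Resample a fiber, given as an (n, 3) array of 3D points spaced at a fixed
    step, to exactly m points along a cubic-spline fit of its path. Long fibers
    lose detail this way, while short fibers are oversampled."""
    t = np.linspace(0.0, 1.0, len(points))   # parameter of the original samples
    spline = CubicSpline(t, points)          # vector-valued spline through the points
    return spline(np.linspace(0.0, 1.0, m))  # (m, 3) array of resampled points
```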

Modern tractography tools often produce highly dense tractograms containing millions of fibers, resulting in extremely large files. In population studies, multiple brains are analyzed and compared, which further exacerbates the storage issue. These challenges show the need for more efficient fiber representations, as well as for fiber compression schemes. There have been several efforts in recent years to address this matter. For example, [7] suggested a compression framework for fibers that optimizes the number and locations of samples along a given fiber, in addition to quantization and encoding steps, reducing the memory needed to store a full tractogram.

An alternative approach is sparse coding, which provides a more compact representation of the data compared to classic methods such as PCA and LDA and has been shown to be beneficial in various applications [4]. In this scheme, each data point is assumed to be a linear combination of a small number of columns (known as atoms) of a given dictionary, which may be either predefined (e.g., DCT or wavelet) or inferred from the input data using dictionary learning methods. The latter tends to improve the sparse representation. While many dictionary learning strategies exist, such as MOD and K-SVD [4], they are mainly designed for data that lie on the same grid or have a fixed, limited size. Therefore, the existing strategies are less suitable for data points of varying sizes, or for continuous data. Continuous dictionaries have been used before, e.g., for data admitting sparsity in the Fourier domain [3]; yet these dictionaries have not been learned, which limits their effectiveness in many applications.

Contribution: In this work we propose a novel continuous dictionary learning (CDL) scheme, which allows processing data whose samples do not necessarily lie on a grid or have a fixed size. This strategy provides a novel fiber representation, which is uniform in size and approximates fibers of any length with high accuracy. In addition, we motivate the use of the suggested representation by presenting a compatible metric that allows the calculation of inter-fiber similarities. Such calculations are the basic building blocks of frequently performed tasks in tractogram analysis, such as clustering, classification, and registration.

2 Methods

Standard Dictionary Learning (SDL): The SDL strategies have been designed for data with a fixed size (e.g., image patches). Given a data point \(f \in \mathbb R^d\) that admits a sparse representation \(x \in \mathbb R^K\) in a given dictionary \(D \in \mathbb R^{d \times K}\) (typically \(K>d\)), i.e., \(f = Dx\), one may recover x from f by minimizing

$$\begin{aligned} \min _{x} \Vert x \Vert _0 ~ ~ s.t. ~ ~ f=Dx. \end{aligned}$$
(1)

If D is unknown, it may be learned from a given set of data points \(f_i\) by solving:

$$\begin{aligned} \min _{D,X} \Vert F-DX \Vert _F ^2 ~ ~s.t. ~ ~\forall i ~ ~\Vert x_i \Vert _0 \leqslant T_0, \end{aligned}$$
(2)

where \(F \in \mathbb R^{d \times N}\) and \(X \in \mathbb R^{K \times N}\) contain in their columns the data points \(f_i \in \mathbb R^d\) and their sparse representations \(x_i \in \mathbb R^K\), respectively; \(T_0\) is the target sparsity; and \(\left\| \cdot \right\| _F\) is the Frobenius norm. While both Eqs. 1 and 2 are NP-hard problems, many approximation methods have been proposed for them, such as orthogonal matching pursuit (OMP) for the former and K-SVD for the latter [4].
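For illustration, a minimal (non-optimized) OMP solver for the sparse coding step can be sketched as follows; the function name is ours, and the original work may rely on a different implementation:

```python
import numpy as np

def omp(D, f, T0):
    """Greedy orthogonal matching pursuit: approximate the solution of Eq. 1 by
    selecting at most T0 atoms of D (columns assumed to be normalized)."""
    x = np.zeros(D.shape[1])
    residual = f.astype(float).copy()
    support = []
    for _ in range(T0):
        k = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with the residual
        if k not in support:
            support.append(k)
        coeffs, *_ = np.linalg.lstsq(D[:, support], f, rcond=None)  # refit on the support
        residual = f - D[:, support] @ coeffs
    x[support] = coeffs
    return x
```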

In order to use SDL methods such as K-SVD for data that are continuous or of varying size, one needs to sample the data points (possibly using interpolation) at a fixed size and only then train a dictionary on the sampled versions. This kind of coding is clearly suboptimal for fibers and for many other types of data in which the length of the data points varies significantly, as many of them may be over- or under-sampled. Thus, a continuous dictionary that can be sampled in accordance with a given data sample is much needed.

Continuous Dictionary Learning (CDL): A possible way to code data points of varying size is to use a continuous dictionary (CD), in which each atom is a continuous function. Given a certain data point, the functions of the dictionary can be sampled at the same sampling locations as this point, and then any standard sparse coding strategy may be used to recover the representation of the signal. Notice that this method provides a representation of the same size (the number of functions in the CD) for all data points.
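A minimal sketch of this idea (our own illustration with hypothetical names; shown for a scalar-valued signal, whereas a fiber's three coordinates can be stacked into one vector): the atoms are continuous functions, and for each data point the dictionary is discretized at that point's own sampling locations before an off-the-shelf sparse coder is applied.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def sample_dictionary(atoms, t):
    """Evaluate K continuous atoms (callables on a common domain) at the sampling
    locations t of one data point, yielding a (len(t), K) discrete dictionary."""
    return np.stack([phi(t) for phi in atoms], axis=1)

def code_point(atoms, t_i, f_i, T0):
    """Fixed-size (K-dimensional) sparse representation of a data point f_i
    sampled at its own locations t_i."""
    D_i = sample_dictionary(atoms, t_i)
    return orthogonal_mp(D_i, f_i, n_nonzero_coefs=T0)
```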

As learning the dictionary improves the sparse coding in the discrete case, the same is expected with CDL. Yet, it is impossible to learn the columns of the CD directly as is done in the discrete case. Inspired by the double sparsity model for dictionary learning [4], we propose to learn a parametric dictionary \(D=\varPhi A\) such that each of its atoms is a linear combination (defined by \(A \in \mathbb R^{K \times K}\)) of the functions in the CD \(\varPhi \). In this case, the number of parameters to be learned is finite, yet the result is still a CD. Given a set of N continuous functions \(f_i\), the CDL problem becomes

$$\begin{aligned} \min _{A,x_i} \sum _{i=1}^N\int _{t\in \varOmega _i} (f_i(t)-\varPhi (t) Ax_i)^2dt ~ ~s.t. ~\forall i ~ ~\Vert x_i \Vert _0 \leqslant T_0, \end{aligned}$$
(3)

where \(\varOmega _i\) is the domain of \(f_i\) (we assume it is included in the domain of the functions in \(\varPhi \)). The integration in Eq. 3 may be approximated by a finite sum over \(d_i\) samples of \(f_i\) and the functions in \(\varPhi \). Assuming uniform sampling, this approximation results in the following minimization problem

$$\begin{aligned} \min _{A,x_i} \sum _{i=1}^N\sum _{t=1}^{d_i} (f_i(t)-\varPhi (t) Ax_i)^2 ~ ~s.t.~ \forall i ~ ~\Vert x_i \Vert _0 \leqslant T_0. \end{aligned}$$
(4)

To calculate A we use stochastic gradient descent with mini-batches of size b and a varying learning rate. At iteration n, given the previous estimate \(A_{n-1}\), we re-sample the dictionary \(\varPhi A_{n-1}\) at the sampling locations of each data point in the current mini-batch and calculate its sparse representation using OMP. We then obtain \(A_n\) by taking a gradient step based on the current batch and its representations computed with \(A_{n-1}\).
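A compact sketch of one such iteration (our simplification, again for a scalar-valued signal, averaging the gradient over the mini-batch):

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def cdl_step(A, batch, atoms, T0, lr):
    """One mini-batch stochastic gradient step on A for the objective of Eq. 4.
    batch: list of (t_i, f_i) pairs, each sampled at its own locations t_i;
    atoms: the K continuous functions of the fixed base dictionary Phi."""
    grad = np.zeros_like(A)
    for t_i, f_i in batch:
        Phi_i = np.stack([phi(t_i) for phi in atoms], axis=1)    # (d_i, K) sampled Phi
        x_i = orthogonal_mp(Phi_i @ A, f_i, n_nonzero_coefs=T0)  # sparse code in D = Phi A
        r_i = f_i - Phi_i @ A @ x_i                              # reconstruction residual
        grad -= 2.0 * Phi_i.T @ np.outer(r_i, x_i)               # gradient of the squared error w.r.t. A
    return A - lr * grad / len(batch)
```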

Fiber Representation: In the context of fibers, we initialize A to be the identity matrix and \(\varPhi \) to be a set of randomly selected fibers. We obtain a continuous version of each atom using cubic spline interpolation, which both promotes smoothness and preserves the locations of the fibers' sampled points. To improve \(\varPhi \), we may start the training with a smaller number of atoms and then, every several iterations, add as a new atom the data point with the worst representation. The matrix A is expanded accordingly (the new entries are initialized to zero, except the corner entry, which is set to 1 so that the new atom is selected into the dictionary). This technique improves the training.
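A possible sketch of this initialization (names and the use of SciPy are ours): each selected fiber becomes a vector-valued cubic spline over a normalized parameter, and A starts as the identity matrix.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fiber_to_atom(points):
    """Continuous atom from a fiber given as an (n, 3) array: a cubic spline over
    a normalized parameter in [0, 1] that passes through the sampled points."""
    t = np.linspace(0.0, 1.0, len(points))
    return CubicSpline(t, points)  # atom(t) returns 3D points on the curve

def init_continuous_dictionary(fibers, K, seed=0):
    """Initial CD: splines of K randomly selected fibers, with A the identity.
    When an atom is added later, A is padded with zeros and its new corner set to 1."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(fibers), size=K, replace=False)
    atoms = [fiber_to_atom(fibers[i]) for i in idx]
    return atoms, np.eye(K)
```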

Sparse Inter-fiber Similarity: We base our suggested similarity measure on the well-known cosine similarity, which was found to be beneficial for fibers [10]:

$$\begin{aligned} sim^{cos} \left( f_i,f_j \right) =\frac{\left\langle f_i,f_j \right\rangle }{\Vert f_i \Vert \Vert f_j \Vert } \end{aligned}$$
(5)

where \(\left\langle \cdot ,\cdot \right\rangle \) stands for the inner product and \( \Vert \cdot \Vert \) for the \(L_2\) norm. Cosine similarity cannot be applied to signals of different lengths. It also cannot be used directly on the sparse representations due to the non-orthogonal nature of the dictionary.

We propose a modified cosine similarity measure that suits the suggested representation; it is a modification of the method suggested in [2]. The non-orthogonality is handled by adding a weighting matrix that contains the similarities between the atoms. The cosine similarity is approximated by replacing the original fibers with the reconstructed ones:

$$\begin{aligned} sim^{cos} \left( f_i,f_j \right) \cong sim^{cos} \left( \tilde{f_i},\tilde{f_j}\right) =\frac{\langle \varPhi Ax_i,\varPhi Ax_j \rangle }{\Vert \varPhi Ax_i \Vert \Vert \varPhi Ax_j \Vert }=\frac{ x_i^T A^T \varPhi ^T \varPhi A x_j }{\Vert \varPhi Ax_i \Vert \Vert \varPhi Ax_j \Vert }. \end{aligned}$$
(6)

We strive to avoid adding unnecessary reconstruction error and to perform the similarity computation without going back to the original space. Therefore, we propose to replace the norms in the denominator with the norms of the original fibers, which can be computed before the sparse coding phase and stored as part of the compressed data. This step adds one additional value to the representation, which does not significantly affect the representation length while allowing the use of the original fiber norms in the similarity approximation. The modified compressed representation \(\widehat{x_i}\) contains the value \(norm_i=\Vert f_i \Vert \) concatenated to the sparse coefficient vector \(x_i\). In addition, we define the similarity matrix \(S = A^T \varPhi ^T \varPhi A\in \mathbb R^{K\times K}\), whose entries measure the similarities between the atoms of \(\varPhi A\). Based on the above, we propose a new similarity measure between fibers, the Cosine using Continuous Dictionary Similarity (CCDS), which relies only on the compressed representation and can be calculated efficiently:

$$\begin{aligned} sim^{CCDS} \left( \widehat{x_i}, \widehat{x_j}\right) =\frac{x_i^T S x_j}{\Vert norm_i \Vert \Vert norm_j \Vert } \end{aligned}$$
(7)
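A minimal sketch of the CCDS computation (our own illustration): the atom similarity matrix S is precomputed once per dictionary; here \(\varPhi ^T \varPhi \) is approximated by densely sampling the continuous atoms, which is an assumption on how the atoms' inner products are evaluated.

```python
import numpy as np

def atom_similarity_matrix(atoms, A, n_samples=100):
    """S = A^T Phi^T Phi A, with Phi^T Phi approximated by densely sampling the
    continuous atoms (3D curves) on their common normalized domain."""
    t = np.linspace(0.0, 1.0, n_samples)
    Phi = np.stack([np.ravel(phi(t)) for phi in atoms], axis=1)  # flattened curve samples
    return A.T @ Phi.T @ Phi @ A

def ccds(x_i, x_j, norm_i, norm_j, S):
    """Eq. 7: cosine-like similarity computed directly on the compressed
    representations, using the stored norms of the original fibers."""
    return float(x_i @ S @ x_j) / (norm_i * norm_j)
```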

3 Experiments and Results

Data: The proposed method was evaluated on MRI datasets of 5 healthy volunteers from the Human Connectome Project (HCP). The acquisition protocol included 3 shells (b-values of 1000, 2000, and 3000 s/mm\(^2\)) with 96 unique directions each, an isotropic resolution of 1.25 mm, and an imaging matrix of \(144 \times 168 \times 111\) voxels [1]. Diffusion data were preprocessed using the HCP diffusion pipelines [5]. WM fibers were obtained using Q-Space Diffeomorphic Reconstruction (QSDR) and streamline tractography with the DSI-Studio software. Tractography was terminated upon reaching 1M fibers. In all the following experiments, we used three sets per brain, randomly selected from the full tractogram: 20,000 fibers used as the training set (TrS) for the dictionaries, 2,000 used as the validation set (VS), and 10,000 other fibers serving as the testing set (TS).

Sparse Coding of Fibers using Different Types of Dictionaries: For each of the 5 tractograms, we created 3 different dictionaries based on the TrS, using CDL and the two alternatives described in Sect. 2: CD and SDL. The training was performed with the following configuration for each technique:

\(\underline{\textit{CDL:}}\) The sparsity constraint, \(T_0\), was set to 7 non-zero coefficients. An initial dictionary of size 500 was chosen as the best out of 5 random initializations, evaluated on the VS. Every 10 iterations, a new atom was added to \(\varPhi \), increasing it to a final size of 700 atoms. These parameters were found in our experiments to provide a reasonable compromise between error rate and compression rate. The batch size was set to \(b=500\). A learning rate (LR) of \(LR(n) = \min (10^{-6},\, 6\cdot 10^{-6}/\log (n))\) was used, where n is the iteration number. The training process was stopped after 4k iterations, as the average reconstruction error on the VS was no longer decreasing.

\(\underline{\textit{CD:}}\) The dictionary is the initial CD selected for CDL.

\(\underline{\textit{SDL:}}\) The fibers under evaluation were interpolated and sampled equidistantly at 20 3D points along each fiber's path; this number of points was found to be a good choice in [11]. Instead of learning the dictionary on the entire fiber set, a representative subset of fibers was used in order to reduce the learning time. This subset was selected by randomly choosing 10000 fibers out of the full set; such a training set is diverse and empirically represents the full tractogram well. The K-SVD learning process was initialized with a set of K fibers representing the centroids of K clusters of the TrS, computed using K-means. The dictionary size was set to 700, \(T_0\) was set to 7, and the process was terminated after 200 K-SVD iterations, when no significant improvement was recorded in the VS reconstruction error.

Fig. 2. (a) RE convergence of SDL (blue) and CDL (magenta) and the fixed CD value (dashed black); (b) 500 original fibers (blue) and their matching reconstructions (red); (c) sample pairs of fibers with their RE; (d) Cosine vs. CCDS similarities for one brain.

Performance Evaluation: In order to evaluate the accuracy of the new representation relative to the original data from the tractography process, we need to reconstruct each fiber \(f_i\). This is done by sampling the continuous dictionary at the same sampling locations t as the original fiber and applying \( \hat{f_i}(t)=\varPhi (t)Ax_i\). Next, the distance between the two sets of 3D points, \(f_i\) and \(\hat{f_i}\), is evaluated. We assume that a good approximation of the distance between the two curves can be inferred from the distances between corresponding pairs of sampling points on the two curves. We computed two measurements: the average Euclidean distance over all point pairs, and the maximal Euclidean distance among all point pairs. For a set of fibers, we computed the mean and median values of these two measurements over all fibers. The evaluation scheme was slightly modified to match the SDL framework: using SDL, one can only reconstruct fibers with a prefixed number of samples, as opposed to the original representation, which has a varying number of sampling points. To solve this, we interpolated the SDL-reconstructed fibers using spline interpolation and then sampled them equidistantly with the same number of sampling points as in the original tractogram. Figure 2(a) presents the reconstruction error (RE), measured as the mean distance on the VS, as a function of the learning iteration for SDL and CDL. For the CD, a fixed value is plotted, as no learning is performed. Figure 2(b) and (c) visualize original fibers from the TS and their reconstructions using SDL. Table 1 shows the RE on the TS for all 5 brains. Notice that even the CD without learning provides a more accurate coding than SDL. The addition of learning further improves the representation.
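For completeness, a small sketch of the per-fiber error computation described above (names are ours; reconstruct stands for any of the reconstruction procedures):

```python
import numpy as np

def fiber_errors(f, f_hat):
    """Average and maximal Euclidean distance between corresponding sampling
    points of an original fiber f and its reconstruction f_hat (both (n, 3))."""
    d = np.linalg.norm(f - f_hat, axis=1)
    return d.mean(), d.max()

# Over a fiber set, the mean and median of these per-fiber values are reported:
# avg = [fiber_errors(f, reconstruct(f))[0] for f in test_fibers]
# print(np.mean(avg), np.median(avg))
```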

Table 1. Performance evaluation: Reconstruction error in mm using the suggested method (CDL) compared to SDL and CD, tested on the TS of 5 different brains.

Approximation of Cosine Similarity Using CCDS: The proposed CCDS similarity measure was evaluated by comparing the similarities between fibers in their original representation (Eq. 5) with the similarities between the same fibers as calculated from the sparse representation using CCDS (Eq. 7). The similarities were evaluated on 10,000 randomly selected fibers from the TS. Figure 2(d) shows the correlation graph of Cosine vs. CCDS similarities for one of the brains.

4 Discussion and Conclusions

This work proposes a novel continuous dictionary learning scheme, which allows training a dictionary on diverse training sets such that samples do not need to lie on a grid or have a fixed size. The significance of the new methodology is the ability to train continuous dictionaries. This strategy provides a novel fiber representation, which is uniform in size and approximates with high accuracy fibers of any length.

We have tested the suggested method on 5 tractograms created from HCP data, with very satisfying results. With only 7 non-zero coefficients per fiber, we achieved an average reconstruction error of 0.83 mm and a maximal reconstruction error of 2.22 mm. Based on the average number of points per fiber in the original representation, our compression ratio is comparable with the state of the art (95.6%) [7], which relies on a tractography with a step size half as large, i.e., an initial representation twice as large. According to Table 1, CDL clearly outperformed the two alternative methods examined. Omitting the learning stage yields an average reconstruction error about 25% higher, which emphasizes the importance of the learning process. Using K-SVD on a fixed-size version of the data led to the least satisfying results; we believe K-SVD did not succeed due to the varying length of the WM fibers. The experiments were conducted with the number of nonzero coefficients constrained to \(T_0 \le 7\) and a dictionary size of 700. Additional tests (not included herein due to lack of space) suggest that if more accuracy is needed, it may be achieved with a higher \(T_0\) and a larger dictionary, obviously at additional computational cost.

The suggested representation is advantageous over other compression schemes also due to the ability to estimate inter-fiber similarity using the proposed CCDS measure directly in the compressed form, thus allowing common operations on fibers without decompressing the data. In addition, common metrics that can be applied to signals of different lengths are usually computationally expensive, as opposed to CCDS, which is highly efficient. The validity of CCDS was verified by the high correlation achieved with the known cosine similarity on the original representations (Fig. 2(d)). The proposed continuous dictionary learning methodology is general and can be applied in other applications with similar data characteristics.