1 Introduction

Modelling the correlations between pathology images and MRI images has been investigated (e.g. [9]). The former can be used for definitive diagnosis, and the latter can be obtained non-invasively. Models that represent the correlations between these images would improve the confidence of diagnosis and could be used to predict histopathological status from the corresponding MRI images. In this study, we construct a non-parametric model that represents the correlation between the voxel value of an MRI image and the corresponding histopathological images of a pancreas tumor of a KPC mouse. Such a model is a key component of a system that would infer, from MR images and with associated probabilities, the various kinds of information currently obtained from histopathological images. To construct the model, we employ a conditional Laplacian Pyramid of Generative Adversarial Networks (LAPGAN) [1].

A Generative Adversarial Network (GAN) [4] can construct a sufficiently representative latent model of target images by simultaneously learning a generator and a discriminator: the generator creates sample images that are intended to come from the same distribution as the training data, and the discriminator examines samples to determine whether they are real or fake. The latent model of the target images can be represented by a manifold, from which the generator creates fake images by sampling [10]. In many cases, a noise signal, \(\varvec{z}\), is input to the generator and can be used as a local coordinate system on the manifold [2]. Generators learned by conditional GANs take not only noise signals but also condition signals as inputs, where the condition signals restrict the output sample images. The condition signals can also be used as a local coordinate system on the manifold. When the values of the condition signals are fixed, the generator creates fake images that correspond to a sub-manifold restricted by the conditions. The manifolds with these local coordinate systems can represent the correlations between the target images and the condition signals [10]. In this study, we construct a generator that takes noise signals and a voxel value of an MRI image as input and outputs corresponding pathology image patches sampled from the sub-manifold determined by the MRI voxel value.

The spatial resolution of an MRI image differs greatly from that of a microscope pathology image: each single voxel of an MRI image corresponds to a large patch of the pathology image. We hence assume that each voxel value correlates with low-resolution features of the pathology images and employ a LAPGAN, which constructs a cascade of image generators, each of which creates a sample image that represents the difference between a high-resolution image and the corresponding low-resolution input image [1].

2 Method

2.1 Outline of the Proposed Method

Figure 1 outlines the construction of a multi-scale pancreas tumor model from an MRI image and the corresponding pathology images of a KPC mouse. The MRI image of the whole body of the KPC mouse was captured just before the pancreas tumor was extracted. A 3D pathology image of the tumor was reconstructed from a spatial series of 2D microscope images of the tumor, and this 3D pathology image was non-rigidly registered to the tumor region in the MRI image to obtain a set of training data for the LAPGAN, in which each datum is a pair of a voxel value of the MRI image and the corresponding image patch in the microscope image. Applying the conditional LAPGAN to the training data, we construct a cascade of generators that can generate a pathology image patch corresponding to an input voxel value of the MRI image.

Fig. 1. Construction of a multiscale model of pancreas tumor using an MRI image and the corresponding pathology images

2.2 Images Used in the Experiments

The training images for the conditional LAPGAN were obtained as follows. An MRI image of the whole body of the KPC mouse was captured just before the organs, including the whole tumor, were extracted. The spatial resolution of the MRI image was 0.1536 mm \(\times \) 0.1536 mm \(\times \) 0.5 mm. The tumor was spherical and its diameter was about 2 cm. Two MRI images of the extracted organs were obtained, one before and one after the organs were formalin-fixed. Registering the two MRI images, we found that the organs around the tumor shrank but the tumor itself did not deform during the formalin fixation.

The extracted organs were paraffin-embedded after the formalin fixation. We first cut the paraffin block into five small blocks, each about 5 mm thick, and one of the small blocks, which contained the central portion of the tumor, was sliced into a spatial series of about 800 thin sections. The thickness of each section was set to 4 \(\upmu \)m. The number of sections obtained from the 5 mm = 5000 \(\upmu \)m thick block was less than \(1250 = 5000/4\) because of the loss incurred in slicing. We stained the thin sections with Hematoxylin and Eosin (H&E). The microscopy images of these stained sections were then captured with a spatial resolution of 0.22 \(\upmu \)m \(\times \) 0.22 \(\upmu \)m.
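The scale gap between the two modalities can be made concrete with a little arithmetic on the resolutions quoted above. The following is only a back-of-the-envelope sketch; the variable names are illustrative and the figures ignore the slicing loss and any deformation.

```python
# Resolutions quoted in the text, converted to micrometres.
MRI_VOXEL_XY_UM = 153.6      # 0.1536 mm in-plane voxel size
MRI_VOXEL_Z_UM = 500.0       # 0.5 mm slice thickness
SECTION_THICKNESS_UM = 4.0   # thickness of one thin section
BLOCK_THICKNESS_UM = 5000.0  # 5 mm paraffin block
PIXEL_UM = 0.22              # microscope pixel size

# Upper bound on the number of sections if no tissue were lost in slicing.
max_sections = BLOCK_THICKNESS_UM / SECTION_THICKNESS_UM

# Microscope pixels spanned by one MRI voxel along x (and along y).
pixels_per_voxel_side = MRI_VOXEL_XY_UM / PIXEL_UM

# Nominal number of thin sections inside one MRI slice thickness.
sections_per_voxel = MRI_VOXEL_Z_UM / SECTION_THICKNESS_UM

print(f"max sections per block: {max_sections:.0f}")
print(f"pixels per voxel side:  {pixels_per_voxel_side:.0f}")
print(f"sections per voxel:     {sections_per_voxel:.0f}")
```

Under these nominal figures, one MRI voxel covers roughly 698 \(\times \) 698 microscope pixels in each of about 125 sections, which is why a single voxel value can only be expected to correlate with coarse features of the pathology image.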

2.3 Reconstruction of a 3D Microscope Image

For the 3D reconstruction of the microscope image, we employed the non-rigid registration method proposed in [6]. Let the given 2D microscope images be denoted by \(I_1(\varvec{x}), I_2(\varvec{x}), \dots , I_M(\varvec{x})\), where M is the total number of the 2D microscope images and \(\varvec{x} = [x,y]^\top \) denotes the 2D image coordinates. It is assumed that the given images are roughly aligned, for example, by a rigid registration method. Let the deformation mapping computed for \(I_i(\varvec{x})\) be denoted by \(\phi _i\) (\(i=1, 2, \dots , M\)). Then, the 3D image, \(J(x,y,z)\), is reconstructed as \(J(x,y,z=i) = I_i(\phi _i^{-1}\circ \varvec{x})\). The mapping, \(\phi _i\), is computed from a set of landmarks located in \(I_i(\cdot )\). Let \(\varvec{p}^j_i\) denote the 2D image coordinates of the j-th landmark (\(j=1, 2, \dots , N\)) in the i-th image, \(I_i(\cdot )\), and let \(\varvec{q}^j_i\) denote the coordinates of its destination, that is, the point to which the landmark should be mapped by \(\phi _i\). Then the mapping is obtained by solving the following minimization problem, regularized with respect to the rigidity of the deformation: \(\phi _i = \arg \min _\phi \sum _j\Vert \varvec{q}_i^j - \phi \circ \varvec{p}_i^j\Vert ^2\). The method we employed [6] determines the destination, \(\varvec{q}_i^j\), of each landmark by smoothing the trajectory of the j-th landmark in the 3D image space.

It should be noted that each of the mappings, \(\phi _i\), is determined by referring not only to two consecutive images but to all the given images.
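The idea of deriving the destinations \(\varvec{q}_i^j\) from a smoothed landmark trajectory can be illustrated with a short sketch. The code below is not the method of [6]: it assumes a plain moving average along the section index as the smoother, and the function names and the `half_window` parameter are hypothetical.

```python
def smooth_trajectory(points, half_window=2):
    """Moving-average smoothing of one landmark trajectory along z.

    points: list of (x, y) positions p_i^j for i = 1..M (one landmark j).
    Returns a list of (x, y) destinations q_i^j of the same length.
    """
    m = len(points)
    smoothed = []
    for i in range(m):
        lo = max(0, i - half_window)
        hi = min(m, i + half_window + 1)
        window = points[lo:hi]
        qx = sum(p[0] for p in window) / len(window)
        qy = sum(p[1] for p in window) / len(window)
        smoothed.append((qx, qy))
    return smoothed


def landmark_residual(points, destinations):
    """Sum of ||q - p||^2 over sections, i.e. the data term before any warp."""
    return sum((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
               for p, q in zip(points, destinations))


# A landmark that sits at (10, 20) in every section except one jittered section.
trajectory = [(10.0, 20.0)] * 7
trajectory[3] = (17.0, 20.0)
destinations = smooth_trajectory(trajectory, half_window=1)
print(destinations[3])                      # jitter is pulled toward its neighbours
print(landmark_residual(trajectory, destinations))
```

Because each destination depends on the landmark positions in several sections, the resulting mappings are coupled across the stack rather than computed pairwise, which is the property noted above.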

2.4 Registration Between MRI Image and Pathology Image

The tumor region in the MRI image and the tumor region in the reconstructed 3D microscope image were registered. A mutual information based non-rigid registration method [8] was employed. Assuming that the deformation of the tumor in the 3D microscope image mainly occurred when each thin section of the tumor specimen was placed on the glass slide, we restricted the movement of the control points to a plane where the z-coordinate is constant: Let the mapping to be computed be denoted by \(\psi \) and let \(\varvec{X}^\prime = \psi \circ \varvec{X}\), where the three-vectors, \(\varvec{X}^\prime = [x^\prime , y^\prime , z^\prime ]^\top \) and \(\varvec{X} = [x,y,z]^\top \), denote the 3D coordinates in the reconstructed microscope image, \(J(\cdot )\). We computed the \(\psi \) that maximizes the mutual information under the condition that \(z^\prime = \psi \circ z = z\) is satisfied. The mapping, \(\psi \), then keeps the plane, \(z=i\), corresponding to each microscope image, \(I_i(\cdot )\), flat.
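The two ingredients of this step, a mutual information score and a displacement confined to planes of constant z, can be sketched as follows. This is a minimal illustration and not the registration method of [8]: the histogram binning, the function names, and the toy intensity lists are all assumptions.

```python
import math
from collections import Counter


def mutual_information(a, b, bins=8, max_value=256):
    """Mutual information (in nats) between two equally sized intensity lists."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    width = max_value / bins
    qa = [min(int(v / width), bins - 1) for v in a]
    qb = [min(int(v / width), bins - 1) for v in b]
    n = len(a)
    joint = Counter(zip(qa, qb))   # joint histogram of binned intensities
    pa = Counter(qa)               # marginal histogram of a
    pb = Counter(qb)               # marginal histogram of b
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        # p_ij / (p_i * p_j), written with raw counts
        mi += p_ij * math.log(count * n / (pa[i] * pb[j]))
    return mi


def apply_in_plane(point, displacement_xy):
    """Move a 3D point within its own section: z' = z by construction."""
    x, y, z = point
    dx, dy = displacement_xy
    return (x + dx, y + dy, z)


fixed = [0, 0, 255, 255] * 4
print(mutual_information(fixed, fixed))        # two equally likely bins
print(mutual_information(fixed, [128] * 16))   # a constant image carries no information
print(apply_in_plane((1.0, 2.0, 5.0), (0.5, -0.5)))
```

A registration loop would search over the in-plane displacements of the control points for the configuration that maximizes the score, while `apply_in_plane` guarantees that every section stays in its own plane.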

As a result of the registration described above, each voxel in the MRI image corresponds to a specific region in the 3D microscope image, as shown in Fig. 2.

Fig. 2. Registration between the tumor region in the MRI image and the reconstructed 3D pathology image. (A): A pathology image of the tumor region. The pale pink portion includes necrosis region. (B): The corresponding MRI slice image. As shown in (C), the registration makes the correspondence between each voxel in the MRI image and a portion in the pathology image. (Color figure online)

2.5 Construction of Training Image Data for LAPGAN

Let the index of the voxel in the MRI image be denoted by \(m\in \mathbb {N}\) and let the 3D region in the reconstructed 3D microscope image, \(J(\psi ^{-1}\circ \varvec{X})\), that corresponds to the m-th voxel of the MRI image be denoted by \(\varGamma _m\). Let the portion in the plane, \(z=i\), included in \(\varGamma _m\) be denoted by \(\varGamma _{mi}\). The region, \(\varGamma _{mi}\), in the deformed i-th microscope image, \(J(x^\prime , y^\prime , z^\prime =i)\), corresponds to the m-th voxel in the MRI image. Let the value of the m-th voxel of the MRI image be denoted by \(v_m\) and let a set of \(256 \times 256\) image patches included in \(\varGamma _{mi}\) be denoted by \(\{\mathcal{I}_{mis}(\varvec{x}) | s = 1, 2, \dots , S_{mi}\}\), where \(S_{mi}\) denotes the number of patches sampled from \(\varGamma _{mi}\). We sampled the patches only from the H&E stained images and augmented them by applying random rotations to \(\mathcal{I}_{mis}\) to increase the number of training data. We then obtain a set of training data,

$$\begin{aligned} \mathcal{D} = \{(v_m, \mathcal{I}_{mis}) | m, i, s \in \mathbb {N}\}, \end{aligned}$$
(1)

in which each datum is a pair of the voxel value of the MRI image and the corresponding patch in H&E stained microscope images.

To avoid mode collapse, we quantize the MRI voxel value, \(v_m\), by K-means clustering so that the variety of the data that share the same condition is increased. Let the number of clusters be denoted by K and let the clusters of voxel values be denoted by \(\mathcal{C}_k\) (\(k=1, 2, \dots , K\)). Then, we can obtain K sets, \(\mathcal{D}_k\), of training image patches from \(\mathcal{D}\) such that

$$\begin{aligned} \mathcal{D}_k = \{\mathcal{I}_{mis} | v_m\in \mathcal{C}_k; m, i, s \in \mathbb {N}\}, \end{aligned}$$
(2)

where the image patches included in \(\mathcal{D}_k\) correspond to the MRI voxel values that are included in \(\mathcal{C}_k\). We use the index, k, as the condition for the LAPGAN.
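The quantization of voxel values and the resulting split of the patches into the sets \(\mathcal{D}_k\) can be sketched in a few lines. The code below assumes a basic Lloyd iteration on scalars with a quantile-based initialization; the paper does not state these details, and the patch names and voxel values are placeholders. Cluster labels here are 0-based, whereas the text uses \(k=1, \dots , K\).

```python
def kmeans_1d(values, k, iterations=50):
    """Lloyd's algorithm on scalars; returns (centres, labels)."""
    ordered = sorted(values)
    # Initialise centres at evenly spaced quantiles (an arbitrary choice).
    centres = [ordered[(2 * c + 1) * len(ordered) // (2 * k)] for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: nearest centre for every value.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: mean of the members, keeping empty clusters in place.
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return centres, labels


# Toy dataset D: pairs of (voxel value v_m, patch identifier).
pairs = [(12.0, "patch_a"), (14.0, "patch_b"), (80.0, "patch_c"),
         (83.0, "patch_d"), (150.0, "patch_e"), (155.0, "patch_f")]
values = [v for v, _ in pairs]
centres, labels = kmeans_1d(values, k=3)

# Build the subsets D_k: patches whose voxel value falls into cluster k.
subsets = {}
for (_, patch), label in zip(pairs, labels):
    subsets.setdefault(label, []).append(patch)

print(centres)
print(subsets)
```

Conditioning on the cluster index rather than on the raw voxel value means that every condition is backed by many patches, which is the stated motivation for the quantization.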

The LAPGAN constructs a series of image generative models within a Laplacian pyramid framework. In this framework, an image is represented in a coarse-to-fine fashion, that is, by a series of band-passed images plus a low-frequency residual. Let \(D_\downarrow \) denote a downsampling operation by a factor of two: when the size of an input image, I, is \(W\times W\), then \(D_\downarrow \circ I\) is a new image of size \(W/2\times W/2\). Following [1], we first built a Gaussian pyramid, \(g(\mathcal{I}_{mis})\), from each image patch, \(\mathcal{I}_{mis}\), such that \(g(\mathcal{I}_{mis}) = [\mathcal{I}_{mis}^0, \mathcal{I}_{mis}^1, \dots , \mathcal{I}_{mis}^L]\), where

$$\begin{aligned} \mathcal{I}_{mis}^{l+1} = D_\downarrow \circ G_\sigma \circ \mathcal{I}_{mis}^l, \end{aligned}$$
(3)

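\(G_\sigma \) denotes Gaussian smoothing with variance \(\sigma ^2\), \(\mathcal{I}_{mis}^{0} = \mathcal{I}_{mis}\), L denotes the number of levels in the pyramid, and \(l=0,1,\dots , L-1\) denotes the level. From the Gaussian pyramid, \(g(\mathcal{I}_{mis})\), we then constructed the series of band-passed images, \(\mathcal{B}^l_{mis}\), each of which is computed from \(\mathcal{I}_{mis}^l\) and \(\mathcal{I}_{mis}^{l+1}\) as \(\mathcal{B}^l_{mis} = \mathcal{I}^l_{mis} - O_\mathrm{sm}\circ U_\uparrow \circ \mathcal{I}^{l+1}_{mis}\), which is the reconstruction relation of Sect. 2.6 solved for the band-passed image.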
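A small, dependency-free sketch may help to fix the pyramid construction. It makes several assumptions that the text does not specify: a separable [1, 2, 1]/4 kernel stands in for \(G_\sigma \), downsampling keeps every second pixel, and the expansion \(O_\mathrm{sm}\circ U_\uparrow \) is read as pixel replication followed by the same smoothing. All function names are illustrative.

```python
def blur(img):
    """Separable [1, 2, 1] / 4 smoothing with clamped borders (stand-in for G_sigma)."""
    h, w = len(img), len(img[0])
    rows = [[(r[max(x - 1, 0)] + 2 * r[x] + r[min(x + 1, w - 1)]) / 4
             for x in range(w)] for r in img]
    return [[(rows[max(y - 1, 0)][x] + 2 * rows[y][x] + rows[min(y + 1, h - 1)][x]) / 4
             for x in range(w)] for y in range(h)]


def downsample(img):
    """D_down: keep every second pixel in each direction."""
    return [row[::2] for row in img[::2]]


def upsample(img):
    """Pixel replication by a factor of two, followed by smoothing."""
    doubled = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        doubled.append(wide)
        doubled.append(list(wide))
    return blur(doubled)


def gaussian_pyramid(img, levels):
    """[I^0, I^1, ..., I^L] with I^{l+1} = D_down(G_sigma(I^l))."""
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(downsample(blur(pyramid[-1])))
    return pyramid


def laplacian_pyramid(img, levels):
    """Band-passed images B^0..B^{L-1} and the low-frequency residual I^L."""
    g = gaussian_pyramid(img, levels)
    bands = []
    for l in range(levels):
        up = upsample(g[l + 1])
        bands.append([[a - b for a, b in zip(ra, rb)] for ra, rb in zip(g[l], up)])
    return bands, g[levels]


def reconstruct(bands, residual):
    """Invert the pyramid: I^l = B^l + upsample(I^{l+1}), from coarse to fine."""
    img = residual
    for band in reversed(bands):
        up = upsample(img)
        img = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(band, up)]
    return img


image = [[float((x * y) % 7) for x in range(8)] for y in range(8)]
bands, residual = laplacian_pyramid(image, levels=3)
restored = reconstruct(bands, residual)
print([len(b) for b in bands], len(residual))
```

Whatever smoothing and expansion operators are chosen, the decomposition is invertible as long as the same expansion is used when building the band-passed images and when reconstructing, which is what allows a cascade of generators to rebuild a full-resolution patch.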

The LAPGAN we implemented constructs a series of \(L+1\) image generators: The l-th generator (\(l=0, 1, \dots , L-1\)) generates the band-passed image, \(\mathcal{B}^{l}_{mis}\), from \(\mathcal{I}^{l+1}_{mis}\), and the last (the L-th) generator generates the lowest-frequency image residual, \(\mathcal{I}^L_{mis}\), from a Gaussian noise image. The cascade of the \(L+1\) generators, applied in descending order of l, can generate the original image, \(\mathcal{I}_{mis}\). The l-th generator (\(l=0, 1, \dots , L-1\)) is constructed from the sets of training data, \(\{\mathcal{D}^{l}_k| k=1, 2, \dots , K\}\), where \( \mathcal{D}^l_k = \{(\mathcal{I}^{l+1}_{mis}, \mathcal{B}^l_{mis}) | v_m\in \mathcal{C}_k,\;m,i,s\in \mathbb {N}\}. \) The dataset, \(\mathcal{D}^l_k\), consists of the pairs of the downsampled image of the low-frequency residual, \(\mathcal{I}^{l+1}_{mis}\), and the band-passed image, \(\mathcal{B}^l_{mis}\), both of which are obtained from the microscope images that correspond to the voxel value, \(v_m\in \mathcal{C}_k\), of the MRI image. The last (the L-th) image generator is constructed from the sets of images, \(\{\bar{\mathcal{D}}^L_k | k = 1, 2, \dots , K\}\), where \(\bar{\mathcal{D}}^L_k = \{\mathcal{I}^{L}_{mis} | v_m\in \mathcal{C}_k,\;m, i, s\in \mathbb {N}\}\). The last generator does not need the band-passed images for training.

2.6 Conditional LAPGAN

We constructed \(L+1\) image generators, \(\mathsf {G}^0, \mathsf {G}^1, \dots , \mathsf {G}^L\), by using the LAPGAN. The inputs of the last generator, \(\mathsf {G}^L\), are the Gaussian noise image, \(\varvec{z}\), and the index, k, of the class of the corresponding MRI voxel value. The output is a lowest-frequency residual image, \(\mathcal{I}^L\), which is indistinguishable from the training images, \(\mathcal{I}^L_{mis}\in \bar{\mathcal{D}}^L_k\). One can generate a variety of such indistinguishable images by changing the input Gaussian noise image. The inputs of the other generators, \(\mathsf {G}^l\) (\(l=0, 1, \dots , L-1\)), are the lower-frequency image, \(\mathcal{I}^{l+1}\), the Gaussian noise image, \(\varvec{z}\), and the index, k. The output of \(\mathsf {G}^l\) is a band-passed image, \(\mathcal{B}^l\), from which the higher-frequency image, \(\mathcal{I}^l\), is obtained from the input lower-frequency image, \(\mathcal{I}^{l+1}\), as \( \mathcal{I}^l = \mathcal{B}^l + O_\mathrm{sm}\circ U_\uparrow \circ \mathcal{I}^{l+1}\), where the resultant image, \(\mathcal{I}^l\), is indistinguishable from the training images, \(\mathcal{I}^l_{mis}\). By changing the input Gaussian noise image, one can generate a variety of images that the discriminator cannot distinguish from the training data.
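The coarse-to-fine sampling procedure can be sketched with placeholder generators. In the code below the two `stub_*` functions are hypothetical stand-ins for the trained CNN generators, the expansion is plain pixel replication with the smoothing \(O_\mathrm{sm}\) omitted, and the default sizes simply follow a 256-pixel patch halved at each level.

```python
import random


def upsample2(img):
    """Pixel replication by a factor of two (smoothing O_sm omitted for brevity)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def noise_image(size, rng):
    """Gaussian noise image z of the given side length."""
    return [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]


def stub_top_generator(z, k):
    """Placeholder for G^L: maps noise and class index k to a coarse image I^L."""
    return [[0.5 + 0.1 * k + 0.01 * v for v in row] for row in z]


def stub_band_generator(coarse, z, k):
    """Placeholder for G^l (l < L): returns a band-passed image B^l of the size of z."""
    return [[0.01 * v for v in row] for row in z]


def sample_cascade(k, full_size=256, levels=3, seed=0):
    """Run G^L, G^{L-1}, ..., G^0 in turn and return the full-resolution image."""
    rng = random.Random(seed)
    size = full_size // (2 ** levels)
    image = stub_top_generator(noise_image(size, rng), k)       # I^L
    for _ in range(levels):                                       # l = L-1, ..., 0
        size *= 2
        band = stub_band_generator(image, noise_image(size, rng), k)
        expanded = upsample2(image)
        # I^l = B^l + expanded I^{l+1}
        image = [[b + c for b, c in zip(rb, rc)] for rb, rc in zip(band, expanded)]
    return image


patch = sample_cascade(k=1, full_size=32, levels=3)
print(len(patch), len(patch[0]))
```

Fixing `k` and varying `seed` yields different samples for the same condition, which corresponds to moving along the sub-manifold selected by the MRI voxel class.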

Let the discriminators corresponding to \(\mathsf {G}^l\) be denoted by \(\mathsf {D}^l\). Given the dataset, \(\{\bar{\mathcal {D}}^L_k | k = 1, 2, \dots , K\}\), the LAPGAN constructs the L-th generator, \(\mathsf {G}^L\), and discriminator, \(\mathsf {D}^L\), by solving the problem, \(\min _{\mathsf {G}}\max _{\mathsf {D}} F^L(\mathsf {G}, \mathsf {D})\), where (some superscripts, L, are abbreviated)

(4)

Given the dataset, \(\{\mathcal {D}^l_k | k = 1, 2, \dots , K\}\), the LAPGAN constructs the l-th generator, \(\mathsf {G}^l\) for \(l=0,1,\dots , L-1\), by solving the problem, \(\min _{\mathsf {G}}\max _{\mathsf {D}} F^l(\mathsf {G}, \mathsf {D})\), where (again, some superscripts are abbreviated):

(5)
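For orientation, objectives of this kind are usually written in the standard conditional GAN form of [4], as adapted to the Laplacian pyramid in [1]. The expressions below are a sketch under that assumption and are not taken from the text: the noise distribution \(p_{\varvec{z}}\), the expectation over the class index k, and the choice of which images the discriminator receives are all assumed.

$$\begin{aligned} F^L(\mathsf {G}, \mathsf {D}) = \mathbb {E}_{k,\; \mathcal{I}^L \in \bar{\mathcal{D}}^L_k}\left[ \log \mathsf {D}(\mathcal{I}^L, k)\right] + \mathbb {E}_{k,\; \varvec{z}\sim p_{\varvec{z}}}\left[ \log \left( 1 - \mathsf {D}(\mathsf {G}(\varvec{z}, k), k)\right) \right] , \end{aligned}$$

$$\begin{aligned} F^l(\mathsf {G}, \mathsf {D}) = \mathbb {E}_{k,\; (\mathcal{I}^{l+1}, \mathcal{B}^l) \in \mathcal{D}^l_k}\left[ \log \mathsf {D}(\mathcal{B}^l, \mathcal{I}^{l+1}, k)\right] + \mathbb {E}_{k,\; \mathcal{I}^{l+1},\; \varvec{z}\sim p_{\varvec{z}}}\left[ \log \left( 1 - \mathsf {D}(\mathsf {G}(\mathcal{I}^{l+1}, \varvec{z}, k), \mathcal{I}^{l+1}, k)\right) \right] . \end{aligned}$$

In both cases the first term rewards the discriminator for accepting training samples and the second for rejecting generated ones, while the generator is trained to minimize the second term.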

We employed CNNs for the generators and discriminators. We initialized the networks by using the method proposed in [3], and employed Adam [7] for the stochastic optimization together with batch normalization [5].

3 Results

Setting \(K=4\), we divided the voxel values of the MRI image into four clusters. Examples of the pathology images, \(\mathcal{I}_{mis}\), included in \(\mathcal{D}_1\) (the brightest portion) and \(\mathcal{D}_4\) (the darkest portion) are shown in Fig. 3. The dataset included about one million image patches. The distribution of the patch patterns differs among the clusters, \(\mathcal{D}_k\). Setting the number of levels of the cascade to \(L=3\), we constructed \(L+1\) generators. The cascade of the generators, \({\mathsf {G}}^L, {\mathsf {G}}^{L-1}, \dots , {\mathsf {G}}^0\), can output a fake pathology image from a given Gaussian noise, \(\varvec{z}\), and the condition, \(k\in \{1, 2, \dots , K\}\). Figure 4 shows randomly chosen examples of the generated images. For the bright voxel values of the MRI image, the cascade of the generators sampled patches similar to those in the necrotic portion with a higher probability.

Fig. 3. Examples of the training pathology images, \(\mathcal{I}_{mis}\), included in \(\mathcal{D}_1\) and \(\mathcal{D}_4\)

Fig. 4. Examples of the fake pathology images generated by the cascade of the generators for the conditions \(k=1\) and \(k=4\).

4 Discussion and Conclusion

We constructed a multiscale model of pancreas tumor that can generate an H&E stained pathology image from a voxel value of the corresponding MRI image. To obtain a set of training images, we first reconstructed a 3D pathology image of the pancreas tumor and then registered it to the tumor region in the MRI image. For the construction of the multiscale model, we employed the conditional LAPGAN. When a brighter voxel value of the MRI image is input, the resultant generators output, with a high probability, pathology image patches that look like those in the necrotic region. A limitation of this study is that the model was constructed from a partial region of only one pancreas tumor observed in one KPC mouse. Future work includes annotating each portion of the histopathology image with histopathological/genetic information and constructing a model that can derive this information, with confidence, from each voxel of a given MR image.