Abstract
Due to the large size of 3D MR brain images and the blurred boundaries of pathological tissue, tumor segmentation is difficult. This paper introduces a discriminative classification algorithm for semi-automated segmentation of brain tumorous tissues. The classifier uses interactive hints to obtain models that distinguish tumor tissue from normal tissue. A non-parametric Bayesian Gaussian random field is implemented in the semi-supervised mode. Our approach trains the model on both labeled data and a subset of the unlabeled data sampled from 2D/3D images. A fast algorithm is also developed. Experiments show that our approach produces satisfactory segmentation results compared with manual labelings by experts.
Notes
For the positive semi-definite case, we can add extra regularization in the form of jitter noise [59].
Namely, if we want to induce \({\varvec{\Updelta}}_{N+1}\) from \({\varvec{\Updelta}}_N\) directly, we need to compute \(D_{ii}\) for each newly given point, which is very time-consuming.
The weight matrix is nearly positive semi-definite, so in practice we use the pseudo-inverse or add extra regularization when computing the square root of A.
References
Song Y, Zhang C, Lee J, Wang F (2006) A discriminative method for semi-automated tumorous tissues segmentation of MR brain images. In: Proceedings of CVPR workshop on mathematical methods in biomedical image analysis (MMBIA). p 79
Pham DL, Xu C, Prince JL (2000) Current methods in medical image segmentation. Annu Rev Biomed Eng 2:315–337
Liew AWC, Yan H (2006) Current methods in the automatic tissue segmentation of 3D magnetic resonance brain images. Curr Med Imaging Rev 2(1):91–103
Leemput KV, Maes F, Vandermeulen D, Suetens P (1999) Automated model-based tissue classification of MR images of the brain. IEEE Trans Med Imaging 18(10):897–908
Pham D, Prince J (1999) Adaptive fuzzy segmentation of magnetic resonance images. IEEE Trans Med Imaging 18(9):737–752
Zhang Y, Brady M, Smith SM (2001) Segmentation of brain MR images through a hidden Markov random field model and the expectation maximization algorithm. IEEE Trans Med Imaging 20(1):45–57
Marroquín JL, Vemuri BC, Botello S, Calderón F, Fernández-Bouzas A (2002) An accurate and efficient Bayesian method for automatic segmentation of brain MRI. IEEE Trans Med Imaging 21(8):934–945
Liew AWC, Yan H (2003) An adaptive spatial fuzzy clustering algorithm for 3d MR image segmentation. IEEE Trans Med Imaging 22(9):1063–1075
Prastawa M, Gilmore JH, Lin W, Gerig G (2004) Automatic segmentation of neonatal brain MRI. In: Proceedings of medical image computing and computer-assisted intervention (MICCAI). pp 10–17
Hall L, Bensaid A, Clarke L, Velthuizen R, Silbiger M, Bezdek J (1992) A comparison of neural network and fuzzy clustering techniques in segmenting magnetic resonance images of the brain. IEEE Trans Neural Netw 3(5):672–682
Sammouda R, Niki N, Nishitani H (1996) A comparison of Hopfield neural network and Boltzmann machine in segmenting MR images of the brain. IEEE Trans Nucl Sci 43(6):3361–3369
Zhou J, Chan KL, Chong VFH, Krishnan SM (2005) Extraction of brain tumor from MR images using one-class support vector machine. In: Proceedings of 27th annual international conference of the IEEE Engineering in Medicine and Biology Society (EMBS). pp 6411–6414
Moon N, Bullitt E, Leemput KV, Gerig G (2002) Automatic brain and tumor segmentation. In: Proceedings of 5th international conference on medical image computing and computer-assisted intervention (MICCAI). pp 372–379
Shen S, Sandham W, Granat M, Sterr A (2005) MRI fuzzy segmentation of brain tissue using neighborhood attraction with neural-network optimization. IEEE Trans Inf Technol Biomed 9(3):459–467
Li C, Goldgof D, Hall L (1993) Knowledge-based classification and tissue labeling of MR images of human brain. IEEE Trans Med Imaging 12(4):740–750
Clark M, Hall L, Goldgof D, Velthuizen R, Murtagh F, Silbiger M (1998) Automatic tumor segmentation using knowledge-based techniques. IEEE Trans Med Imaging 17(2):187–201
Cuadra M, Pollo C, Bardera A, Cuisenaire O, Villemure JG, Thiran JP (2004) Atlas-based segmentation of pathological MR brain images using a model of lesion growth. IEEE Trans Med Imaging 23(10):1301–1314
Zhu Y, Yan Z (1997) Computerized tumor boundary detection using a hopfield neural network. IEEE Trans Med Imaging 16(1):55–67
Droske M, Meyer B, Rumpf M, Schaller C (2001) An adaptive level set method for medical image segmentation. In: Proceedings of 17th international conference information processing in medical imaging (IPMI). Davis, CA, USA, pp 416–422
Lefohn AE, Cates JE, Whitaker RT (2003) Interactive, GPU-based level sets for 3D segmentation. In: Proceedings of medical image computing and computer-assisted intervention (MICCAI). Springer, Montreal, QC, Canada, pp 564–572
Prastawa M, Bullitt E, Ho S, Gerig G (2004) Robust estimation for brain tumor segmentation. In: Proceedings of medical image computing and computer-assisted intervention (MICCAI), pp 10–17
Guermeur Y (2002) Combining discriminant models with new multi-class SVMs. Pattern Anal Appl 5(2):168–179
Tortorella F (2004) Reducing the classification cost of support vector classifiers through an ROC-based reject rule. Pattern Anal Appl 7(2):128–143
Debnath R, Takahide N, Takahashi H (2004) A decision based one-against-one method for multi-class support vector machine. Pattern Anal Appl 7(2):164–175
Sánchez JS, Mollineda RA, Sotoca JM (2007) An analysis of how training data complexity affects the nearest neighbor classifiers. Pattern Anal Appl 10(3):189–201
Abe S (2007) Sparse least squares support vector training in the reduced empirical feature space. Pattern Anal Appl 10(3):203–214
Herrero JR, Navarro JJ (2007) Exploiting computer resources for fast nearest neighbor classification. Pattern Anal Appl 10(4):265–275
Tyree EW, Long JA (1998) A Monte Carlo evaluation of the moving method, k-means and two self-organising neural networks. Pattern Anal Appl 1(2):79–90
Chou CH, Su MC, Lai E (2004) A new cluster validity measure and its application to image compression. Pattern Anal Appl 7(2):205–220
Frigui H (2005) Unsupervised learning of arbitrarily shaped clusters using ensembles of Gaussian models. Pattern Anal Appl 8(1–2):32–49
Omran MGH, Salman A, Engelbrecht AP (2006) Dynamic clustering using particle swarm optimization with application in image segmentation. Pattern Anal Appl 8(4):332–344
Seeger M (2001) Learning with labeled and unlabeled data. Technical report, Institute for ANC, Edinburgh, UK. http://www.dai.ed.ac.uk/seeger/papers.html
Zhu X (2005) Semi-supervised learning literature survey. Technical report 1530, Computer Sciences, University of Wisconsin-Madison. http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
Belkin M, Niyogi P (2003) Using manifold structure for partially labeled classification. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 929–936
Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7:2399–2434
Krishnapuram B, Williams D, Xue Y, Hartemink A, Carin L, Figueiredo M (2005) On semi-supervised classification. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 721–728
Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B (2003) Learning with local and global consistency. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 321–328
Zhou D, Schölkopf B (2005) Regularization on discrete spaces. In: Proceedings of pattern recognition, 27th DAGM symposium (DAGM-symposium). Lecture notes in computer science. Springer, Vienna, pp 361–368
Zhu X, Ghahramani Z, Lafferty JD (2003) Semi-supervised learning using Gaussian fields and harmonic functions. In: Proceedings of twentieth international conference of machine learning (ICML). AAAI Press, Washington, DC, USA, pp 912–919
Zhu X, Lafferty J, Ghahramani Z (2003) Semi-supervised learning: from Gaussian fields to Gaussian processes. Technical report CMU-CS-03-175, Computer Sciences, Carnegie Mellon University. http://www.cs.cmu.edu/zhuxj/publications.html
Sindhwani V, Chu W, Keerthi SS (2007) Semi-supervised Gaussian process classifiers. In: Proceedings of international joint conferences on artificial intelligence (IJCAI), pp 1059–1064
Fowlkes C, Belongie S, Chung F, Malik J (2004) Spectral grouping using the Nyström method. IEEE Trans Pattern Anal Mach Intell 26(2):214–225
Grady L, Funka-Lea G (2004) Multi-label image segmentation for medical applications based on graph-theoretic electrical potentials. In: Proceedings of ECCV workshops on CVAMIA and MMBIA, pp 230–245
Suri JS, Singh S, Reden L (2002) Computer vision and pattern recognition techniques for 2-D and 3-D MR cerebral cortical segmentation (part i): a state-of-the-art review. Pattern Anal Appl 5(1):46–76
Suri JS, Singh S, Reden L (2002) Computer vision and pattern recognition techniques for 2-D and 3-D MR cerebral cortical segmentation (part II): a state-of-the-art review. Pattern Anal Appl 5(1):77–98
Liang F, Mukherjee S, West M (2007) The use of unlabeled data in predictive modeling. Stat Sci 22(2):189–205
Zhu S (2003) Statistical modeling and conceptualization of visual patterns. IEEE Trans Pattern Anal Mach Intell 25(6):691–712
Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6(6):721–742
McInerney T, Terzopoulos D (1996) Deformable models in medical image analysis: a survey. Med Image Anal 1(2):91–108
Xu C, Prince JL (1998) Snakes, shapes and gradient vector flow. IEEE Trans Image Process 7(3):359–369
Malladi R, Sethian J, Vemuri B (1995) Shape modeling with front propagation: a level set approach. IEEE Trans Pattern Anal Mach Intell 17(2):158–175
Boykov Y, Jolly MP (2001) Interactive graph cuts for optimal boundary and region segmentation of objects in n-d images. In: Proceedings of IEEE international conference on computer vision (ICCV), vol I. IEEE Computer Society, Vancouver, B.C., Canada, pp 105–112
Boykov Y, Veksler O, Zabih R (2001) Fast approximate energy minimization via graph cuts. IEEE Trans Pattern Anal Mach Intell 23(11):1222–1239
Li Y, Sun J, Tang CK, Shum HY (2004) Lazy snapping. ACM Trans Graph 23(3):303–308
Rother C, Kolmogorov V, Blake A (2004) “Grab cut”: interactive foreground extraction using iterated graph cuts. ACM Trans Graph 23(3):309–314
Wu Q, Dou W, Chen Y, Constans J (2005) Fuzzy segmentation of cerebral tumorous tissues in MR images via support vector machine and fuzzy clustering. In: Proceedings of world congress of International Fuzzy Systems Association (IFSA). Tsinghua University Press, Beijing
Ulusoy I, Bishop C (2005) Generative versus discriminative methods for object recognition. In: Proceedings of computer vision and pattern recognition (CVPR), vol 2, pp 258–265
Abrahamsen P (1997) A review of Gaussian random fields and correlation functions, 2nd edn. Technical report 917, Norwegian Computing Center
Neal RM (1997) Monte Carlo implementation of Gaussian process models for Bayesian regression and classification. Technical report CRG-TR-97-2, Department of Computer Science, University of Toronto. http://www.cs.toronto.edu/radford/papers-online.html
Williams C, Barber D (1998) Bayesian classification with Gaussian processes. IEEE Trans Pattern Anal Mach Intell 20(12):1342–1351
MacKay DJC (1998) Introduction to Gaussian processes. In: Neural networks and machine learning. NATO ASI series, vol 168. Springer, Berlin, pp 133–165
Chung F (1997) Spectral graph theory. Number 92 in CBMS regional conference series in mathematics. American Mathematical Society, Providence
Seeger M (1999) Relationships between Gaussian processes, support vector machines and smoothing splines. Technical report, Institute for ANC, Edinburgh, UK. http://www.dai.ed.ac.uk/seeger/papers.html
Williams CKI, Seeger M (2001) Using the Nyström method to speed up kernel machines. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 682–688
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Press W, Teukolsky S, Vetterling W, Flannery B (1992) Numerical Recipes in C, 2nd edn. Cambridge University Press, Cambridge
Dou W, Ruan S, Chen Y, Bloyet D, Constans JM (2007) A framework of fuzzy information fusion for the segmentation of brain tumor tissues on MR images. Image Vis Comput 25(2):164–171
Dou W, Ren Y, Wu Q, Ruan S, Chen Y, Bloyet D, Constans JM (2007) Fuzzy kappa for the agreement measure of fuzzy classifications. Neurocomputing 70(4-6):726–734
Tao D, Li X, Wu X, Maybank SJ (2007) General tensor discriminant analysis and Gabor features for gait recognition. IEEE Trans Pattern Anal Mach Intell 29(10):1700–1715
Tao D, Li X, Hu W, Maybank SJ, Wu X (2007) Supervised tensor learning. Knowl Inf Syst 13(1):1–42
Lawrence ND, Jordan MI (2005) Semi-supervised learning via Gaussian processes. In: Proceedings of advances in neural information processing systems (NIPS 17). MIT Press, Cambridge, pp 753–760
Acknowledgments
This work is funded by the Basic Research Foundation of Tsinghua National Laboratory for Information Science and Technology (TNList). We would like to thank the anonymous reviewers for their valuable suggestions. We would also like to give special thanks to Qian Wu, Weibei Dou and Yonglei Zhou for providing us with their detailed experimental data and code.
Appendices
Appendix A: Explanation of EBM
For the semi-supervised problem, we initially set the labels of the unlabeled data to zero. Thus, if \(t_i = 0\), the probability \(P(t_i = 0 \mid y_i) \equiv \lambda\). The factor \(\lambda\) makes \(P(t_i \mid y_i)\) a proper probability with respect to \(t_i\), i.e., \(P(t_i = 1 \mid y_i) + P(t_i = -1 \mid y_i) + P(t_i = 0 \mid y_i) \equiv 1\). As Fig. 7 shows, this model can be considered a degenerate ordered category model (OCM) [71] in which the variance of the probability \(P(t_i = 0 \mid y_i)\) is infinite. We define the margin as the range where \(P(t_i = 0 \mid y_i)\) is larger than both \(P(t_i = 1 \mid y_i)\) and \(P(t_i = -1 \mid y_i)\). Inside the margin, the difference between \(P(t_i = 1 \mid y_i)\) and \(P(t_i = -1 \mid y_i)\) is smaller than outside it, so the margin of the EBM represents the more uncertain labels. The parameter \(\lambda\) controls this margin.
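To make the shape of this three-way likelihood concrete, here is a minimal numerical sketch (our illustration, not the paper's code): \(P(t_i = 0 \mid y_i)\) is held constant at \(\lambda\), and the remaining mass is split between the two classes by a sigmoid in the latent value. The sigmoid form and the value \(\lambda = 0.4\) are assumptions made only for this illustration.

```python
import numpy as np

def ebm_likelihood(y, lam=0.4):
    """Illustrative three-way likelihood P(t | y) for t in {+1, 0, -1}.

    P(t=0 | y) is the constant lam; the remaining mass (1 - lam) is
    split between t=+1 and t=-1 by a sigmoid in the latent value y.
    (An assumed form for illustration, not the paper's exact model.)
    """
    p_pos = (1.0 - lam) / (1.0 + np.exp(-y))   # P(t = +1 | y)
    p_neg = (1.0 - lam) - p_pos                # P(t = -1 | y)
    p_zero = lam * np.ones_like(y)             # P(t =  0 | y)
    return p_pos, p_neg, p_zero

# Near y = 0 the constant P(t=0|y) dominates both class likelihoods:
# this interval is exactly the "margin" of uncertain labels above.
y = np.linspace(-3.0, 3.0, 7)
p_pos, p_neg, p_zero = ebm_likelihood(y)
print((p_zero > p_pos) & (p_zero > p_neg))     # True only near y = 0
```

Under these assumed values, the margin is the interval \(|y| < \ln 2\), and increasing \(\lambda\) widens it, matching the role of \(\lambda\) described above.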
Moreover, Fig. 7b–d show the relationship between the prior and the posterior probability of the latent variable. In GP and GRF, we can assume that each latent variable is also conditionally Gaussian:

\[ P(y_i) = \mathcal{N}(y_i \mid \mu_i, \sigma_i^2), \]

where \(\mu_i\) and \(\sigma_i\) are related to the input points and labels (see the graphical model in Fig. 1b).
As Fig. 7b, c show, for \(t_i = 1\) and \(t_i = -1\), the mean and variance of the posteriors \(P(y_i \mid t_i = 1)\) and \(P(y_i \mid t_i = -1)\) are determined by the likelihood \(P(t_i \mid y_i)\) and the prior \(P(y_i)\). If \(\mu_i\) is near zero, the posterior of the latent variable \(y_i\) is driven by the label \(t_i\): the estimated \(y_i\) is positive when the label is 1 and negative when the label is −1.
If the label is \(t_i = 0\), then by the Bayesian formulation \(P(t_i \mid y_i) P(y_i) = P(y_i \mid t_i) P(t_i)\). The probabilities \(P(t_i)\) and \(P(t_i \mid y_i)\) are both constant for \(t_i = 0\), so the posterior probability \(P(y_i \mid t_i = 0)\) depends only on the prior \(P(y_i)\). If \(\mu_i\) is still near zero, maximizing the posterior yields an estimated \(y_i\) of zero. This is why we choose a graph regularization-based prior. As mentioned earlier, under this prior the covariance between any two points is related to all the training data. Thus, even when the training set contains only a small amount of labeled data, the \(\mu_i\) of an unlabeled \(x_i\) is affected by the labeled data more than it would be under the traditional prior. Then \(\mu_i\) is non-zero, which leads to a non-zero estimated \(y_i\) (shown in Fig. 7c).
Furthermore, comparing Fig. 7c and d shows that the margin does not affect the estimation of the latent variable: the estimated \(y_i\) remains the same regardless of which margin model is imposed on the process \(y \rightarrow t\). However, any point whose latent variable \(y_i\) falls inside the margin is labeled zero and therefore remains unlabeled. Such points do not contribute to the prediction function (see the prediction phase), so the classification boundary changes.
Appendix B: Derivation of hyper-parameter estimation
The hyper-parameters are the standard deviations \(\Theta = \{\sigma_c, \sigma_p\}\) of the exponential weight shown in (19). We take \(\sigma_c\) as an example. The hyper-parameter is estimated by a gradient search that minimizes the negative logarithmic likelihood (10). According to Eqs. 5 and 8, using the fact that \(\nabla \Psi = 0\) and taking derivatives on both sides, we have
According to Eq. 4, we have

where

Furthermore, the derivatives of \({{\mathbf{K}}}_N\) and \({{\mathbf{K}}}_N{\varvec{\Uppi}}_N\) with respect to \(\sigma_c\) are given by

and

where
Therefore, the derivative of the objective function is given by (11).
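Since Eqs. (4)–(11) are given in the main article and not reproduced here, the following sketch illustrates only the generic ingredient of such a gradient search: the entry-wise derivative of an exponential weight matrix with respect to its bandwidth, \(\partial W_{ij}/\partial\sigma = W_{ij}\,\|\mathbf{x}_i - \mathbf{x}_j\|^2 / \sigma^3\), checked against finite differences. The data and values are made up for illustration.

```python
import numpy as np

def weight_matrix_and_grad(X, sigma):
    """Gaussian weights W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    and the analytic derivative dW/dsigma, the building block used
    inside a gradient search over the bandwidth hyper-parameter."""
    diff = X[:, None, :] - X[None, :, :]
    sq = np.sum(diff ** 2, axis=-1)            # squared distances
    W = np.exp(-sq / (2.0 * sigma ** 2))
    dW = W * sq / sigma ** 3                   # chain rule on the exponent
    return W, dW

X = np.random.default_rng(0).normal(size=(5, 3))
W, dW = weight_matrix_and_grad(X, sigma=1.0)
eps = 1e-6                                     # finite-difference check
W2, _ = weight_matrix_and_grad(X, 1.0 + eps)
print(np.max(np.abs((W2 - W) / eps - dW)))     # should be ~1e-5 or less
```

In the full derivation, this \(\partial {{\mathbf{W}}}/\partial\sigma_c\) is propagated through \({{\mathbf{K}}}_N\) and \({{\mathbf{K}}}_N{\varvec{\Uppi}}_N\) by the chain rule to obtain the gradient in (11).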
Appendix C: Derivation of prediction function
To predict new test points, we first distinguish between covariance matrices \({{\mathbf{K}}}\) of different sizes with a subscript: (1) \({{\mathbf{K}}}_N\) is the covariance matrix of the \(N\) input training data; (2) \({{\mathbf{K}}}_{N+1}\) is the \((N+1) \times (N+1)\) covariance matrix of the vector \(({{\mathbf{y}}}_N^{\rm T}, y_{N+1})^{\rm T}\), where \(y_{N+1}\) is the latent variable of a new given point in the test set. However, it is difficult to compute \({{\mathbf{K}}}_{N+1}\) from \({{\mathbf{K}}}_N\) in an explicit expression, since \({{\mathbf{K}}}_{N+1}\) itself depends on all the training data and each new point.Footnote 3 We use \({\mathbf{k}}_i = W_{N+1,i} = \exp (-{\|{\mathbf{x}}_{N+1}-{\mathbf{x}}_{i} \|^2} / {2\sigma}^2 )\) as the covariance between a new given point and the \(i\)th training point. Therefore, \({{\mathbf{K}}}_{N+1}\) can be given by
where \(\nu\) is a scale factor that makes \({\mathbf{k}}\) compatible with \({{\mathbf{K}}}_N\). Note that the covariance matrix \({{\mathbf{K}}}_N\) depends on the global distance information, while the covariances between the new point and the training points depend only on the local distance information.
Then, \({{\mathbf{K}}}_{N+1}^{-1}\) can be written in block form as

\[ {{\mathbf{K}}}_{N+1}^{-1} = \left[ \begin{array}{ll} {{\mathbf{M}}} & {{\mathbf{m}}} \\ {{\mathbf{m}}}^{\rm T} & \mu \end{array} \right]. \]
By using the partitioned inverse equations [61], we have
Since the optimal \({\hat{{\mathbf{y}}}}_N\) has been estimated, we only need to minimize (15) with respect to \(y_{N+1}\). Setting the derivative to zero gives \(\mu y_{N+1} + {{\mathbf{m}}}^{\rm T} {{\mathbf{y}}}_N = 0\), which leads to

\[ \hat{y}_{N+1} = -\frac{1}{\mu}\, {{\mathbf{m}}}^{\rm T} {\hat{{\mathbf{y}}}}_N. \]
Here the scale factor ν can be omitted.
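Under the standard partitioned-inverse identity \({{\mathbf{m}}} = -\mu\nu\,{{\mathbf{K}}}_N^{-1}{\mathbf{k}}\), the closed form above reduces to \(\hat y_{N+1} = \nu\,{\mathbf{k}}^{\rm T}{{\mathbf{K}}}_N^{-1}\hat{{\mathbf{y}}}_N\). The sketch below assumes that identity (the corresponding display equations are not reproduced here) and is illustrative only.

```python
import numpy as np

def predict_latent(X_train, y_hat, x_new, sigma, K_N, nu=1.0):
    """Latent prediction for one test point: nu * k^T K_N^{-1} y_hat.

    k holds the local covariances between the new point and each
    training point; since nu > 0 only rescales the output, it can
    be dropped when only the sign (the predicted label) is needed,
    consistent with the remark above that nu can be omitted.
    """
    k = np.exp(-np.sum((X_train - x_new) ** 2, axis=1) / (2.0 * sigma ** 2))
    return nu * k @ np.linalg.solve(K_N, y_hat)   # solve, not an explicit inverse
```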
Appendix D: Explanation of the decomposition
The authors of [42] proved that the weight matrix \({{\mathbf{W}}}\) can be approximately decomposed as

\[ {{\mathbf{W}}} \approx {\hat{{\mathbf{W}}}} = {{\mathbf{V}}} {\varvec{\Uplambda}}_{{\mathbf{S}}} {{\mathbf{V}}}^{\rm T}, \]

where

\[ {{\mathbf{V}}} = \left[ \begin{array}{l} {{\mathbf{A}}} \\ {{\mathbf{B}}}^{\rm T} \end{array} \right] {{\mathbf{A}}}^{-1/2}\, {{\mathbf{U}}}_{{\mathbf{S}}}\, {\varvec{\Uplambda}}_{{\mathbf{S}}}^{-1/2}. \]
Here \({{\mathbf{A}}}\) is the sub-block of the weight matrix generated by the sampled points, and \({{\mathbf{B}}}\) is the weight matrix between the sampled points and the rest. We write \({{\mathbf{W}}} = \left[ \begin{array}{ll}{{\mathbf{A}}} & {{\mathbf{B}}} \\ {{\mathbf{B}}}^{\rm T} & {{\mathbf{C}}}\\\end{array} \right],\) where \({{\mathbf{C}}}\) is the weight matrix of the remaining points. The matrices \({{\mathbf{U}}}_{{\mathbf{S}}}\) and \({\varvec{\Uplambda}}_{{\mathbf{S}}}\) contain the eigenvectors and eigenvalues of \({{\mathbf{S}}} = {{\mathbf{A}}} + {{\mathbf{A}}}^{-1/2} {{\mathbf{B}}} {{\mathbf{B}}}^{\rm T} {{\mathbf{A}}}^{-1/2} = {{\mathbf{U}}}_{{\mathbf{S}}} {\varvec{\Uplambda}}_{{\mathbf{S}}} {{\mathbf{U}}}_{{\mathbf{S}}}^{\rm T},\) where \({{\mathbf{A}}}^{-1/2}\) denotes the symmetric positive definite square rootFootnote 4 of \({{\mathbf{A}}}\). One can then verify that the constraint \({{\mathbf{V}}}^{\rm T}{{\mathbf{V}}} = {{\mathbf{I}}}\) is satisfied automatically, and the approximated weight matrix is given by

\[ {\hat{{\mathbf{W}}}} = \left[ \begin{array}{ll} {{\mathbf{A}}} & {{\mathbf{B}}} \\ {{\mathbf{B}}}^{\rm T} & {{\mathbf{B}}}^{\rm T} {{\mathbf{A}}}^{-1} {{\mathbf{B}}} \end{array} \right]. \]
The difference between \({{\mathbf{C}}}\) and \({{\mathbf{B}}}^{\rm T} {{\mathbf{A}}}^{-1} {{\mathbf{B}}}\) is exactly the Schur complement. In addition, for the purpose of our accelerated algorithm, we need to approximate the matrix \({\hat{{\mathbf{D}}}}^{- \frac{1}{2}} {\hat{{\mathbf{W}}}} {\hat{{\mathbf{D}}}}^{- \frac{1}{2}}.\) This case has also been treated in [42], which replaces the matrices \({{\mathbf{A}}}\) and \({{\mathbf{B}}}\) by

\[ A_{ij} \leftarrow \frac{A_{ij}}{\sqrt{\hat{d}_i \hat{d}_j}}, \qquad B_{ij} \leftarrow \frac{B_{ij}}{\sqrt{\hat{d}_i \hat{d}_{j+M}}}, \]

where \(M\) is the number of sampled points.
The vector \({\hat{{\mathbf{d}}}}\) can be evaluated as

\[ {\hat{{\mathbf{d}}}} = {\hat{{\mathbf{W}}}} {{\mathbf{1}}} = \left[ \begin{array}{l} {{\mathbf{A}}}{{\mathbf{1}}} + {{\mathbf{B}}}{{\mathbf{1}}} \\ {{\mathbf{B}}}^{\rm T}{{\mathbf{1}}} + {{\mathbf{B}}}^{\rm T}{{\mathbf{A}}}^{-1}{{\mathbf{B}}}{{\mathbf{1}}} \end{array} \right], \]

where \({{\mathbf{1}}}\) is the column vector of ones.
where 1 is the column vector of ones. Thus, we can rewrite the decomposition of \({\hat{{\mathbf{D}}}}^{- \frac{1}{2}} {\hat{{\mathbf{W}}}} {\hat{{\mathbf{D}}}}^{- \frac{1}{2}}\) as \({{\mathbf{U}}}{\varvec{\Uplambda}}{{\mathbf{U}}}^{\rm T},\) where \({\varvec{\Uplambda}}\) is an M × M matrix.
About this article
Cite this article
Song, Y., Zhang, C., Lee, J. et al. Semi-supervised discriminative classification with application to tumorous tissues segmentation of MR brain images. Pattern Anal Applic 12, 99–115 (2009). https://doi.org/10.1007/s10044-008-0104-3