Abstract
Due to the large size of 3D MR brain images and the blurred boundaries of pathological tissue, tumor segmentation is difficult. This paper introduces a discriminative classification algorithm for semi-automated segmentation of brain tumorous tissues. The classifier uses interactive hints to obtain models that distinguish tumor tissue from normal tissue. A non-parametric Bayesian Gaussian random field is implemented in the semi-supervised mode. Our approach trains the model on both labeled data and a subset of the unlabeled data sampled from 2D/3D images. A fast algorithm is also developed. Experiments show that our approach produces satisfactory segmentation results compared with manual labelings by experts.
Notes
For the positive semi-definite case, we can add extra regularization in the form of jitter noise [59].
Namely, if we want to induce \({\varvec{\Updelta}}_{N+1}\) from \({\varvec{\Updelta}}_N\) directly, we need to compute \(D_{ii}\) for each newly given point, which is very time-consuming.
The weight matrix is nearly positive semi-definite, so in practice we use the pseudo-inverse or add extra regularization when computing the square root of A.
References
Song Y, Zhang C, Lee J, Wang F (2006) A discriminative method for semi-automated tumorous tissues segmentation of MR brain images. In: Proceedings of CVPR workshop on mathematical methods in biomedical image analysis (MMBIA). p 79
Pham DL, Xu C, Prince JL (2000) Current methods in medical image segmentation. Annu Rev Biomed Eng 2:315–337
Liew AWC, Yan H (2006) Current methods in the automatic tissue segmentation of 3D magnetic resonance brain images. Curr Med Imaging Rev 2(1):91–103
Leemput KV, Maes F, Vandermeulen D, Suetens P (1999) Automated model-based tissue classification of MR images of the brain. IEEE Trans Med Imaging 18(10):897–908
Pham D, Prince J (1999) Adaptive fuzzy segmentation of magnetic resonance images. IEEE Trans Med Imaging 18(9):737–752
Zhang Y, Brady M, Smith SM (2001) Segmentation of brain MR images through a hidden Markov random field model and the expectation maximization algorithm. IEEE Trans Med Imaging 20(1):45–57
Marroquín JL, Vemuri BC, Botello S, Calderón F, Fernández-Bouzas A (2002) An accurate and efficient Bayesian method for automatic segmentation of brain MRI. IEEE Trans Med Imaging 21(8):934–945
Liew AWC, Yan H (2003) An adaptive spatial fuzzy clustering algorithm for 3d MR image segmentation. IEEE Trans Med Imaging 22(9):1063–1075
Prastawa M, Gilmore JH, Lin W, Gerig G (2004) Automatic segmentation of neonatal brain MRI. In: Proceedings of medical image computing and computer-assisted intervention (MICCAI). pp 10–17
Hall L, Bensaid A, Clarke L, Velthuizen R, Silbiger M, Bezdek J (1992) A comparison of neural network and fuzzy clustering techniques in segmenting magnetic resonance images of the brain. IEEE Trans Neural Netw 3(5):672–682
Sammouda R, Niki N, Nishitani H (1996) A comparison of Hopfield neural network and Boltzmann machine in segmenting MR images of the brain. IEEE Trans Nucl Sci 43(6):3361–3369
Zhou J, Chan KL, Chong VFH, Krishnan SM (2005) Extraction of brain tumor from MR images using one-class support vector machine. In: Proceedings of 27th annual international conference of the IEEE Engineering in Medicine and Biology Society (EMBS). pp 6411–6414
Moon N, Bullitt E, Leemput KV, Gerig G (2002) Automatic brain and tumor segmentation. In: Proceedings of 5th international conference on medical image computing and computer-assisted intervention (MICCAI). pp 372–379
Shen S, Sandham W, Granat M, Sterr A (2005) MRI fuzzy segmentation of brain tissue using neighborhood attraction with neural-network optimization. IEEE Trans Inf Technol Biomed 9(3):459–467
Li C, Goldgof D, Hall L (1993) Knowledge-based classification and tissue labeling of MR images of human brain. IEEE Trans Med Imaging 12(4):740–750
Clark M, Hall L, Goldgof D, Velthuizen R, Murtagh F, Silbiger M (1998) Automatic tumor segmentation using knowledge-based techniques. IEEE Trans Med Imaging 17(2):187–201
Cuadra M, Pollo C, Bardera A, Cuisenaire O, Villemure JG, Thiran JP (2004) Atlas-based segmentation of pathological MR brain images using a model of lesion growth. IEEE Trans Med Imaging 23(10):1301–1314
Zhu Y, Yan Z (1997) Computerized tumor boundary detection using a hopfield neural network. IEEE Trans Med Imaging 16(1):55–67
Droske M, Meyer B, Rumpf M, Schaller C (2001) An adaptive level set method for medical image segmentation. In: Proceedings of 17th international conference information processing in medical imaging (IPMI). Davis, CA, USA, pp 416–422
Lefohn AE, Cates JE, Whitaker RT (2003) Interactive, GPU-based level sets for 3D segmentation. In: Proceedings of medical image computing and computer-assisted intervention (MICCAI). Springer, Montreal, QC, Canada, pp 564–572
Prastawa M, Bullitt E, Ho S, Gerig G (2004) Robust estimation for brain tumor segmentation. In: Proceedings of medical image computing and computer-assisted intervention (MICCAI), pp 10–17
Guermeur Y (2002) Combining discriminant models with new multi-class SVMs. Pattern Anal Appl 5(2):168–179
Tortorella F (2004) Reducing the classification cost of support vector classifiers through an ROC-based reject rule. Pattern Anal Appl 7(2):128–143
Debnath R, Takahide N, Takahashi H (2004) A decision based one-against-one method for multi-class support vector machine. Pattern Anal Appl 7(2):164–175
Sánchez JS, Mollineda RA, Sotoca JM (2007) An analysis of how training data complexity affects the nearest neighbor classifiers. Pattern Anal Appl 10(3):189–201
Abe S (2007) Sparse least squares support vector training in the reduced empirical feature space. Pattern Anal Appl 10(3):203–214
Herrero JR, Navarro JJ (2007) Exploiting computer resources for fast nearest neighbor classification. Pattern Anal Appl 10(4):265–275
Tyree EW, Long JA (1998) A Monte Carlo evaluation of the moving method, k-means and two self-organising neural networks. Pattern Anal Appl 1(2):79–90
Chou CH, Su MC, Lai E (2004) A new cluster validity measure and its application to image compression. Pattern Anal Appl 7(2):205–220
Frigui H (2005) Unsupervised learning of arbitrarily shaped clusters using ensembles of Gaussian models. Pattern Anal Appl 8(1–2):32–49
Omran MGH, Salman A, Engelbrecht AP (2006) Dynamic clustering using particle swarm optimization with application in image segmentation. Pattern Anal Appl 8(4):332–344
Seeger M (2001) Learning with labeled and unlabeled data. Technical report, Institute for ANC, Edinburgh, UK. http://www.dai.ed.ac.uk/seeger/papers.html
Zhu X (2005) Semi-supervised learning literature survey. Technical report 1530, Computer Sciences, University of Wisconsin-Madison. http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
Belkin M, Niyogi P (2003) Using manifold structure for partially labeled classification. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 929–936
Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7:2399–2434
Krishnapuram B, Williams D, Xue Y, Hartemink A, Carin L, Figueiredo M (2005) On semi-supervised classification. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 721–728
Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B (2003) Learning with local and global consistency. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 321–328
Zhou D, Schölkopf B (2005) Regularization on discrete spaces. In: Proceedings of pattern recognition, 27th DAGM symposium (DAGM-symposium). Lecture notes in computer science. Springer, Vienna, pp 361–368
Zhu X, Ghahramani Z, Lafferty JD (2003) Semi-supervised learning using Gaussian fields and harmonic functions. In: Proceedings of twentieth international conference of machine learning (ICML). AAAI Press, Washington, DC, USA, pp 912–919
Zhu X, Lafferty J, Ghahramani Z (2003) Semi-supervised learning: from Gaussian fields to Gaussian processes. Technical report CMU-CS-03-175, Computer Sciences, Carnegie Mellon University. http://www.cs.cmu.edu/zhuxj/publications.html
Sindhwani V, Chu W, Keerthi SS (2007) Semi-supervised Gaussian process classifiers. In: Proceedings of international joint conferences on artificial intelligence (IJCAI), pp 1059–1064
Fowlkes C, Belongie S, Chung F, Malik J (2004) Spectral grouping using the Nyström method. IEEE Trans Pattern Anal Mach Intell 26(2):214–225
Grady L, Funka-Lea G (2004) Multi-label image segmentation for medical applications based on graph-theoretic electrical potentials. In: Proceedings of ECCV workshops on CVAMIA and MMBIA, pp 230–245
Suri JS, Singh S, Reden L (2002) Computer vision and pattern recognition techniques for 2-D and 3-D MR cerebral cortical segmentation (part i): a state-of-the-art review. Pattern Anal Appl 5(1):46–76
Suri JS, Singh S, Reden L (2002) Computer vision and pattern recognition techniques for 2-D and 3-D MR cerebral cortical segmentation (part II): a state-of-the-art review. Pattern Anal Appl 5(1):77–98
Liang F, Mukherjee S, West M (2007) The use of unlabeled data in predictive modeling. Stat Sci 22(2):189–205
Zhu S (2003) Statistical modeling and conceptualization of visual patterns. IEEE Trans Pattern Anal Mach Intell 25(6):691–712
Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6(6):721–742
McInerney T, Terzopoulos D (1996) Deformable models in medical image analysis: a survey. Med Image Anal 1(2):91–108
Xu C, Prince JL (1998) Snakes, shapes and gradient vector flow. IEEE Trans Image Process 7(3):359–369
Malladi R, Sethian J, Vemuri B (1995) Shape modeling with front propagation: a level set approach. IEEE Trans Pattern Anal Mach Intell 17(2):158–175
Boykov Y, Jolly MP (2001) Interactive graph cuts for optimal boundary and region segmentation of objects in n-d images. In: Proceedings of IEEE international conference on computer vision (ICCV), vol I. IEEE Computer Society, Vancouver, B.C., Canada, pp 105–112
Boykov Y, Veksler O, Zabih R (2001) Fast approximate energy minimization via graph cuts. IEEE Trans Pattern Anal Mach Intell 23(11):1222–1239
Li Y, Sun J, Tang CK, Shum HY (2004) Lazy snapping. ACM Trans Graph 23(3):303–308
Rother C, Kolmogorov V, Blake A (2004) “Grab cut”: interactive foreground extraction using iterated graph cuts. ACM Trans Graph 23(3):309–314
Wu Q, Dou W, Chen Y, Constans J (2005) Fuzzy segmentation of cerebral tumorous tissues in MR images via support vector machine and fuzzy clustering. In: Proceedings of world congress of International Fuzzy Systems Association (IFSA). Tsinghua University Press, Beijing
Ulusoy I, Bishop C (2005) Generative versus discriminative methods for object recognition. In: Proceedings of computer vision and pattern recognition (CVPR), vol 2, pp 258–265
Abrahamsen P (1997) A review of Gaussian random fields and correlation functions, 2nd edn. Technical report 917, Norwegian Computing Center
Neal RM (1997) Monte Carlo implementation of Gaussian process models for Bayesian regression and classification. Technical report CRG-TR-97-2, Department of Computer Science, University of Toronto. http://www.cs.toronto.edu/radford/papers-online.html
Williams C, Barber D (1998) Bayesian classification with Gaussian processes. IEEE Trans Pattern Anal Mach Intell 20(12):1342–1351
MacKay DJC (1998) Introduction to Gaussian processes. In: Neural networks and machine learning. NATO ASI series, vol 168. Springer, Berlin, pp 133–165
Chung F (1997) Spectral graph theory. Number 92 in CBMS regional conference series in mathematics. American Mathematical Society, Providence
Seeger M (1999) Relationships between Gaussian processes, support vector machines and smoothing splines. Technical report, Institute for ANC, Edinburgh, UK. http://www.dai.ed.ac.uk/seeger/papers.html
Williams CKI, Seeger M (2001) Using the Nyström method to speed up kernel machines. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 682–688
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Press W, Teukolsky S, Vetterling W, Flannery B (1992) Numerical Recipes in C, 2nd edn. Cambridge University Press, Cambridge
Dou W, Ruan S, Chen Y, Bloyet D, Constans JM (2007) A framework of fuzzy information fusion for the segmentation of brain tumor tissues on MR images. Image Vis Comput 25(2):164–171
Dou W, Ren Y, Wu Q, Ruan S, Chen Y, Bloyet D, Constans JM (2007) Fuzzy kappa for the agreement measure of fuzzy classifications. Neurocomputing 70(4-6):726–734
Tao D, Li X, Wu X, Maybank SJ (2007) General tensor discriminant analysis and Gabor features for gait recognition. IEEE Trans Pattern Anal Mach Intell 29(10):1700–1715
Tao D, Li X, Hu W, Maybank SJ, Wu X (2007) Supervised tensor learning. Knowl Inf Syst 13(1):1–42
Lawrence ND, Jordan MI (2005) Semi-supervised learning via Gaussian processes. In: Proceedings of advances in neural information processing systems (NIPS 17). MIT Press, Cambridge, pp 753–760
Acknowledgments
This work is funded by the Basic Research Foundation of Tsinghua National Laboratory for Information Science and Technology (TNList). We would like to thank the anonymous reviewers for their valuable suggestions. We would also like to give special thanks to Qian Wu, Weibei Dou and Yonglei Zhou for providing us with their detailed experimental data and code.
Appendices
Appendix A: Explanation of EBM
For the semi-supervised problem, we initially set the labels of the unlabeled data to zero. Thus, if \(t_i = 0\), the probability \(P(t_i = 0 \mid y_i) \equiv \lambda\). The factor \(\lambda\) makes \(P(t_i \mid y_i)\) a proper probability with respect to \(t_i\), i.e., \(P(t_i = 1 \mid y_i) + P(t_i = -1 \mid y_i) + P(t_i = 0 \mid y_i) \equiv 1\). As Fig. 7 shows, this model can be considered a degenerate ordered category model (OCM) [71] in which the variance of the probability \(P(t_i = 0 \mid y_i)\) is infinite. We define the margin as the range where \(P(t_i = 0 \mid y_i)\) is larger than both \(P(t_i = 1 \mid y_i)\) and \(P(t_i = -1 \mid y_i)\). Inside the margin, the difference between \(P(t_i = 1 \mid y_i)\) and \(P(t_i = -1 \mid y_i)\) is smaller than outside it, so the margin of the EBM represents the more uncertain labels. The parameter \(\lambda\) controls this margin.
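To make the shape of this three-way likelihood concrete, here is a minimal numerical sketch (our illustration, not the paper's code): \(P(t_i = 0 \mid y_i)\) is held constant at \(\lambda\), and the remaining mass is split between the two classes by a sigmoid in the latent value. The sigmoid form and the value \(\lambda = 0.4\) are assumptions made only for this illustration.

```python
import numpy as np

def ebm_likelihood(y, lam=0.4):
    """Illustrative three-way likelihood P(t | y) for t in {+1, 0, -1}.

    P(t=0 | y) is the constant lam; the remaining mass (1 - lam) is
    split between t=+1 and t=-1 by a sigmoid in the latent value y.
    (An assumed form for illustration, not the paper's exact model.)
    """
    p_pos = (1.0 - lam) / (1.0 + np.exp(-y))   # P(t = +1 | y)
    p_neg = (1.0 - lam) - p_pos                # P(t = -1 | y)
    p_zero = lam * np.ones_like(y)             # P(t =  0 | y)
    return p_pos, p_neg, p_zero

# Near y = 0 the constant P(t=0|y) dominates both class likelihoods:
# this interval is exactly the "margin" of uncertain labels above.
y = np.linspace(-3.0, 3.0, 7)
p_pos, p_neg, p_zero = ebm_likelihood(y)
print((p_zero > p_pos) & (p_zero > p_neg))     # True only near y = 0
```

Under these assumed values, the margin is the interval \(|y| < \ln 2\), and increasing \(\lambda\) widens it, matching the role of \(\lambda\) described above.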
Moreover, Fig. 7b–d show the relationship between the prior and the posterior probability of the latent variable. In GP and GRF, we can assume that each latent variable is also conditionally Gaussian:

\[ P(y_i) = \mathcal{N}(y_i \mid \mu_i, \sigma_i^2), \]

where \(\mu_i\) and \(\sigma_i\) are related to the input points and labels (see the graphical model in Fig. 1b).
As Fig. 7b, c show, for \(t_i = 1\) and \(t_i = -1\), the mean and variance of the posteriors \(P(y_i \mid t_i = 1)\) and \(P(y_i \mid t_i = -1)\) are determined by the likelihood \(P(t_i \mid y_i)\) and the prior \(P(y_i)\). If \(\mu_i\) is near zero, the posterior of the latent variable \(y_i\) is driven by the label \(t_i\): the estimated \(y_i\) is positive when the label is 1 and negative when the label is −1.
If the label is \(t_i = 0\), then by the Bayesian formulation \(P(t_i \mid y_i) P(y_i) = P(y_i \mid t_i) P(t_i)\). The probabilities \(P(t_i)\) and \(P(t_i \mid y_i)\) are both constant for \(t_i = 0\), so the posterior probability \(P(y_i \mid t_i = 0)\) depends only on the prior \(P(y_i)\). If \(\mu_i\) is still near zero, maximizing the posterior yields an estimated \(y_i\) of zero. This is why we choose a graph regularization-based prior. As mentioned earlier, under this prior the covariance between any two points is related to all the training data. Thus, even when the training set contains only a small amount of labeled data, the \(\mu_i\) of an unlabeled \(x_i\) is affected by the labeled data more than it would be under the traditional prior. Then \(\mu_i\) is non-zero, which leads to a non-zero estimated \(y_i\) (shown in Fig. 7c).
Furthermore, comparing Fig. 7c and d shows that the margin does not affect the estimation of the latent variable: the estimated \(y_i\) remains the same regardless of which margin model is imposed on the process \(y \rightarrow t\). However, any point whose latent variable \(y_i\) falls inside the margin is labeled zero and therefore remains unlabeled. Such points do not contribute to the prediction function (see the prediction phase), so the classification boundary changes.
Appendix B: Derivation of hyper-parameter estimation
The hyper-parameters are the standard deviations \(\Theta = \{\sigma_c, \sigma_p\}\) of the exponential weight shown in (19). We take \(\sigma_c\) as an example. The hyper-parameter is estimated by a gradient search that minimizes the negative logarithmic likelihood (10). According to Eqs. 5 and 8, using the fact that \(\nabla \Psi = 0\) and taking derivatives on both sides, we have
According to Eq. 4, we have

where

Furthermore, the derivatives of \({{\mathbf{K}}}_N\) and \({{\mathbf{K}}}_N{\varvec{\Uppi}}_N\) with respect to \(\sigma_c\) are given by

and

where
Therefore, the derivative of the objective function is given by (11).
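Since Eqs. (4)–(11) are given in the main article and not reproduced here, the following sketch illustrates only the generic ingredient of such a gradient search: the entry-wise derivative of an exponential weight matrix with respect to its bandwidth, \(\partial W_{ij}/\partial\sigma = W_{ij}\,\|\mathbf{x}_i - \mathbf{x}_j\|^2 / \sigma^3\), checked against finite differences. The data and values are made up for illustration.

```python
import numpy as np

def weight_matrix_and_grad(X, sigma):
    """Gaussian weights W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    and the analytic derivative dW/dsigma, the building block used
    inside a gradient search over the bandwidth hyper-parameter."""
    diff = X[:, None, :] - X[None, :, :]
    sq = np.sum(diff ** 2, axis=-1)            # squared distances
    W = np.exp(-sq / (2.0 * sigma ** 2))
    dW = W * sq / sigma ** 3                   # chain rule on the exponent
    return W, dW

X = np.random.default_rng(0).normal(size=(5, 3))
W, dW = weight_matrix_and_grad(X, sigma=1.0)
eps = 1e-6                                     # finite-difference check
W2, _ = weight_matrix_and_grad(X, 1.0 + eps)
print(np.max(np.abs((W2 - W) / eps - dW)))     # should be ~1e-5 or less
```

In the full derivation, this \(\partial {{\mathbf{W}}}/\partial\sigma_c\) is propagated through \({{\mathbf{K}}}_N\) and \({{\mathbf{K}}}_N{\varvec{\Uppi}}_N\) by the chain rule to obtain the gradient in (11).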
Appendix C: Derivation of prediction function
To predict new test points, we first distinguish between covariance matrices \({{\mathbf{K}}}\) of different sizes with a subscript: (1) \({{\mathbf{K}}}_N\) is the covariance matrix of the \(N\) input training data; (2) \({{\mathbf{K}}}_{N+1}\) is the \((N+1) \times (N+1)\) covariance matrix of the vector \(({{\mathbf{y}}}_N^{\rm T}, y_{N+1})^{\rm T}\), where \(y_{N+1}\) is the latent variable of a new given point in the test set. However, it is difficult to compute \({{\mathbf{K}}}_{N+1}\) from \({{\mathbf{K}}}_N\) in an explicit expression, since \({{\mathbf{K}}}_{N+1}\) itself depends on all the training data and each new point.Footnote 3 We use \({\mathbf{k}}_i = W_{N+1,i} = \exp (-{\|{\mathbf{x}}_{N+1}-{\mathbf{x}}_{i} \|^2} / {2\sigma}^2 )\) as the covariance between a new given point and the \(i\)th training point. Therefore, \({{\mathbf{K}}}_{N+1}\) can be given by
where \(\nu\) is a scale factor that makes \({\mathbf{k}}\) compatible with \({{\mathbf{K}}}_N\). Note that the covariance matrix \({{\mathbf{K}}}_N\) depends on the global distance information, while the covariances between the new point and the training points depend only on the local distance information.
Then, \({{\mathbf{K}}}_{N+1}^{-1}\) can be written in block form as

\[ {{\mathbf{K}}}_{N+1}^{-1} = \left[ \begin{array}{ll} {{\mathbf{M}}} & {{\mathbf{m}}} \\ {{\mathbf{m}}}^{\rm T} & \mu \end{array} \right]. \]
By using the partitioned inverse equations [61], we have
Since the optimal \({\hat{{\mathbf{y}}}}_N\) has been estimated, we only need to minimize (15) with respect to \(y_{N+1}\). Setting the derivative to zero gives \(\mu y_{N+1} + {{\mathbf{m}}}^{\rm T} {{\mathbf{y}}}_N = 0\), which leads to

\[ \hat{y}_{N+1} = -\frac{1}{\mu}\, {{\mathbf{m}}}^{\rm T} {\hat{{\mathbf{y}}}}_N. \]
Here the scale factor ν can be omitted.
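Under the standard partitioned-inverse identity \({{\mathbf{m}}} = -\mu\nu\,{{\mathbf{K}}}_N^{-1}{\mathbf{k}}\), the closed form above reduces to \(\hat y_{N+1} = \nu\,{\mathbf{k}}^{\rm T}{{\mathbf{K}}}_N^{-1}\hat{{\mathbf{y}}}_N\). The sketch below assumes that identity (the corresponding display equations are not reproduced here) and is illustrative only.

```python
import numpy as np

def predict_latent(X_train, y_hat, x_new, sigma, K_N, nu=1.0):
    """Latent prediction for one test point: nu * k^T K_N^{-1} y_hat.

    k holds the local covariances between the new point and each
    training point; since nu > 0 only rescales the output, it can
    be dropped when only the sign (the predicted label) is needed,
    consistent with the remark above that nu can be omitted.
    """
    k = np.exp(-np.sum((X_train - x_new) ** 2, axis=1) / (2.0 * sigma ** 2))
    return nu * k @ np.linalg.solve(K_N, y_hat)   # solve, not an explicit inverse
```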
Appendix D: Explanation of the decomposition
The authors of [42] proved that the weight matrix \({{\mathbf{W}}}\) can be approximately decomposed as

\[ {{\mathbf{W}}} \approx {\hat{{\mathbf{W}}}} = {{\mathbf{V}}} {\varvec{\Uplambda}}_{{\mathbf{S}}} {{\mathbf{V}}}^{\rm T}, \]

where

\[ {{\mathbf{V}}} = \left[ \begin{array}{l} {{\mathbf{A}}} \\ {{\mathbf{B}}}^{\rm T} \end{array} \right] {{\mathbf{A}}}^{-1/2}\, {{\mathbf{U}}}_{{\mathbf{S}}}\, {\varvec{\Uplambda}}_{{\mathbf{S}}}^{-1/2}. \]
Here \({{\mathbf{A}}}\) is the sub-block of the weight matrix generated by the sampled points, and \({{\mathbf{B}}}\) is the weight matrix between the sampled points and the rest. We write \({{\mathbf{W}}} = \left[ \begin{array}{ll}{{\mathbf{A}}} & {{\mathbf{B}}} \\ {{\mathbf{B}}}^{\rm T} & {{\mathbf{C}}}\\\end{array} \right],\) where \({{\mathbf{C}}}\) is the weight matrix of the remaining points. The matrices \({{\mathbf{U}}}_{{\mathbf{S}}}\) and \({\varvec{\Uplambda}}_{{\mathbf{S}}}\) contain the eigenvectors and eigenvalues of \({{\mathbf{S}}} = {{\mathbf{A}}} + {{\mathbf{A}}}^{-1/2} {{\mathbf{B}}} {{\mathbf{B}}}^{\rm T} {{\mathbf{A}}}^{-1/2} = {{\mathbf{U}}}_{{\mathbf{S}}} {\varvec{\Uplambda}}_{{\mathbf{S}}} {{\mathbf{U}}}_{{\mathbf{S}}}^{\rm T},\) where \({{\mathbf{A}}}^{-1/2}\) denotes the symmetric positive definite square rootFootnote 4 of \({{\mathbf{A}}}\). One can then verify that the constraint \({{\mathbf{V}}}^{\rm T}{{\mathbf{V}}} = {{\mathbf{I}}}\) is satisfied automatically, and the approximated weight matrix is given by

\[ {\hat{{\mathbf{W}}}} = \left[ \begin{array}{ll} {{\mathbf{A}}} & {{\mathbf{B}}} \\ {{\mathbf{B}}}^{\rm T} & {{\mathbf{B}}}^{\rm T} {{\mathbf{A}}}^{-1} {{\mathbf{B}}} \end{array} \right]. \]
The difference between \({{\mathbf{C}}}\) and \({{\mathbf{B}}}^{\rm T} {{\mathbf{A}}}^{-1} {{\mathbf{B}}}\) is exactly the Schur complement. In addition, for the purpose of our accelerated algorithm, we need to approximate the matrix \({\hat{{\mathbf{D}}}}^{- \frac{1}{2}} {\hat{{\mathbf{W}}}} {\hat{{\mathbf{D}}}}^{- \frac{1}{2}}.\) This case has also been treated in [42], which replaces the matrices \({{\mathbf{A}}}\) and \({{\mathbf{B}}}\) by

\[ A_{ij} \leftarrow \frac{A_{ij}}{\sqrt{\hat{d}_i \hat{d}_j}}, \qquad B_{ij} \leftarrow \frac{B_{ij}}{\sqrt{\hat{d}_i \hat{d}_{j+M}}}, \]

where \(M\) is the number of sampled points.
The vector \({\hat{{\mathbf{d}}}}\) can be evaluated as

\[ {\hat{{\mathbf{d}}}} = {\hat{{\mathbf{W}}}} {{\mathbf{1}}} = \left[ \begin{array}{l} {{\mathbf{A}}}{{\mathbf{1}}} + {{\mathbf{B}}}{{\mathbf{1}}} \\ {{\mathbf{B}}}^{\rm T}{{\mathbf{1}}} + {{\mathbf{B}}}^{\rm T}{{\mathbf{A}}}^{-1}{{\mathbf{B}}}{{\mathbf{1}}} \end{array} \right], \]

where \({{\mathbf{1}}}\) is the column vector of ones.
where 1 is the column vector of ones. Thus, we can rewrite the decomposition of \({\hat{{\mathbf{D}}}}^{- \frac{1}{2}} {\hat{{\mathbf{W}}}} {\hat{{\mathbf{D}}}}^{- \frac{1}{2}}\) as \({{\mathbf{U}}}{\varvec{\Uplambda}}{{\mathbf{U}}}^{\rm T},\) where \({\varvec{\Uplambda}}\) is an M × M matrix.
About this article
Cite this article
Song, Y., Zhang, C., Lee, J. et al. Semi-supervised discriminative classification with application to tumorous tissues segmentation of MR brain images. Pattern Anal Applic 12, 99–115 (2009). https://doi.org/10.1007/s10044-008-0104-3