
Semi-supervised discriminative classification with application to tumorous tissues segmentation of MR brain images


Abstract

Because of the large size of 3D MR brain images and the blurred boundaries of pathological tissues, tumor segmentation is a difficult task. This paper introduces a discriminative classification algorithm for semi-automated segmentation of brain tumorous tissues. The classifier uses interactive hints to obtain models that distinguish normal from tumor tissues. A non-parametric Bayesian Gaussian random field is implemented in the semi-supervised mode. Our approach uses both labeled data and a subset of unlabeled data sampled from 2D/3D images to train the model. A fast algorithm is also developed. Experiments show that our approach produces satisfactory segmentation results compared with manual labeling by experts.


Notes

  1. Semi-supervised methods can be either transductive or inductive [32, 33]. While a transductive method works only on the observed labeled and unlabeled training data, an inductive method can naturally handle unseen data that are not in the training set [33].

  2. For the positive semi-definite case, we can add extra regularization in the form of jitter noise [59].

  3. Namely, if we wanted to induce \({\varvec{\Updelta}}_{N+1}\) from \({\varvec{\Updelta}}_N\) directly, we would need to compute \(D_{ii}\) for each newly given point, which is very time consuming.

  4. The weight matrix is nearly positive semi-definite, so in practice we use the pseudo-inverse or add extra regularization to find the square root of A.

References

  1. Song Y, Zhang C, Lee J, Wang F (2006) A discriminative method for semi-automated tumorous tissues segmentation of MR brain images. In: Proceedings of CVPR workshop on mathematical methods in biomedical image analysis (MMBIA). p 79

  2. Pham DL, Xu C, Prince JL (2000) Current methods in medical image segmentation. Annu Rev Biomed Eng 2:315–337

  3. Liew AWC, Yan H (2006) Current methods in the automatic tissue segmentation of 3D magnetic resonance brain images. Curr Med Imaging Rev 2(1):91–103

  4. Leemput KV, Maes F, Vandermeulen D, Suetens P (1999) Automated model-based tissue classification of MR images of the brain. IEEE Trans Med Imaging 18(10):897–908

  5. Pham D, Prince J (1999) Adaptive fuzzy segmentation of magnetic resonance images. IEEE Trans Med Imaging 18(9):737–752

  6. Zhang Y, Brady M, Smith SM (2001) Segmentation of brain MR images through a hidden Markov random field model and the expectation maximization algorithm. IEEE Trans Med Imaging 20(1):45–57

  7. Marroquín JL, Vemuri BC, Botello S, Calderón F, Fernández-Bouzas A (2002) An accurate and efficient Bayesian method for automatic segmentation of brain MRI. IEEE Trans Med Imaging 21(8):934–945

  8. Liew AWC, Yan H (2003) An adaptive spatial fuzzy clustering algorithm for 3D MR image segmentation. IEEE Trans Med Imaging 22(9):1063–1075

  9. Prastawa M, Gilmore JH, Lin W, Gerig G (2004) Automatic segmentation of neonatal brain MRI. In: Proceedings of medical image computing and computer-assisted intervention (MICCAI). pp 10–17

  10. Hall L, Bensaid A, Clarke L, Velthuizen R, Silbiger M, Bezdek J (1992) A comparison of neural network and fuzzy clustering techniques in segmenting magnetic resonance images of the brain. IEEE Trans Neural Netw 3(5):672–682

  11. Sammouda R, Niki N, Nishitani H (1996) A comparison of Hopfield neural network and Boltzmann machine in segmenting MR images of the brain. IEEE Trans Nucl Sci 43(6):3361–3369

  12. Zhou J, Chan KL, Chong VFH, Krishnan SM (2005) Extraction of brain tumor from MR images using one-class support vector machine. In: Proceedings of 27th annual international conference of the IEEE Engineering in Medicine and Biology Society (EMBS). pp 6411–6414

  13. Moon N, Bullitt E, Leemput KV, Gerig G (2002) Automatic brain and tumor segmentation. In: Proceedings of 5th international conference on medical image computing and computer-assisted intervention (MICCAI). pp 372–379

  14. Shen S, Sandham W, Granat M, Sterr A (2005) MRI fuzzy segmentation of brain tissue using neighborhood attraction with neural-network optimization. IEEE Trans Inf Technol Biomed 9(3):459–467

  15. Li C, Goldgof D, Hall L (1993) Knowledge-based classification and tissue labeling of MR images of human brain. IEEE Trans Med Imaging 12(4):740–750

  16. Clark M, Hall L, Goldgof D, Velthuizen R, Murtagh F, Silbiger M (1998) Automatic tumor segmentation using knowledge-based techniques. IEEE Trans Med Imaging 17(2):187–201

  17. Cuadra M, Pollo C, Bardera A, Cuisenaire O, Villemure JG, Thiran JP (2004) Atlas-based segmentation of pathological MR brain images using a model of lesion growth. IEEE Trans Med Imaging 23(10):1301–1314

  18. Zhu Y, Yan Z (1997) Computerized tumor boundary detection using a Hopfield neural network. IEEE Trans Med Imaging 16(1):55–67

  19. Droske M, Meyer B, Rumpf M, Schaller C (2001) An adaptive level set method for medical image segmentation. In: Proceedings of 17th international conference information processing in medical imaging (IPMI). Davis, CA, USA, pp 416–422

  20. Lefohn AE, Cates JE, Whitaker RT (2003) Interactive, GPU-based level sets for 3D segmentation. In: Proceedings of medical image computing and computer-assisted intervention (MICCAI). Springer, Montreal, QC, Canada, pp 564–572

  21. Prastawa M, Bullitt E, Ho S, Gerig G (2004) Robust estimation for brain tumor segmentation. In: Proceedings of medical image computing and computer-assisted intervention (MICCAI), pp 10–17

  22. Guermeur Y (2002) Combining discriminant models with new multi-class SVMs. Pattern Anal Appl 5(2):168–179

  23. Tortorella F (2004) Reducing the classification cost of support vector classifiers through an ROC-based reject rule. Pattern Anal Appl 7(2):128–143

  24. Debnath R, Takahide N, Takahashi H (2004) A decision based one-against-one method for multi-class support vector machine. Pattern Anal Appl 7(2):164–175

  25. Sánchez JS, Mollineda RA, Sotoca JM (2007) An analysis of how training data complexity affects the nearest neighbor classifiers. Pattern Anal Appl 10(3):189–201

  26. Abe S (2007) Sparse least squares support vector training in the reduced empirical feature space. Pattern Anal Appl 10(3):203–214

  27. Herrero JR, Navarro JJ (2007) Exploiting computer resources for fast nearest neighbor classification. Pattern Anal Appl 10(4):265–275

  28. Tyree EW, Long JA (1998) A Monte Carlo evaluation of the moving method, k-means and two self-organising neural networks. Pattern Anal Appl 1(2):79–90

  29. Chou CH, Su MC, Lai E (2004) A new cluster validity measure and its application to image compression. Pattern Anal Appl 7(2):205–220

  30. Frigui H (2005) Unsupervised learning of arbitrarily shaped clusters using ensembles of Gaussian models. Pattern Anal Appl 8(1-2):32–49

  31. Omran MGH, Salman A, Engelbrecht AP (2006) Dynamic clustering using particle swarm optimization with application in image segmentation. Pattern Anal Appl 8(4):332–344

  32. Seeger M (2001) Learning with labeled and unlabeled data. Technical report, Institute for ANC, Edinburgh, UK. http://www.dai.ed.ac.uk/seeger/papers.html

  33. Zhu X (2005) Semi-supervised learning literature survey. Technical report 1530, Computer Sciences, University of Wisconsin-Madison. http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf

  34. Belkin M, Niyogi P (2003) Using manifold structure for partially labeled classification. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 929–936

  35. Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7:2399–2434

  36. Krishnapuram B, Williams D, Xue Y, Hartemink A, Carin L, Figueiredo M (2005) On semi-supervised classification. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 721–728

  37. Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B (2003) Learning with local and global consistency. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 321–328

  38. Zhou D, Schölkopf B (2005) Regularization on discrete spaces. In: Proceedings of pattern recognition, 27th DAGM symposium (DAGM-symposium). Lecture notes in computer science. Springer, Vienna, pp 361–368

  39. Zhu X, Ghahramani Z, Lafferty JD (2003) Semi-supervised learning using Gaussian fields and harmonic functions. In: Proceedings of twentieth international conference of machine learning (ICML). AAAI Press, Washington, DC, USA, pp 912–919

  40. Zhu X, Lafferty J, Ghahramani Z (2003) Semi-supervised learning: from Gaussian fields to Gaussian processes. Technical report CMU-CS-03-175, Computer Sciences, Carnegie Mellon University. http://www.cs.cmu.edu/zhuxj/publications.html

  41. Sindhwani V, Chu W, Keerthi SS (2007) Semi-supervised Gaussian process classifiers. In: Proceedings of international joint conferences on artificial intelligence (IJCAI), pp 1059–1064

  42. Fowlkes C, Belongie S, Chung F, Malik J (2004) Spectral grouping using the Nyström method. IEEE Trans Pattern Anal Mach Intell 26(2):214–225

  43. Grady L, Funka-Lea G (2004) Multi-label image segmentation for medical applications based on graph-theoretic electrical potentials. In: Proceedings of ECCV workshops on CVAMIA and MMBIA, pp 230–245

  44. Suri JS, Singh S, Reden L (2002) Computer vision and pattern recognition techniques for 2-D and 3-D MR cerebral cortical segmentation (part i): a state-of-the-art review. Pattern Anal Appl 5(1):46–76

  45. Suri JS, Singh S, Reden L (2002) Computer vision and pattern recognition techniques for 2-D and 3-D MR cerebral cortical segmentation (part ii): a state-of-the-art review. Pattern Anal Appl 5(1):77–98

  46. Liang F, Mukherjee S, West M (2007) The use of unlabeled data in predictive modeling. Stat Sci 22(2):189–205

  47. Zhu S (2003) Statistical modeling and conceptualization of visual patterns. IEEE Trans Pattern Anal Mach Intell 25(6):691–712

  48. Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6(6):721–742

  49. McInerney T, Terzopoulos D (1996) Deformable models in medical image analysis: a survey. Med Image Anal 1(2):91–108

  50. Xu C, Prince JL (1998) Snakes, shapes and gradient vector flow. IEEE Trans Image Process 7(3):359–369

  51. Malladi R, Sethian J, Vemuri B (1995) Shape modeling with front propagation: a level set approach. IEEE Trans Pattern Anal Mach Intell 17(2):158–175

  52. Boykov Y, Jolly MP (2001) Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In: Proceedings of IEEE international conference on computer vision (ICCV), vol I. IEEE Computer Society, Vancouver, B.C., Canada, pp 105–112

  53. Boykov Y, Veksler O, Zabih R (2001) Fast approximate energy minimization via graph cuts. IEEE Trans Pattern Anal Mach Intell 23(11):1222–1239

  54. Li Y, Sun J, Tang CK, Shum HY (2004) Lazy snapping. ACM Trans Graph 23(3):303–308

  55. Rother C, Kolmogorov V, Blake A (2004) “Grab cut”: interactive foreground extraction using iterated graph cuts. ACM Trans Graph 23(3):309–314

  56. Wu Q, Dou W, Chen Y, Constans J (2005) Fuzzy segmentation of cerebral tumorous tissues in MR images via support vector machine and fuzzy clustering. In: Proceedings of world congress of International Fuzzy Systems Association (IFSA). Tsinghua University Press, Beijing

  57. Ulusoy I, Bishop C (2005) Generative versus discriminative methods for object recognition. In: Proceedings of computer vision and pattern recognition (CVPR), vol 2, pp 258–265

  58. Abrahamsen P (1997) A review of Gaussian random fields and correlation functions, 2nd edn. Technical report 917, Norwegian Computing Center

  59. Neal RM (1997) Monte Carlo implementation of Gaussian process models for Bayesian regression and classification. Technical report CRG-TR-97-2, Department of Computer Science, University of Toronto. http://www.cs.toronto.edu/radford/papers-online.html

  60. Williams C, Barber D (1998) Bayesian classification with Gaussian processes. IEEE Trans Pattern Anal Mach Intell 20(12):1342–1351

  61. MacKay DJC (1998) Introduction to Gaussian processes. In: NATO ASI series, vol 168. Springer, Berlin, pp 133–165

  62. Chung F (1997) Spectral graph theory. Number 92 in CBMS regional conference series in mathematics. American Mathematical Society, Providence

  63. Seeger M (1999) Relationships between Gaussian processes, support vector machines and smoothing splines. Technical report, Institute for ANC, Edinburgh, UK. http://www.dai.ed.ac.uk/seeger/papers.html

  64. Williams CKI, Seeger M (2001) Using the Nyström method to speed up kernel machines. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 682–688

  65. Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905

  66. Press W, Teukolsky S, Vetterling W, Flannery B (1992) Numerical Recipes in C, 2nd edn. Cambridge University Press, Cambridge

  67. Dou W, Ruan S, Chen Y, Bloyet D, Constans JM (2007) A framework of fuzzy information fusion for the segmentation of brain tumor tissues on MR images. Image Vis Comput 25(2):164–171

  68. Dou W, Ren Y, Wu Q, Ruan S, Chen Y, Bloyet D, Constans JM (2007) Fuzzy kappa for the agreement measure of fuzzy classifications. Neurocomputing 70(4-6):726–734

  69. Tao D, Li X, Wu X, Maybank SJ (2007) General tensor discriminant analysis and gabor features for gait recognition. IEEE Trans Pattern Anal Mach Intell 29(10):1700–1715

  70. Tao D, Li X, Hu W, Maybank SJ, Wu X (2007) Supervised tensor learning. Knowl Inf Syst 13(1):1–42

  71. Lawrence ND, Jordan MI (2005) Semi-supervised learning via Gaussian processes. In: Proceedings of advances in neural information processing systems (NIPS 17). MIT Press, Cambridge, pp 753–760

Acknowledgments

This work is funded by the Basic Research Foundation of Tsinghua National Laboratory for Information Science and Technology (TNList). We would like to thank the anonymous reviewers for their valuable suggestions. We would also like to give special thanks to Qian Wu, Weibei Dou and Yonglei Zhou for providing us with their detailed experimental data and code.

Author information

Correspondence to Yangqiu Song.

Appendices

Appendix A: Explanation of EBM

For the semi-supervised problem, we initially set the labels of the unlabeled data to zero. Thus, if \(t_i = 0\), the probability \(P(t_i = 0\vert y_i) \equiv \lambda\). The factor λ makes \(P(t_i\vert y_i)\) a probability with respect to \(t_i\), which means \(P(t_i = 1\vert y_i) + P(t_i = -1\vert y_i) + P(t_i = 0\vert y_i) \equiv 1\). As Fig. 7 shows, this model can be considered a degenerate ordered category model (OCM) [71] in which the variance of the probability \(P(t_i = 0\vert y_i)\) is infinite. We define the margin as the range where \(P(t_i = 0\vert y_i)\) is larger than both \(P(t_i = 1\vert y_i)\) and \(P(t_i = -1\vert y_i)\). Inside the margin the difference between \(P(t_i = 1\vert y_i)\) and \(P(t_i = -1\vert y_i)\) is smaller than outside it. Therefore, the margin of the EBM represents the more uncertain labels. The parameter λ controls this margin.
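As a concrete illustration, the sketch below (ours, not the authors' code) realizes one consistent normalization of this likelihood: the unlabeled outcome receives the constant mass λ and the two labeled outcomes share the remaining mass 1 − λ through the logistic sigmoid. The function name and the default value of λ are illustrative assumptions.

```python
import numpy as np

def ebm_likelihood(t, y, lam=0.4):
    """Extended Bernoulli model P(t | y) for t in {-1, 0, +1}.

    Hypothetical sketch: we assume P(t = 0 | y) = lam and split the
    remaining mass 1 - lam between t = +1 and t = -1 with the logistic
    sigmoid, so P(+1|y) + P(-1|y) + P(0|y) = 1 as required.
    """
    if t == 0:
        return lam                                  # unlabeled: constant probability
    return (1.0 - lam) / (1.0 + np.exp(-t * y))     # labeled: sigmoid in t*y
```

Under this particular normalization, with lam = 0.4 the outcome t = 0 is the most probable exactly when |y| < ln 2, so λ directly sets the width of the uncertain-label margin.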

Fig. 7 Illustration of the extended Bernoulli model

Moreover, Fig. 7b–d show the relationship between the prior and the posterior probability of the latent variable. In GP and GRF, we can assume that each latent variable is also conditionally Gaussian:

$$ P(y_i\vert{{\mathbf{y}}}_{N-\{i\}}, {{\mathbf{X}}}_N) = N(\mu_i, \sigma_i){\mathop=\limits^\Delta} P(y_i) $$
(21)

where \(\mu_i\) and \(\sigma_i\) are related to the input points and labels (see the graphical model in Fig. 1b).

As Fig. 7b, c shows, for \(t_i = 1\) and \(t_i = -1\), the mean and the variance of the posteriors \(P(y_i\vert t_i = 1)\) and \(P(y_i\vert t_i = -1)\) are related to the likelihood \(P(t_i\vert y_i)\) and the prior \(P(y_i)\). If \(\mu_i\) is near zero, the posterior of the latent variable \(y_i\) is determined by the label \(t_i\): the estimated \(y_i\) is positive when the label is 1 and negative when the label is −1.

If the label is \(t_i = 0\), then by the Bayesian formulation we have \(P(t_i\vert y_i)P(y_i) = P(y_i\vert t_i)P(t_i)\). The probabilities \(P(t_i)\) and \(P(t_i\vert y_i)\) are both constants for \(t_i = 0\), so the posterior probability \(P(y_i\vert t_i = 0)\) depends only on the prior \(P(y_i)\). If \(\mu_i\) is still near zero, we will get a zero estimate of \(y_i\) by maximizing the posterior probability. This is why we choose a graph regularization-based prior. As mentioned earlier, each covariance between two points under this prior is related to all the training data. Thus, if there is a small amount of labeled data in the training set, the \(\mu_i\) of an unlabeled \({\mathbf{x}}_i\) is affected by the labeled data more strongly than under the traditional prior. Then \(\mu_i\) is non-zero, which leads to a non-zero estimate of \(y_i\) (shown in Fig. 7c).

Furthermore, by comparing Fig. 7c and d, we can see that the margins do not affect the estimation of the latent variable: the estimated \(y_i\) remains the same regardless of which margin model is imposed on the process y → t. However, any point whose latent variable \(y_i\) falls inside the margin will be labeled zero, so it remains unlabeled. Such points do not contribute to the prediction function (see the prediction phase). Therefore, the classification boundary will change.

Appendix B: Derivation of hyper-parameter estimation

The hyper-parameters are the standard deviations \(\Theta = \{\sigma_c, \sigma_p\}\) of the exponential weight shown in (19); we take \(\sigma_c\) as an example. The hyper-parameters are estimated by gradient search, minimizing the negative logarithmic likelihood (10). According to Eqs. 5 and 8, using the fact that ∇Ψ = 0 and taking derivatives on both sides, we have

$$ \frac{\partial{{\mathbf{y}}}_N}{\partial \sigma_c} = ({{{\mathbf{I}}} + {{\mathbf{K}}}_N{\varvec{\Uppi}}_N})^{- 1}\frac{\partial{{\mathbf{K}}}_N} {\partial \sigma_c}(- {\varvec{\upalpha}}_N) $$
(22)

According to Eq. 4, we have

$$ \begin{aligned} \frac{\partial {{\mathbf{K}}}_N^{- 1}}{\partial \sigma_c} &= - {{\mathbf{D}}}^{- \frac{1}{2}}\frac{\partial{{\mathbf{W}}}}{\partial \sigma_c} {{\mathbf{D}}}^{- \frac{1}{2}} \\ &\quad + {{\mathbf{D}}}^{- 1}\left(\frac{\partial \sqrt{{\mathbf{D}}}} {\partial \sigma_c}{{\mathbf{W}}}\sqrt{{\mathbf{D}}} + \sqrt{{\mathbf{D}}}{{\mathbf{W}}} \frac{\partial \sqrt{{\mathbf{D}}} }{\partial \sigma_c}\right){{\mathbf{D}}}^{- 1} \end{aligned} $$
(23)

where

$$ \begin{aligned} \frac{\partial{{\mathbf{W}}}_{ij}}{\partial \sigma_c} &= {{\mathbf{W}}}_{ij} {\left\|{{{\mathbf{x}}}_c(i) - {{\mathbf{x}}}_c(j)} \right\|^2} / {\sigma_c^3}\\ \frac{\partial \sqrt{{{\mathbf{D}}}_{ii}}}{\partial \sigma_c}& = \frac{1}{2\sqrt{{{\mathbf{ D}}}_{ii}}}\sum_k {{{\mathbf{W}}}_{ik} {\left\|{{{\mathbf{x}}}_c(i) - {{\mathbf{x}}}_c(k)} \right\|^2}} / {\sigma_c^3}\\ \end{aligned}$$
(24)

Furthermore, the derivatives of \({{\mathbf{K}}}_N\) and \({{\mathbf{K}}}_N{\varvec{\Uppi}}_N\) are given by

$$ \frac{\partial {{\mathbf{K}}}_N}{\partial \sigma_c} = - {{\mathbf{K}}}_N\frac{\partial{{\mathbf{K}}}_N^{- 1}}{\partial \sigma_c}{{\mathbf{K}}}_N $$
(25)

and

$$ \frac{\partial{{\mathbf{K}}}_N{\varvec{\Uppi}}_N}{\partial \sigma_c} = \frac{\partial{{\mathbf{K}}}_N}{\partial \sigma_c} {\varvec{\Uppi}}_N + {{\mathbf{K}}}_N\frac{\partial {\varvec{\Uppi}}_N}{\partial \sigma_c} $$
(26)

where

$$ \begin{aligned} \frac{\partial {\varvec{\Uppi}}_N}{\partial \sigma_c}& = \frac{\partial {\varvec{\Uppi}}_N} {\partial {{\mathbf{y}}}_N}\frac{\partial{{\mathbf{y}}}_N}{\partial \sigma_c} \\ &= {\rm diag}\left(\frac{t_i^3\exp (t_i y_i )\left( {1 - \exp (t_i y_i )} \right)}{\left( {1 + \exp (t_i y_i )} \right)^3} \right) {\rm diag}{\left(\frac{\partial{{\mathbf{y}}}_N}{\partial \sigma_c }\right)} \end{aligned} $$
(27)

Therefore, the derivative of the objective function is given by (11).
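To make the chain of derivatives concrete, the following sketch (our illustration, not the authors' code) evaluates the building blocks of Eq. 24, assuming the exponential weight \(W_{ij} = \exp(-\|{\mathbf{x}}_c(i) - {\mathbf{x}}_c(j)\|^2 / 2\sigma_c^2)\); the function and variable names are hypothetical.

```python
import numpy as np

def weight_gradients(Xc, sigma_c):
    """Building blocks of Eq. 24 for one hyper-parameter sigma_c.

    Xc is an (N, d) array holding the feature channel associated with
    sigma_c.  Returns W, dW/d(sigma_c), the degrees D_ii, and
    d(sqrt(D_ii))/d(sigma_c).
    """
    sq = np.sum((Xc[:, None, :] - Xc[None, :, :]) ** 2, axis=-1)  # ||x_c(i) - x_c(j)||^2
    W = np.exp(-sq / (2.0 * sigma_c ** 2))
    dW = W * sq / sigma_c ** 3                      # Eq. 24, first line
    D = W.sum(axis=1)                               # degrees D_ii
    d_sqrtD = dW.sum(axis=1) / (2.0 * np.sqrt(D))   # Eq. 24, second line
    return W, dW, D, d_sqrtD
```

These pieces enter Eq. 23 to give the derivative of \({{\mathbf{K}}}_N^{-1}\), which Eqs. 25–27 then propagate through \({{\mathbf{K}}}_N\) and \({\varvec{\Uppi}}_N\).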

Appendix C: Derivation of prediction function

For predicting new test points, we first distinguish between covariance matrices of different sizes with a subscript, such that: (1) \({{\mathbf{K}}}_N\) is the covariance matrix of the N input training data; (2) \({{\mathbf{K}}}_{N+1}\) is the (N + 1) × (N + 1) covariance matrix of the vector \(({\mathbf{y}}_N^{\rm T}, y_{N+1})^{\rm T}\), where \(y_{N+1}\) is the latent variable of a new given point in the test set. However, it is difficult to compute \({{\mathbf{K}}}_{N+1}\) from \({{\mathbf{K}}}_N\) in an explicit expression, since \({{\mathbf{K}}}_{N+1}\) itself depends on all the training data and each new point (see Note 3). We make use of \({\mathbf{k}}_i = W_{N+1,i} = \exp (-{\|{\mathbf{x}}_{N+1}-{\mathbf{x}}_{i} \|^2} / {2\sigma^2} )\) as the covariance between a new given point and the ith training point. Therefore, \({{\mathbf{K}}}_{N+1}\) can be given by

$$ {{\mathbf{K}}}_{N+1}= \left[\begin{array}{*{20}l} {{\mathbf{K}}}_N & \nu{{\mathbf{k}}}\\ \nu{{\mathbf{k}}}^{\rm T} & k_*\\\end{array}\right] $$
(28)

where ν is a scale factor that makes k compatible with \({{\mathbf{K}}}_N\). Note that the covariance matrix \({{\mathbf{K}}}_N\) depends on the global distance information, while the covariances between a new point and the training points depend only on the local distance information.

Then, \({{\mathbf{K}}}_{N+1}^{-1}\) is given by

$$ {{\mathbf{K}}}_{N+1}^{-1}= \left[\begin{array}{*{20}l} {{\mathbf{M}}} & {{\mathbf{m}}}\\ {{\mathbf{m}}}^{\rm T} & \mu\\ \end{array}\right] $$
(29)

By using the partitioned inverse equations [61], we have

$$ \begin{aligned} \mu &= (k_* - \nu^2 {{\mathbf{k}}}^{\rm T} {{\mathbf{K}}}_N^{-1} {{\mathbf{k}}})^{-1}\\ {{\mathbf{m}}} &= -\mu \nu {{\mathbf{K}}}_N^{-1} {{\mathbf{k}}} \\ {{\mathbf{M}}} &= {{\mathbf{K}}}_N^{-1} + \frac{1}{\mu} {{\mathbf{m}}} {{\mathbf{m}}}^{\rm T}\\ \end{aligned} $$
(30)
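The partitioned inverse is easy to check numerically; the throwaway snippet below (synthetic data, illustrative names) verifies Eq. 30 against direct inversion.

```python
import numpy as np

# Verify Eq. 30 on a toy Gaussian-kernel covariance.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K_N = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))   # toy PD covariance
x_new = rng.normal(size=3)
k = np.exp(-((X - x_new) ** 2).sum(-1))                   # covariances to the new point
k_star, nu = 1.0, 1.0                                     # k_* = exp(0), unit scale factor

K_next = np.block([[K_N, nu * k[:, None]],
                   [nu * k[None, :], np.array([[k_star]])]])
mu = 1.0 / (k_star - nu ** 2 * k @ np.linalg.inv(K_N) @ k)
m = -mu * nu * np.linalg.inv(K_N) @ k
M = np.linalg.inv(K_N) + np.outer(m, m) / mu
assert np.allclose(np.linalg.inv(K_next),
                   np.block([[M, m[:, None]], [m[None, :], np.array([[mu]])]]))
```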

Since the optimal \({\hat{{\mathbf{y}}}}_N\) has been estimated, we only need to minimize (15) with respect to \(y_{N+1}\), i.e., to solve \(\mu y_{N+1} + {\mathbf{m}}^{\rm T}{\hat{{\mathbf{y}}}}_N = 0\), which leads to

$$ {\hat{y}}_{N + 1} = {{\mathbf{k}}}^{\rm T}{{{\mathbf{K}}}_N}^{- 1}{\hat{{\mathbf{y}}}}_N = {{\mathbf{k}}}^{\rm T}({{\mathbf{I - S}}}){\hat{{\mathbf{y}}}}_N. $$
(31)

Here the scale factor ν can be omitted.
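A minimal sketch of the prediction rule in Eq. 31, assuming the exponential covariance defined above for k; K_N_inv is the precomputed (pseudo-)inverse of the training covariance, and all names are illustrative.

```python
import numpy as np

def predict_latent(X_train, y_hat, K_N_inv, x_new, sigma=1.0):
    """Eq. 31: y_{N+1} = k^T K_N^{-1} y_hat (scale factor nu omitted).

    k_i = exp(-||x_new - x_i||^2 / (2 sigma^2)) is the covariance
    between the new point and the i-th training point.
    """
    k = np.exp(-np.sum((X_train - x_new) ** 2, axis=1) / (2.0 * sigma ** 2))
    return k @ (K_N_inv @ y_hat)
```

Since the vector \({{\mathbf{K}}}_N^{-1}{\hat{{\mathbf{y}}}}_N\) does not depend on the new point, it can be cached once, after which each prediction costs only O(N).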

Appendix D: Explanation of the decomposition

The authors in [42] proved that the weight matrix W can be approximately decomposed as

$$ {\hat{{\mathbf{W}}}} = {{\mathbf{V}}} {\varvec{\Uplambda}}_{{\mathbf{S}}} {{\mathbf{V}}}^{\rm T} $$
(32)

where

$$ {{\mathbf{V}}} = \left[\begin{array}{*{20}l} {{\mathbf{A}}} \\ {{\mathbf{B}}} \\\end{array} \right] {{\mathbf{A}}}^{-1/2} {{\mathbf{U}}}_{{\mathbf{S}}} {\varvec{\Uplambda}}_{{\mathbf{S}}}^{-1/2} $$
(33)

A is the sub-block of the weight matrix generated by the sampled points, and B is the weight matrix between the sampled points and the rest. We write W as \({{\mathbf{W}}} = \left[ \begin{array}{*{20}l}{{\mathbf{A}}} & {{\mathbf{B}}} \\ {{\mathbf{B}}}^{\rm T} & {{\mathbf{C}}}\\\end{array} \right],\) where C is the weight matrix of the remaining points. The matrices \({{\mathbf{U}}}_{{\mathbf{S}}}\) and \({\varvec{\Uplambda}}_{{\mathbf{S}}}\) contain the eigenvectors and eigenvalues of the matrix \({{\mathbf{S}}} = {{\mathbf{A}}} + {{\mathbf{A}}}^{-1/2} {{\mathbf{B}}} {{\mathbf{B}}}^{\rm T} {{\mathbf{A}}}^{-1/2} = {{\mathbf{U}}}_{{\mathbf{S}}} {\varvec{\Uplambda}}_{{\mathbf{S}}} {{\mathbf{U}}}_ {{\mathbf{S}}}^{\rm T},\) where \({{\mathbf{A}}}^{-1/2}\) denotes the symmetric positive definite square root of A (see Note 4). Then we can verify that the constraint \({{\mathbf{V}}}^{\rm T}{{\mathbf{V}}} = {{\mathbf{I}}}\) is satisfied automatically, and the approximated weight matrix is given by

$$ {\hat{{\mathbf{W}}}} = \left[\begin{array}{*{20}l} {{\mathbf{A}}} & {{\mathbf{B}}} \\ {{\mathbf{B}}}^{\rm T} & {{\mathbf{B}}}^{\rm T} {{\mathbf{A}}}^{-1} {{\mathbf{B}}} \\ \end{array}\right] $$
(34)

The difference between C and \({{\mathbf{B}}}^{\rm T}{{\mathbf{A}}}^{-1}{{\mathbf{B}}}\) is just the Schur complement. In addition, for the purpose of our accelerated algorithm, we need to approximate the matrix \({\hat{{\mathbf{D}}}}^{- \frac{1}{2}} {\hat{{\mathbf{W}}}} {\hat{{\mathbf{D}}}}^{- \frac{1}{2}}.\) This has been exploited in [42], which replaces the matrices A and B by

$$ \begin{aligned} {{\mathbf{A}}}_{ij} &\leftarrow \frac{{{\mathbf{A}}}_{ij}}{\sqrt{{\hat{{\mathbf{d}}}}_i {\hat{{\mathbf{d}}}}_j}} \quad (i,j = 1,\ldots,M)\\ {{\mathbf{B}}}_{ij} &\leftarrow \frac{{{\mathbf{B}}}_{ij}}{\sqrt{{\hat{{\mathbf{d}}}}_i {\hat{{\mathbf{d}}}}_{j+M}}} \quad (i = 1,\ldots,M;\; j=1,\ldots,N-M) \end{aligned} $$
(35)

The vector \({\hat{{\mathbf{d}}}}\) can be evaluated by

$$ {\hat{{\mathbf{d}}}} = {\hat{{\mathbf{W}}}} {{\mathbf{1}}} = \left[\begin{array}{*{20}l} {{\mathbf{A}}}{{\mathbf{1}}}_M + {{\mathbf{B}}}{{\mathbf{1}}}_{N-M} \\ {{\mathbf{B}}}^{\rm T} {{\mathbf{1}}}_M + {{\mathbf{B}}}^{\rm T} {{\mathbf{A}}}^{-1} {{\mathbf{B}}} {{\mathbf{1}}}_{N-M} \\ \end{array}\right] $$
(36)

where 1 denotes a column vector of ones of the indicated length. Thus, we can rewrite the decomposition of \({\hat{{\mathbf{D}}}}^{- \frac{1}{2}} {\hat{{\mathbf{W}}}} {\hat{{\mathbf{D}}}}^{- \frac{1}{2}}\) as \({{\mathbf{U}}}{\varvec{\Uplambda}}{{\mathbf{U}}}^{\rm T},\) where \({\varvec{\Uplambda}}\) is an M × M matrix.
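The following sketch (ours, under the stated assumptions) assembles the normalized Nyström approximation from the sampled blocks; it uses a pseudo-inverse square root because A may be only nearly positive semi-definite (Note 4) and guards the eigenvalue inversion with a small jitter (Note 2). All names are illustrative.

```python
import numpy as np

def sqrtm_psd(A):
    """Symmetric square root of a (nearly) PSD matrix, clipping small
    negative eigenvalues to zero (cf. Note 4)."""
    w, Q = np.linalg.eigh(A)
    return Q @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ Q.T

def nystrom_normalized(A, B, jitter=1e-12):
    """Approximate D^{-1/2} W D^{-1/2} ~ V Lam V^T from the sampled
    blocks A (M x M) and B (M x (N-M)) of the weight matrix W."""
    M = A.shape[0]
    # Eq. 36: approximate row sums d_hat = W_hat 1
    d_top = A.sum(axis=1) + B.sum(axis=1)
    d_bot = B.T.sum(axis=1) + B.T @ (np.linalg.pinv(A) @ B.sum(axis=1))
    d = np.concatenate([d_top, d_bot])
    # Eq. 35: normalize the blocks by the approximate degrees
    A = A / np.sqrt(np.outer(d[:M], d[:M]))
    B = B / np.sqrt(np.outer(d[:M], d[M:]))
    # Eqs. 32-33: one-shot diagonalization through S
    A_isqrt = np.linalg.pinv(sqrtm_psd(A))
    S = A + A_isqrt @ B @ B.T @ A_isqrt
    lam, U_S = np.linalg.eigh(S)
    scale = 1.0 / np.sqrt(np.maximum(lam, jitter))  # jitter guards near-zero modes
    V = np.vstack([A, B.T]) @ A_isqrt @ U_S @ np.diag(scale)
    return V, lam
```

Only the M sampled rows of W are ever formed, so both storage and computation scale with M rather than with the full N.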


Cite this article

Song, Y., Zhang, C., Lee, J. et al. Semi-supervised discriminative classification with application to tumorous tissues segmentation of MR brain images. Pattern Anal Applic 12, 99–115 (2009). https://doi.org/10.1007/s10044-008-0104-3
