Statistical validation metric for accuracy assessment in medical image segmentation

  • Original Article
International Journal of Computer Assisted Radiology and Surgery

Abstract

Objective Validation of medical image segmentation algorithms remains an open question, given the variability of individual pathologies and the correspondingly different clinical requirements for accuracy. In this paper, we propose a validation metric that can distinguish between over- and under-segmentation and can be adapted to different clinical applications.
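To make concrete what it means for a measure to distinguish over- from under-segmentation, the signed relative volume difference is one simple, commonly used quantity whose sign carries that information. It is shown here only as an illustration, not as the metric proposed in this paper; the function name `relative_volume_difference` is hypothetical, and masks are assumed to be flattened binary lists of equal length.

```python
from typing import Sequence


def relative_volume_difference(seg: Sequence[int], ref: Sequence[int]) -> float:
    """Signed relative volume difference (|S| - |R|) / |R| of two binary masks.

    A positive value means the segmentation S is larger than the reference R
    (over-segmentation); a negative value means it is smaller (under-segmentation).
    """
    if len(seg) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    ref_volume = sum(1 for r in ref if r)
    if ref_volume == 0:
        raise ValueError("reference mask is empty")
    seg_volume = sum(1 for s in seg if s)
    return (seg_volume - ref_volume) / ref_volume


reference = [0, 1, 1, 1, 1, 0, 0, 0]
print(relative_volume_difference([1, 1, 1, 1, 1, 1, 0, 0], reference))  # 0.5 (over-segmentation)
print(relative_volume_difference([0, 0, 1, 1, 0, 0, 0, 0], reference))  # -0.5 (under-segmentation)
print(relative_volume_difference([0, 0, 0, 0, 1, 1, 1, 1], reference))  # 0.0 despite poor overlap
```

The last call shows the limitation of a purely volumetric measure: a mask shifted away from the reference has the right size and scores 0.0, so the sign is informative about over- or under-segmentation only when it is read together with an overlap-based measure.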

Materials and methods The proposed validation metric represents a tradeoff between sensitivity and specificity. Its advantage is that it differentiates between over- and under-segmentation, an important feature when validating large sets of segmentation results, for which human inspection is exhausting and time-consuming. Although the metric is oriented towards measuring accuracy, it is also closely related to the robustness of a method.
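The sensitivity/specificity tradeoff can be made concrete with a voxel-wise confusion matrix. The following is a minimal sketch assuming flattened binary masks; the score `weighted_tradeoff` and its weight `w` are hypothetical illustrations of how an application-specific preference could be encoded, and are not the formula proposed in the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Voxel-wise confusion counts of a segmentation against a reference."""
    tp: int
    fp: int
    fn: int
    tn: int


def confusion(seg: Sequence[int], ref: Sequence[int]) -> Confusion:
    if len(seg) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    tp = fp = fn = tn = 0
    for s, r in zip(seg, ref):
        if s and r:
            tp += 1  # object voxel correctly labelled
        elif s:
            fp += 1  # background labelled as object (over-segmentation)
        elif r:
            fn += 1  # object voxel missed (under-segmentation)
        else:
            tn += 1  # background correctly left out
    return Confusion(tp, fp, fn, tn)


def sensitivity(c: Confusion) -> float:
    """Fraction of reference object voxels that were segmented: TP / (TP + FN)."""
    if c.tp + c.fn == 0:
        raise ValueError("reference contains no object voxels")
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """Fraction of reference background voxels left out: TN / (TN + FP)."""
    if c.tn + c.fp == 0:
        raise ValueError("reference contains no background voxels")
    return c.tn / (c.tn + c.fp)


def weighted_tradeoff(c: Confusion, w: float) -> float:
    """Hypothetical score w * Se + (1 - w) * Sp (not the paper's metric).

    A larger w penalises missed object voxels more heavily, which might suit an
    application where under-segmentation is the more harmful error.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * sensitivity(c) + (1.0 - w) * specificity(c)


reference = [0, 1, 1, 1, 1, 0, 0, 0]
over = confusion([1, 1, 1, 1, 1, 1, 0, 0], reference)   # Se = 1.0, Sp = 0.5
under = confusion([0, 0, 1, 1, 0, 0, 0, 0], reference)  # Se = 0.5, Sp = 1.0
for w in (0.5, 0.8):
    print(w, weighted_tradeoff(over, w), weighted_tradeoff(under, w))
```

With `w = 0.5` both masks score 0.75, so an equally weighted combination cannot tell them apart, whereas `w = 0.8` ranks the over-segmentation (about 0.9) above the under-segmentation (about 0.6). The pattern Se = 1 with Sp < 1 versus Se < 1 with Sp = 1 is what makes the two error types separable. Note that specificity depends on how much background the image contains, because TN grows with the field of view.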

Results Features of the metrics are analyzed together with their medical impact. A set of numerical simulations is performed to compare the proposed metric with standard discrepancy measures. The metric is illustrated with a clinical case study presenting the accuracy assessment of an algorithm for calvarial tumor segmentation, validated on six patients.
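Why standard overlap measures need complementing can be seen in a toy numerical simulation. The sketch below, assuming 1D binary masks stored as lists of 0/1, computes the Dice coefficient and the Jaccard index (standard discrepancy measures, not the metric proposed here) and constructs one over- and one under-segmentation of the same reference that receive identical scores. The helper names and the mask sizes are illustrative assumptions.

```python
from typing import Sequence, Tuple


def overlap_counts(seg: Sequence[int], ref: Sequence[int]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) voxel counts of a binary segmentation against a reference."""
    if len(seg) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    tp = sum(1 for s, r in zip(seg, ref) if s and r)
    fp = sum(1 for s, r in zip(seg, ref) if s and not r)
    fn = sum(1 for s, r in zip(seg, ref) if r and not s)
    return tp, fp, fn


def dice(seg: Sequence[int], ref: Sequence[int]) -> float:
    """Dice coefficient 2|S intersect R| / (|S| + |R|), i.e. 2TP / (2TP + FP + FN)."""
    tp, fp, fn = overlap_counts(seg, ref)
    if tp + fp + fn == 0:
        raise ValueError("both masks are empty")
    return 2 * tp / (2 * tp + fp + fn)


def jaccard(seg: Sequence[int], ref: Sequence[int]) -> float:
    """Jaccard index |S intersect R| / |S union R|, i.e. TP / (TP + FP + FN)."""
    tp, fp, fn = overlap_counts(seg, ref)
    if tp + fp + fn == 0:
        raise ValueError("both masks are empty")
    return tp / (tp + fp + fn)


# Toy 1D simulation: a 4-voxel object in a 12-voxel line.
ref = [0] * 4 + [1] * 4 + [0] * 4
over = [0] * 2 + [1] * 8 + [0] * 2   # object grown by 2 voxels on each side
under = [0] * 5 + [1] * 2 + [0] * 5  # object shrunk by 1 voxel on each side

for name, seg in (("over", over), ("under", under)):
    tp, fp, fn = overlap_counts(seg, ref)
    print(name, round(dice(seg, ref), 3), jaccard(seg, ref), "FP =", fp, "FN =", fn)
```

Both rows report a Dice coefficient of about 0.667 and a Jaccard index of 0.5, although the first mask contains 4 false positives and no false negatives while the second contains no false positives and 2 false negatives. Because both scores depend only on the total FP + FN (for fixed TP), they cannot by themselves indicate the direction of the error.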

Author information

Correspondence to Aleksandra Popovic.

About this article

Cite this article

Popovic, A., de la Fuente, M., Engelhardt, M. et al. Statistical validation metric for accuracy assessment in medical image segmentation. Int J CARS 2, 169–181 (2007). https://doi.org/10.1007/s11548-007-0125-1

  • DOI: https://doi.org/10.1007/s11548-007-0125-1
