Abstract
Dice similarity coefficient (DSC) and Hausdorff distance (HD) are widely used for evaluating medical image segmentation. They have also been criticised, when reported alone, for their unclear or even misleading clinical interpretation. DSCs may also differ substantially from HDs, due to boundary smoothness or multiple regions of interest (ROIs) within a subject. More importantly, either metric can have a nonlinear, non-monotonic relationship with outcomes based on Type 1 and 2 errors, designed for specific clinical decisions that use the resulting segmentation. Whilst cases causing disagreement between these metrics are not difficult to postulate, one might argue that they may not be substantiated in real-world segmentation applications, as the majority of ROIs and their predictions do not manifest themselves in the extremely irregular shapes or locations that are prone to such inconsistency. This work first proposes a new asymmetric detection metric, adapted from those used in object detection, for planning prostate cancer procedures. This lesion-level metric is then compared with the voxel-level DSC and HD, with a 3D UNet used for segmenting lesions from multiparametric MR (mpMR) images. Based on experimental results using 877 sets of mpMR images, we report pairwise agreement and correlation 1) between DSC and HD, and 2) between voxel-level DSC and recall-controlled precision at lesion-level, with Cohen’s \(\kappa \in [0.49, 0.61]\) and Pearson’s \(r \in [0.66, 0.76]\) (p-values<0.001) at varying cut-offs. However, the differences in false-positives and false-negatives, between the actual errors and the perceived counterparts if DSC is used, can be as high as 152 and 154, respectively, out of the 357 test set lesions. We therefore carefully conclude that, despite the significant correlations, voxel-level metrics such as DSC can misrepresent lesion-level detection accuracy for evaluating localisation of multifocal prostate cancer and should be interpreted with caution.
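The kind of disagreement described above can be illustrated with a toy case. The following is a minimal pure-Python sketch, not the metric proposed in this work: masks are represented as sets of voxel coordinates, `box` is a hypothetical helper, and the lesion-level rule (a lesion counts as matched when at least a fraction `overlap` of its voxels is covered) is an assumption chosen only for illustration.

```python
from itertools import product
from math import dist


def dice(pred, gt):
    """Voxel-level DSC between two sets of foreground voxel coordinates."""
    total = len(pred) + len(gt)
    return 2 * len(pred & gt) / total if total else 1.0


def hausdorff(pred, gt):
    """Symmetric Hausdorff distance in voxel units (brute force).

    Assumes both sets are non-empty.
    """
    def directed(a, b):
        # Largest distance from a point in `a` to its nearest point in `b`.
        return max(min(dist(p, q) for q in b) for p in a)

    return max(directed(pred, gt), directed(gt, pred))


def lesion_counts(pred_lesions, gt_lesions, overlap=0.25):
    """Lesion-level (TP, FP, FN) under an assumed overlap rule.

    A ground-truth lesion is missed (FN) if less than `overlap` of its
    voxels are predicted; a predicted lesion is a false positive (FP) if
    less than `overlap` of its voxels lie inside any ground-truth lesion.
    """
    pred_all = set().union(*pred_lesions)
    gt_all = set().union(*gt_lesions)
    fn = sum(len(g & pred_all) / len(g) < overlap for g in gt_lesions)
    fp = sum(len(p & gt_all) / len(p) < overlap for p in pred_lesions)
    return len(gt_lesions) - fn, fp, fn


def box(r0, r1, c0, c1):
    """Hypothetical helper: voxel coordinates of a 2D rectangle."""
    return set(product(range(r0, r1), range(c0, c1)))


# Toy subject with two lesions: one large (64 voxels), one small (4 voxels).
gt_lesions = [box(2, 10, 2, 10), box(15, 17, 15, 17)]
# Toy prediction: the large lesion is segmented perfectly, the small one missed.
pred_lesions = [box(2, 10, 2, 10)]

gt = set().union(*gt_lesions)
pred = set().union(*pred_lesions)

print(f"DSC = {dice(pred, gt):.3f}")
print(f"HD  = {hausdorff(pred, gt):.2f} voxels")
print("lesion-level (TP, FP, FN) =", lesion_counts(pred_lesions, gt_lesions))
```

In this toy case the DSC is about 0.97 even though one of the two lesions is missed entirely, while the HD of about 9.9 voxels reflects the missed lesion. The subject-level DSC therefore looks near-perfect although half of the lesions were not detected.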
Notes
1. This work uses binary segmentation as an example, though the discussion may generalise to multiclass segmentation by considering lesions of different grades separately.
Acknowledgment
This work was supported by the International Alliance for Cancer Early Detection, an alliance between Cancer Research UK [C28070/A30912; C73666/A31378], Canary Center at Stanford University, the University of Cambridge, OHSU Knight Cancer Institute, University College London and the University of Manchester. This work was also supported by the Wellcome/EPSRC Centre for Interventional and Surgical Sciences [203145Z/16/Z].
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yan, W. et al. (2022). The Impact of Using Voxel-Level Segmentation Metrics on Evaluating Multifocal Prostate Cancer Localisation. In: Wu, S., Shabestari, B., Xing, L. (eds) Applications of Medical Artificial Intelligence. AMAI 2022. Lecture Notes in Computer Science, vol 13540. Springer, Cham. https://doi.org/10.1007/978-3-031-17721-7_14
DOI: https://doi.org/10.1007/978-3-031-17721-7_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17720-0
Online ISBN: 978-3-031-17721-7
eBook Packages: Computer Science, Computer Science (R0)