Understanding how repeatable the output of each classifier is, in the context of overall classification performance and operating points, can contribute to improved design of computer-aided diagnosis (CADx). Breast lesions (243 benign, 853 malignant; 1,096 total) were segmented using a fuzzy c-means method from dynamic contrast-enhanced magnetic resonance images acquired from 2005 to 2015. Thirty-eight radiomic features were extracted. Overall classification performance, case-based classification repeatability, and attainment of ‘preferred’ target and ‘optimal’ sensitivity and specificity were investigated for three classifiers (linear discriminant analysis, support vector machine, and random forest) using a 1000-iteration 0.632 bootstrap. The area under the receiver operating characteristic curve (AUC) for the task of classifying lesions as malignant or benign was determined using the 0.632+ bootstrap correction. AUC was compared between classifiers; a difference was considered statistically significant when the 98.33% confidence interval (CI) of the difference in AUC (corrected for multiple comparisons) excluded zero. Classifier repeatability was assessed by the width of the 95% CI of classifier output for each case, examined across the range of classifier output. Classifier output thresholds were determined from the training folds for target sensitivity (95%), for target specificity (95%), and for a selected ‘optimal’ operating point determined by minimizing (1 - sensitivity)² + (1 - specificity)², and were then applied to the test folds. No difference in AUC was observed between the three classifiers. Classifier output, however, was more repeatable when the random forest classifier was used, as indicated by a lower 95% CI width of classifier output overall. Moreover, only limited differences between classifiers were observed in the thresholds needed to attain the target and ‘optimal’ sensitivities and specificities and in the sensitivities and specificities actually attained. CADx design may benefit from these considerations when selecting which classifier to use.
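
The ‘optimal’ operating point described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `sens_spec` and `optimal_threshold`, the convention that outputs at or above the threshold are called malignant (label 1), and the choice of candidate thresholds (the observed training outputs) are all assumptions made for illustration. The threshold is chosen on training outputs by minimizing (1 - sensitivity)² + (1 - specificity)² and is then applied unchanged to test outputs.

```python
import numpy as np


def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called malignant.

    Assumed convention (not from the source): label 1 = malignant, 0 = benign.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    predicted_malignant = scores >= threshold
    malignant = labels == 1
    benign = ~malignant
    sensitivity = float(np.mean(predicted_malignant[malignant]))
    specificity = float(np.mean(~predicted_malignant[benign]))
    return sensitivity, specificity


def optimal_threshold(train_scores, train_labels):
    """Threshold minimizing (1 - sensitivity)^2 + (1 - specificity)^2.

    Candidate thresholds are the unique observed training scores (an assumption);
    ties are resolved in favor of the lowest candidate.
    """
    best_threshold, best_distance = None, np.inf
    for candidate in np.unique(np.asarray(train_scores, dtype=float)):
        se, sp = sens_spec(train_scores, train_labels, candidate)
        distance = (1.0 - se) ** 2 + (1.0 - sp) ** 2
        if distance < best_distance:
            best_threshold, best_distance = float(candidate), distance
    return best_threshold
```

In a bootstrap setting, `optimal_threshold` would be called once per iteration on the training sample, and `sens_spec` would then be evaluated on the held-out cases with that fixed threshold, so the attained sensitivity and specificity can differ from the values seen in training.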