Modern generative models, such as generative adversarial networks (GANs), hold tremendous promise for several applications in medical imaging, including unconditional medical image synthesis, image translation, and optimization of imaging systems. However, the extent to which a GAN learns image statistics that are relevant to a diagnostic task is unknown. In this work, canonical stochastic image models (SIMs) that simulate realistic mammographic textures are employed to evaluate GAN-based SIMs with respect to detection, detection-localization, and detection-estimation tasks. It is shown that the specific GAN architecture considered has a higher propensity to generate statistics that confound the observers performing the three considered tasks. This work highlights the need for continued development of objective metrics for evaluating GANs.