Jointly Boosting Saliency Prediction and Disease Classification on Chest X-ray Images with Multi-task UNet

  • Conference paper
  • Medical Image Understanding and Analysis (MIUA 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13413)

Abstract

Human visual attention has recently shown its distinct capability in boosting machine learning models. However, studies that aim to facilitate medical tasks with human visual attention are still scarce. To support the use of visual attention, this paper describes a novel deep learning model for visual saliency prediction on chest X-ray (CXR) images. To cope with data deficiency, we exploit the multi-task learning method and tackle disease classification on CXR simultaneously. For a more robust training process, we propose a further-optimized multi-task learning scheme to better handle model overfitting. Experiments show that our proposed deep learning model with the new learning scheme can outperform existing methods dedicated to either saliency prediction or image classification. The code used in this paper is available at [webpage, concealed for double-blind review].


References

  1. Amyar, A., Modzelewski, R., Li, H., Ruan, S.: Multi-task deep learning based CT imaging analysis for COVID-19 pneumonia: classification and segmentation. Comput. Biol. Med. 126, 104037 (2020)

  2. Borji, A.: Saliency prediction in the deep learning era: successes and limitations. IEEE Trans. Pattern Anal. Mach. Intell. 43, 679–700 (2019)
  3. Borji, A., Sihite, D.N., Itti, L.: Quantitative analysis of human-model agreement in visual saliency modeling: a comparative study. IEEE Trans. Image Process. 22(1), 55–69 (2012)

  4. Bylinskii, Z., Judd, T., Oliva, A., Torralba, A., Durand, F.: What do different evaluation metrics tell us about saliency models? IEEE Trans. Pattern Anal. Mach. Intell. 41(3), 740–757 (2018)

  5. Cai, Y., Sharma, H., Chatelain, P., Noble, J.A.: Multi-task SonoEyeNet: detection of fetal standardized planes assisted by generated sonographer attention maps. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 871–879. Springer (2018). https://doi.org/10.1007/978-3-030-00928-1_98

  6. Çallı, E., Sogancioglu, E., van Ginneken, B., van Leeuwen, K.G., Murphy, K.: Deep learning for chest x-ray analysis: a survey. Med. Image Anal. 72, 102125 (2021)

  7. Cao, G., Tang, Q., Jo, K.: Aggregated deep saliency prediction by self-attention network. In: Huang, D.-S., Premaratne, P. (eds.) ICIC 2020. LNCS (LNAI), vol. 12465, pp. 87–97. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60796-8_8

  8. Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997)

  9. Castro, D.C., Walker, I., Glocker, B.: Causality matters in medical imaging. Nat. Commun. 11(1), 1–10 (2020)

  10. Chen, Z., Badrinarayanan, V., Lee, C.Y., Rabinovich, A.: GradNorm: gradient normalization for adaptive loss balancing in deep multitask networks. In: International Conference on Machine Learning, pp. 794–803. PMLR (2018)

  11. Crawshaw, M.: Multi-task learning with deep neural networks: a survey. arXiv preprint arXiv:2009.09796 (2020)

  12. Duffner, S., Garcia, C.: An online backpropagation algorithm with validation error-based adaptive learning rate. In: de Sá, J.M., Alexandre, L.A., Duch, W., Mandic, D. (eds.) ICANN 2007. LNCS, vol. 4668, pp. 249–258. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74690-4_26

  13. El Asnaoui, K., Chawki, Y., Idri, A.: Automated methods for detection and classification pneumonia based on X-Ray images using deep learning. In: Maleh, Y., Baddi, Y., Alazab, M., Tawalbeh, L., Romdhani, I. (eds.) Artificial Intelligence and Blockchain for Future Cybersecurity Applications. SBD, vol. 90, pp. 257–284. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-74575-2_14

  14. Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006)
  15. Fu, K., Dai, W., Zhang, Y., Wang, Z., Yan, M., Sun, X.: MultiCAM: multiple class activation mapping for aircraft recognition in remote sensing images. Remote Sens. 11(5), 544 (2019)

  16. Guo, M., Haque, A., Huang, D.A., Yeung, S., Fei-Fei, L.: Dynamic task prioritization for multitask learning. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 270–287 (2018)

  17. Hand, D.J., Till, R.J.: A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach. Learn. 45(2), 171–186 (2001)

  18. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  19. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)

  20. Irvin, J., et al.: CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 590–597 (2019)

  21. Jha, A., Kumar, A., Pande, S., Banerjee, B., Chaudhuri, S.: MT-UNET: a novel U-Net based multi-task architecture for visual scene understanding. In: 2020 IEEE International Conference on Image Processing (ICIP), pp. 2191–2195. IEEE (2020)

  22. Jia, S., Bruce, N.D.: EML-NET: an expandable multi-layer network for saliency prediction. Image Vis. Comput. 95, 103887 (2020)

  23. Johnson, A.E., et al.: MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci. Data 6(1), 1–8 (2019)

  24. Johnson, A.E., et al.: MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs. arXiv preprint arXiv:1901.07042 (2019)

  25. Karargyris, A., et al.: Creation and validation of a chest x-ray dataset with eye-tracking and report dictation for AI development. Sci. Data 8(1), 1–18 (2021)

  26. Karessli, N., Akata, Z., Schiele, B., Bulling, A.: Gaze embeddings for zero-shot image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4525–4534 (2017)

  27. Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7482–7491 (2018)

  28. Khan, W., Zaki, N., Ali, L.: Intelligent pneumonia identification from chest x-rays: a systematic literature review. IEEE Access 9, 51747–51771 (2021)

  29. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  30. Kroner, A., Senden, M., Driessens, K., Goebel, R.: Contextual encoder-decoder network for visual saliency prediction. Neural Netw. 129, 261–270 (2020)

  31. Kümmerer, M., Wallis, T.S., Bethge, M.: DeepGaze II: reading fixations from deep features trained on object recognition. arXiv preprint arXiv:1610.01563 (2016)

  32. Li, H., Li, J., Guan, X., Liang, B., Lai, Y., Luo, X.: Research on overfitting of deep learning. In: 2019 15th International Conference on Computational Intelligence and Security (CIS), pp. 78–81. IEEE (2019)

  33. Li, Y., Zhang, Z., Dai, C., Dong, Q., Badrigilan, S.: Accuracy of deep learning for automated detection of pneumonia using chest x-ray images: a systematic review and meta-analysis. Comput. Biol. Med. 123, 103898 (2020)

  34. Liebel, L., Körner, M.: Auxiliary tasks in multi-task learning. arXiv preprint arXiv:1805.06334 (2018)

  35. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)

  36. Liu, X., Milanova, M.: Visual attention in deep learning: a review. Int. Rob. Auto J. 4(3), 154–155 (2018)

  37. Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)

  38. McLaughlin, L., Bond, R., Hughes, C., McConnell, J., McFadden, S.: Computing eye gaze metrics for the automatic assessment of radiographer performance during x-ray image interpretation. Int. J. Med. Inform. 105, 11–21 (2017)

  39. Moody, G., Mark, R., Goldberger, A.: PhysioNet: a research resource for studies of complex physiologic and biomedical signals. In: Computers in Cardiology 2000, vol. 27 (Cat. 00CH37163), pp. 179–182. IEEE (2000)

  40. Moradi, S., et al.: MFP-Unet: a novel deep learning based approach for left ventricle segmentation in echocardiography. Physica Med. 67, 58–69 (2019)

  41. Oyama, T., Yamanaka, T.: Fully convolutional DenseNet for saliency-map prediction. In: 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR), pp. 334–339. IEEE (2017)

  42. Oyama, T., Yamanaka, T.: Influence of image classification accuracy on saliency map estimation. CAAI Trans. Intell. Technol. 3(3), 140–152 (2018)

  43. Pan, J., Sayrol, E., Giro-i Nieto, X., McGuinness, K., O’Connor, N.E.: Shallow and deep convolutional networks for saliency prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 598–606 (2016)

  44. Paneri, S., Gregoriou, G.G.: Top-down control of visual attention by the prefrontal cortex: functional specialization and long-range interactions. Front. Neurosci. 11, 545 (2017)
  45. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. Adv. Neural. Inf. Process. Syst. 32, 8026–8037 (2019)

  46. Reddy, N., Jain, S., Yarlagadda, P., Gandhi, V.: Tidying deep saliency prediction architectures. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 10241–10247. IEEE (2020)

  47. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015). https://doi.org/10.1007/978-3-319-24574-4_28

  48. Selvaraju, R.R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., Batra, D.: Grad-CAM: why did you say that? arXiv preprint arXiv:1611.07450 (2016)

  49. Sener, O., Koltun, V.: Multi-task learning as multi-objective optimization. arXiv preprint arXiv:1810.04650 (2018)

  50. Serte, S., Serener, A., Al-Turjman, F.: Deep learning in medical imaging: a brief review. Trans. Emerg. Telecommun. Technol. 14 (2020)

  51. Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 (2013)

  52. Smith, L.N.: A disciplined approach to neural network hyper-parameters: part 1 - learning rate, batch size, momentum, and weight decay. arXiv preprint arXiv:1803.09820 (2018)

  53. Sun, Y., Zhao, M., Hu, K., Fan, S.: Visual saliency prediction using multi-scale attention gated network. Multimedia Syst. 28(1), 131–139 (2021). https://doi.org/10.1007/s00530-021-00796-4

  54. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-resnet and the impact of residual connections on learning. In: Thirty-first AAAI Conference on Artificial Intelligence (2017)

  55. Tan, M., Le, Q.: EfficientNet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105–6114. PMLR (2019)

  56. Tan, M., Le, Q.V.: EfficientNetV2: smaller models and faster training. arXiv preprint arXiv:2104.00298 (2021)

  57. Tieleman, T., Hinton, G.: Lecture 6.5 - RMSProp: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Netw. Mach. Learn. 4(2), 26–31 (2012)
  58. Vandenhende, S., Georgoulis, S., Van Gansbeke, W., Proesmans, M., Dai, D., Van Gool, L.: Multi-task learning for dense prediction tasks: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 44(7) (2021)
  59. Wang, W., Tran, D., Feiszli, M.: What makes training multi-modal classification networks hard? In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12695–12705 (2020)

  60. Wang, W., Shen, J., Xie, J., Cheng, M.M., Ling, H., Borji, A.: Revisiting video saliency prediction in the deep learning era. IEEE Trans. Pattern Anal. Mach. Intell. 43(1), 220–237 (2019)

  61. Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M.: ChestX-ray8: hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2097–2106 (2017)

  62. Zhang, Y., Yang, Q.: A survey on multi-task learning. IEEE Trans. Knowl. Data Eng. (2021). https://doi.org/10.1109/TKDE.2021.3070203

  63. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929 (2016)

  64. Zhou, Y., et al.: Multi-task learning for segmentation and classification of tumors in 3D automated breast ultrasound images. Med. Image Anal. 70, 101918 (2021)

  65. Zhu, H., Salcudean, S., Rohling, R.: Gaze-guided class activation mapping: leveraging human attention for network attention in chest x-rays classification. arXiv preprint arXiv:2202.07107 (2022)

  66. Zhu, H., Salcudean, S.E., Rohling, R.N.: A novel gaze-supported multimodal human-computer interaction for ultrasound machines. Int. J. Comput. Assist. Radiol. Surg. 14(7), 1107–1115 (2019)

Author information

Correspondence to Hongzhi Zhu.

Appendices

A Mathematical Derivation of the Vicious Circle of Overfitting

Let \(L\ge 0\) be the loss for a task \(\mathcal {T}\), and \(\sigma >0\) be the variance estimator for L used in Eq. 1. The loss for \(\mathcal {T}\) following Eq. 1 can then be expressed as:

$$\begin{aligned} \mathcal {L} = \frac{L}{\sigma ^2}+\ln (\sigma +1). \end{aligned}$$
(6)
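
For reference, this per-task term maps directly to code. The following is a minimal PyTorch sketch, assuming Eq. 1 attaches one such term, with its own learnable variance estimator, to each task loss; the softplus parameterisation used to keep \(\sigma \) positive is our assumption, not a detail taken from the paper:

```python
import torch
import torch.nn as nn

class UncertaintyWeightedTerm(nn.Module):
    """One per-task loss term L / sigma^2 + ln(sigma + 1), as in Eq. (6)."""

    def __init__(self):
        super().__init__()
        # Unconstrained parameter, mapped to sigma > 0 in forward().
        self.raw_sigma = nn.Parameter(torch.zeros(1))

    def forward(self, task_loss: torch.Tensor) -> torch.Tensor:
        sigma = nn.functional.softplus(self.raw_sigma) + 1e-6  # enforce sigma > 0
        return task_loss / sigma.pow(2) + torch.log(sigma + 1.0)
```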

The partial derivative of \(\mathcal {L}\) with respect to \(\sigma \) is:

$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial \sigma } = -\frac{2L}{\sigma ^3}+\frac{1}{\sigma +1}. \end{aligned}$$
(7)

During a gradient-based optimization process to minimize \(\mathcal {L}\), \(\sigma \) converges to its equilibrium value, that is, the value at which \(\sigma \) remains unchanged after a gradient-descent step, which is reached when \(\frac{\partial \mathcal {L}}{\partial \sigma }=0\). Therefore, the following equation holds when \(\sigma \) is at its equilibrium value, denoted \(\tilde{\sigma }\):

$$\begin{aligned} L = \frac{\tilde{\sigma }^3}{2\tilde{\sigma }+2} \end{aligned}$$
(8)

which is obtained by setting \(\frac{\partial \mathcal {L}}{\partial \sigma }=0\). Letting \(f(\tilde{\sigma }) = L\) for \(\tilde{\sigma }>0\), we can compute:

$$\begin{aligned} \frac{d f(\tilde{\sigma })}{d \tilde{\sigma }} = \frac{\tilde{\sigma }^2(2\tilde{\sigma } + 3)}{2(\tilde{\sigma } +1)^2}>0, \quad \forall \tilde{\sigma }>0. \end{aligned}$$
(9)

Therefore, we know that \(f(\tilde{\sigma })\) is strictly monotonically increasing with respect to \(\tilde{\sigma }\), and hence the inverse function of \(f(\tilde{\sigma })\), \(f^{-1}(\cdot )\), exists. More specifically, we have:

$$\begin{aligned} \tilde{\sigma } = f^{-1}(L). \end{aligned}$$
(10)

As a pair of inverse functions share the same monotonicity, \(\tilde{\sigma } = f^{-1}(L)\) is also strictly monotonically increasing. Thus, when L decreases due to overfitting, \(\tilde{\sigma }\) decreases accordingly, forcing \(\sigma \) to decrease. Because the task loss is scaled by \(1/\sigma ^2\) in Eq. 6, a smaller \(\sigma \) amplifies the gradient of L and thereby increases the effective learning rate for \(\mathcal {T}\), which encourages further overfitting and closes the vicious circle.
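
The derivation can also be checked mechanically. Below is a small self-contained verification (our own sanity check, not part of the paper) that confirms the derivative in Eq. 9 symbolically and shows the equilibrium \(\tilde{\sigma }\) shrinking as L shrinks:

```python
import sympy as sp

# Symbolic check of Eq. (9): d/ds [s^3 / (2s + 2)] = s^2 (2s + 3) / (2 (s + 1)^2).
s = sp.symbols("sigma", positive=True)
f = s**3 / (2 * s + 2)
assert sp.simplify(sp.diff(f, s) - s**2 * (2 * s + 3) / (2 * (s + 1) ** 2)) == 0

# Numeric check of the vicious circle: since f is strictly increasing,
# bisection recovers the equilibrium sigma = f^{-1}(L) for any L > 0.
def equilibrium_sigma(L, lo=1e-9, hi=1e6, iters=200):
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mid**3 / (2 * mid + 2) < L else (lo, mid)
    return 0.5 * (lo + hi)

# As the task loss L decreases (overfitting), the equilibrium sigma decreases.
sigmas = [equilibrium_sigma(L) for L in (1.0, 0.5, 0.1, 0.01)]
assert all(a > b for a, b in zip(sigmas, sigmas[1:]))
```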

B Training Settings

We use the Adam optimizer with default parameters [29] and the RLRP scheduler for all training processes. The RLRP scheduler reduces the learning rate by \(90\%\) when the validation loss stops improving for P consecutive epochs, and at that point resets the model parameters to those of the earlier epoch with the best validation loss; a minimal sketch of this behaviour is given below. All training and testing are performed with the PyTorch framework [45]. The optimization hyper-parameters are the learning rate r and the patience P of the RLRP scheduler. The dataset is randomly partitioned into \(70\%\), \(10\%\) and \(20\%\) subsets for training, validation and testing, respectively. The random partitioning preserves the balanced characteristic of the dataset, so all classes have an equal share in every subset. All results presented in this paper are based on at least 5 independent training runs with the same hyper-parameters. NVIDIA V100 and A100 GPUs (Santa Clara, USA) were used.
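
The paper's implementation is not reproduced here, but the scheduling behaviour described above can be sketched in a few lines. In this sketch, model, train_one_epoch, validate, r, P and num_epochs are placeholders, and the exact rollback rule is our reading of "resets the model parameters to the best-validation epoch":

```python
import copy
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=r)  # default Adam parameters [29]

best_val, best_state, stale = float("inf"), None, 0
for epoch in range(num_epochs):
    train_one_epoch(model, optimizer)
    val_loss = validate(model)
    if val_loss < best_val:
        best_val, stale = val_loss, 0
        best_state = copy.deepcopy(model.state_dict())  # remember the best epoch
    else:
        stale += 1
        if stale >= P:  # validation loss flat for P consecutive epochs
            model.load_state_dict(best_state)   # reset parameters to the best epoch
            for group in optimizer.param_groups:
                group["lr"] *= 0.1              # reduce 90% of the learning rate
            stale = 0
```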

C Saliency Map Visualization

Table 4. Visualization of predicted saliency distributions. The ground-truth and predicted saliency distributions are overlaid on the CXR images. The jet colormap is used for the saliency distributions: warmer colors (red and yellow) indicate a higher concentration of saliency, and colder colors (green and blue) indicate a lower concentration.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhu, H., Rohling, R., Salcudean, S. (2022). Jointly Boosting Saliency Prediction and Disease Classification on Chest X-ray Images with Multi-task UNet. In: Yang, G., Aviles-Rivero, A., Roberts, M., Schönlieb, CB. (eds) Medical Image Understanding and Analysis. MIUA 2022. Lecture Notes in Computer Science, vol 13413. Springer, Cham. https://doi.org/10.1007/978-3-031-12053-4_44

  • DOI: https://doi.org/10.1007/978-3-031-12053-4_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-12052-7

  • Online ISBN: 978-3-031-12053-4

  • eBook Packages: Computer Science (R0)
