Abstract
Deep learning techniques are increasingly applied to improve accuracy in medical image classification and in the classification and generation of report texts. The main objective of this paper is to investigate whether fusing features from heterogeneous modalities improves musculoskeletal abnormality detection compared with image-only and text-only classification. We propose a novel image-text classification framework, named ImTeNet, that learns relevant features from image and text information for binary classification of musculoskeletal radiographs. First, we use a caption generator model to artificially create textual data for a dataset that lacks text information. Then, we apply ImTeNet, a multi-modal model consisting of two distinct networks, DenseNet-169 and BERT, which perform image and text classification respectively, and a fusion module that receives a concatenation of the feature vectors extracted from both. To evaluate the proposed approach, we use the Musculoskeletal Radiographs (MURA) dataset and compare the results with those obtained by the image and text classification schemes individually.
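The concatenation-based fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fuse_and_classify`, the toy feature sizes, the random weights, and the single sigmoid output unit are all assumptions chosen for illustration, and the feature vectors stand in for what DenseNet-169 and BERT would produce.

```python
import math
import random


def fuse_and_classify(image_features, text_features, weights, bias):
    """Concatenate two feature vectors and apply one sigmoid unit.

    Hypothetical stand-in for a fusion module; the real module may
    contain more layers than this single linear unit.
    """
    fused = list(image_features) + list(text_features)  # concatenation fusion
    if len(fused) != len(weights):
        raise ValueError("weight vector must match fused feature length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))  # probability of "abnormal"


# Toy stand-ins for the image and text feature vectors (sizes are assumptions).
rng = random.Random(0)
image_features = [rng.uniform(-1, 1) for _ in range(4)]
text_features = [rng.uniform(-1, 1) for _ in range(3)]
weights = [rng.uniform(-0.5, 0.5) for _ in range(7)]

prob = fuse_and_classify(image_features, text_features, weights, bias=0.0)
print(f"abnormal probability: {prob:.3f}")
```

In practice the weights would be learned jointly with, or on top of, the two backbone networks rather than drawn at random as they are here.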
Notes
1. No additional datasets were used for training.
2. If the normal and abnormal classification occurrences are equal, we perform an arithmetic mean of the probabilities.
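The tie-breaking rule in the second note can be sketched as follows. The sketch assumes that each study contains several images, that each image yields an abnormality probability, and that the study label is otherwise decided by majority vote at a 0.5 threshold; the function name `aggregate_study` and these details are assumptions, not taken from the paper.

```python
def aggregate_study(probabilities, threshold=0.5):
    """Return 1 (abnormal) or 0 (normal) for one study.

    probabilities: per-image abnormality probabilities (assumed input format).
    """
    if not probabilities:
        raise ValueError("a study needs at least one image")
    abnormal_votes = sum(p >= threshold for p in probabilities)
    normal_votes = len(probabilities) - abnormal_votes
    if abnormal_votes != normal_votes:
        return int(abnormal_votes > normal_votes)  # plain majority vote
    # Tie: fall back to the arithmetic mean of the probabilities.
    mean_prob = sum(probabilities) / len(probabilities)
    return int(mean_prob >= threshold)


print(aggregate_study([0.9, 0.8, 0.2]))  # two abnormal votes against one
print(aggregate_study([0.9, 0.4]))       # tie, resolved by the mean
```

A tie can only occur for studies with an even number of images, which is why the mean is needed as a fallback.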
Acknowledgments
The authors would like to thank FAPESP (grants #2015/11937-9, #2017/12646-3, #2017/16246-0 and #2019/20875-8), CNPq (grants #304380/2018-0 and #309330/2018-1) and CAPES for their financial support.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Braz, L., Teixeira, V., Pedrini, H., Dias, Z. (2020). ImTeNet: Image-Text Classification Network for Abnormality Detection and Automatic Reporting on Musculoskeletal Radiographs. In: Setubal, J.C., Silva, W.M. (eds) Advances in Bioinformatics and Computational Biology. BSB 2020. Lecture Notes in Computer Science, vol 12558. Springer, Cham. https://doi.org/10.1007/978-3-030-65775-8_14
DOI: https://doi.org/10.1007/978-3-030-65775-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-65774-1
Online ISBN: 978-3-030-65775-8
eBook Packages: Computer Science, Computer Science (R0)