An enhanced vision transformer with scale-aware and spatial-aware attention for thighbone fracture detection

Guan, Bin; Yao, Jinkun; Zhang, Guoshan

doi:10.1007/s00521-024-09672-4

Original Article
Published: 16 April 2024

Volume 36, pages 11425–11438, (2024)

Neural Computing and Applications


Abstract

Vision transformers (ViTs) have recently outperformed convolutional neural networks (CNNs) across a variety of deep learning tasks. In orthopedic medicine, the thighbone is a critical support structure for the lower body, and timely, accurate diagnosis of its fractures is important for preventing lifelong walking disabilities. Despite the success of CNNs in the computer-aided diagnosis of thighbone fractures, the potential of ViTs in this area remains unexplored. We therefore first applied off-the-shelf ViT models directly to thighbone fracture detection, but found that the results did not fully satisfy the requirements of radiologists. To address this gap, we propose a one-stage hybrid method that combines an enhanced vision transformer with CNN attention mechanisms, designed specifically for thighbone fracture detection. Our method improves a pyramid vision transformer architecture and employs overlapping patch embedding to preserve local continuity in X-rays. For dynamic feature fusion across the spatial and scale dimensions, we use a series of attention mechanisms of two distinct types: scale-aware attention and spatial-aware attention. These mechanisms integrate the feature maps output by the neck structure, thereby improving the representation of thighbone fractures. We validate the proposed method on a meticulously curated dataset of 4000 thighbone X-rays annotated by experienced radiologists. Ablation studies confirm the effectiveness of each modification in the proposed framework. Experimental results show that our method achieves an average precision (AP) of 53.7% and an $AP_{50}$ of 87.0%, surpassing all previous state-of-the-art methods in thighbone fracture detection.
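
The abstract states that overlapping patch embedding preserves local continuity in X-rays. The following is a minimal sketch of why that can hold, not the authors' implementation: it lists which pixels along one image axis feed each token when the patch kernel is larger than its stride. The function names `patch_windows` and `shared_pixels` are hypothetical, and the kernel, stride, and padding values (7, 4, 3) are assumptions borrowed from the publicly released PVT v2 configuration; the abstract does not give the values used in the article.

```python
# Illustrative sketch only: kernel/stride/padding values are assumptions
# (taken from the public PVT v2 configuration), not from the article.

def patch_windows(length, kernel, stride, padding):
    """Return the [start, end) pixel range each patch covers along one axis.

    Ranges are given in unpadded image coordinates and clipped to the image
    border, so they show which real pixels contribute to each token.
    """
    # Standard convolution output-size formula (dilation 1).
    n_out = (length + 2 * padding - kernel) // stride + 1
    windows = []
    for i in range(n_out):
        start = i * stride - padding
        end = start + kernel
        windows.append((max(start, 0), min(end, length)))
    return windows


def shared_pixels(windows):
    """Count the pixels shared by each pair of neighbouring patches."""
    return [max(0, a_end - b_start)
            for (_, a_end), (b_start, _) in zip(windows, windows[1:])]


# Non-overlapping embedding: kernel equals stride, no padding.
plain = patch_windows(length=32, kernel=4, stride=4, padding=0)
# Overlapping embedding: kernel larger than stride, padded to keep the token count.
overlap = patch_windows(length=32, kernel=7, stride=4, padding=3)

print(len(plain), len(overlap))    # 8 8: same number of tokens per axis
print(shared_pixels(plain)[:3])    # [0, 0, 0]: neighbours share no pixels
print(shared_pixels(overlap)[:3])  # [3, 3, 3]: neighbours share a 3-pixel band
```

Under these assumed settings both schemes produce the same number of tokens, but with overlapping windows a structure that crosses a patch boundary, such as a thin fracture line, contributes to both neighbouring tokens instead of being split between them.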

Acknowledgements

The authors thank the radiologists in the Department of Radiology of Linyi People’s Hospital for their kind help in constructing the dataset and analyzing our experimental results. This work is supported by the National Natural Science Foundation of China under Grant 62073237.

Author information

Authors and Affiliations

School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China
Bin Guan & Guoshan Zhang
Department of Radiology, Linyi People’s Hospital, Linyi, 276100, Shandong, China
Jinkun Yao

Corresponding author

Correspondence to Guoshan Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest. The dataset will be made available from the corresponding author on reasonable request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Guan, B., Yao, J. & Zhang, G. An enhanced vision transformer with scale-aware and spatial-aware attention for thighbone fracture detection. Neural Comput & Applic 36, 11425–11438 (2024). https://doi.org/10.1007/s00521-024-09672-4

Received: 10 September 2023
Accepted: 25 March 2024
Published: 16 April 2024
Issue Date: July 2024
DOI: https://doi.org/10.1007/s00521-024-09672-4
