Abstract
Vision transformers (ViTs) have recently outperformed convolutional neural networks (CNNs) across a variety of deep learning tasks. In orthopedic medicine, the thighbone is a critical support structure for the lower body, and timely, accurate diagnosis of its fractures is essential to preventing lifelong walking disabilities. Despite the success of CNNs in the computer-aided diagnosis of thighbone fractures, the potential of ViTs for this task remains unexplored. We therefore first applied off-the-shelf ViT models directly to thighbone fracture detection but found that the results did not fully meet the requirements of radiologists. To address this gap, we propose a one-stage hybrid method, designed specifically for thighbone fracture detection, that combines an enhanced vision transformer with CNN attention mechanisms. Our method improves the pyramid vision transformer architecture and employs overlapping patch embedding to preserve local continuity in X-rays. For dynamic feature fusion across the spatial and scale dimensions, we use a series of attention mechanisms of two distinct types: scale-aware attention and spatial-aware attention. These mechanisms integrate the feature maps output by the neck structure, thereby improving the representation of thighbone fractures. We validate the proposed method on a curated dataset of 4000 thighbone X-rays annotated by experienced radiologists. Ablation studies confirm the effectiveness of each modification in the proposed framework. Experimental results show that our method achieves an average precision (AP) of 53.7% and an \(AP_{50}\) of 87.0%, surpassing all previous state-of-the-art methods in thighbone fracture detection.
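As an illustration of why overlapping patch embedding preserves local continuity, the following minimal sketch compares the pixel ranges covered by non-overlapping and overlapping patches along one image axis. The function names and the kernel, stride, and padding values (7, 4, 3) are assumptions chosen for illustration; the abstract does not state the settings used in the proposed method.

```python
def patch_grid(size, kernel, stride, padding):
    """Number of patches along one axis for a strided, convolution-style embedding."""
    return (size + 2 * padding - kernel) // stride + 1


def patch_windows(size, kernel, stride, padding):
    """Pixel ranges [start, end) covered by each patch, clipped to the image border."""
    windows = []
    for i in range(patch_grid(size, kernel, stride, padding)):
        start = i * stride - padding
        windows.append((max(start, 0), min(start + kernel, size)))
    return windows


def shared_pixels(windows):
    """Pixels shared by each pair of neighbouring patches."""
    return [
        max(0, prev_end - next_start)
        for (_, prev_end), (next_start, _) in zip(windows, windows[1:])
    ]


# Non-overlapping embedding: kernel equals stride, no padding.
plain = patch_windows(size=16, kernel=4, stride=4, padding=0)

# Overlapping embedding: kernel larger than stride (assumed values 7 / 4 / 3).
overlapped = patch_windows(size=16, kernel=7, stride=4, padding=3)

print("plain     :", plain, "shared:", shared_pixels(plain))
print("overlapped:", overlapped, "shared:", shared_pixels(overlapped))
```

Under these assumed settings both variants produce the same number of patches, but neighbouring overlapping patches share pixels, so a fracture line that crosses a patch boundary is seen by both neighbours rather than being split between them.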
Acknowledgements
The authors thank the radiologists in the Department of Radiology of Linyi People’s Hospital for their kind help in constructing the dataset and analyzing our experimental results. The work in this paper is supported by the National Natural Science Foundation of China under Grant 62073237.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest. The dataset will be made available from the corresponding author on reasonable request.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Guan, B., Yao, J. & Zhang, G. An enhanced vision transformer with scale-aware and spatial-aware attention for thighbone fracture detection. Neural Comput & Applic 36, 11425–11438 (2024). https://doi.org/10.1007/s00521-024-09672-4
DOI: https://doi.org/10.1007/s00521-024-09672-4