Abstract:
Automated classification of gastrointestinal endoscope images can help reduce the workload of doctors and improve the accuracy of diagnoses. The rapidly developing vision Transformer, represented by Swin Transformer, has become an impressive technique for medical image classification. However, Swin Transformer does not capture long-range dependencies well in complex gastrointestinal endoscopy images. As a result, it fails to effectively represent the features of some widely spread targets in digestive tract images, such as normal-z-line and esophagitis. To solve this problem, we propose a novel vision Transformer model based on hybrid shifted windows for digestive tract image classification, which captures short-range and long-range dependencies concurrently. Extensive experiments demonstrate the superiority of our method over state-of-the-art methods, with a classification accuracy of 95.42% on the Kvasir v2 dataset and 86.81% on the HyperKvasir dataset.
Published in: IEEE Transactions on Circuits and Systems for Video Technology (Volume: 33, Issue: 9, September 2023)