Presentation + Paper
7 April 2023
Audio-visual feature fusion for improved thoracic disease classification
Abstract
In this work, we fuse imaging features from chest X-ray (CXR) scans with audio features from a radiologist's dictations to improve thoracic disease classification. Recent deep learning-based disease classification methods mostly rely on imaging modalities alone. Dictation audio from a radiologist, however, contains rich auxiliary disease-related contextual information. The main hypothesis of this work is that leveraging complementary imaging and audio representations improves disease classification. We use shifting window (Swin) transformer architectures as encoders for both the visual and audio modalities, and fuse the resulting feature representations using a cross-correlational feature multiplication fusion strategy. The fused feature representation is fed to a classification head for downstream disease classification. We show experimentally that the proposed fused model outperforms the individual modality models on multi-class thoracic disease classification covering normal, pneumonia, and congestive heart failure cases. For the fused modalities, we report F1-scores of 0.5415 and 0.5353 for the shifting window transformer base and small architectures, respectively; the corresponding baselines are 0.5046 and 0.5076 for the audio modality and 0.4676 and 0.5261 for the imaging modality.
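The fuse-then-classify pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact form of the cross-correlational feature multiplication, so the sketch below assumes it can be approximated by a flattened outer product of the two embeddings; this is an assumption, not the authors' implementation. The Swin encoders are replaced by random stand-in vectors, and the names `fuse`, `classify`, the embedding size of 4, and the randomly initialized linear head are all hypothetical.

```python
import math
import random

# Class labels taken from the abstract.
CLASSES = ["normal", "pneumonia", "congestive heart failure"]


def fuse(visual, audio):
    """Multiplicative cross-feature fusion (assumed form).

    Returns the flattened outer product: every visual feature is
    multiplied with every audio feature.
    """
    return [v * a for v in visual for a in audio]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, bias):
    """Linear classification head followed by softmax."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


rng = random.Random(0)

# Stand-ins for encoder outputs; a real system would use the image and
# audio encoders here (hypothetical embedding size of 4).
visual = [rng.gauss(0, 1) for _ in range(4)]
audio = [rng.gauss(0, 1) for _ in range(4)]

fused = fuse(visual, audio)  # length 4 * 4 = 16

# Untrained, randomly initialized head, for illustration only.
weights = [[rng.gauss(0, 0.1) for _ in fused] for _ in CLASSES]
bias = [0.0] * len(CLASSES)

probs = classify(fused, weights, bias)
predicted = CLASSES[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

Under this assumed form, the fused vector grows with the product of the two embedding sizes, which is one reason a practical system might project each embedding to a lower dimension before fusing.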
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Moinak Bhattacharya and Prateek Prasanna "Audio-visual feature fusion for improved thoracic disease classification", Proc. SPIE 12465, Medical Imaging 2023: Computer-Aided Diagnosis, 124651A (7 April 2023); https://doi.org/10.1117/12.2654571
KEYWORDS
Transformers

Diseases and disorders

Chest imaging

Image fusion

Feature fusion

Image classification
