In this work, we fuse imaging features from chest X-ray (CXR) scans with audio features from a radiologist's dictations to improve thoracic disease classification. Recent deep learning-based disease classification methods mostly rely on imaging modalities, yet a radiologist's dictation audio contains rich auxiliary disease-related contextual information. The central hypothesis of this work is that leveraging complementary imaging and audio representations improves disease classification. We use shifted window (Swin) transformer architectures as encoders for both the visual and audio modalities and fuse the resulting feature representations with a cross-correlational feature-multiplication strategy. The fused representation is fed to a classification head for downstream disease classification. We experimentally show that the proposed fused model outperforms the individual-modality models for multi-class thoracic disease classification covering normal, pneumonia, and congestive heart failure cases. For the fused modalities, we report F1-scores of 0.5415 and 0.5353 for the Swin transformer base and small architectures, respectively, while the corresponding baselines are 0.5046 and 0.5076 for the audio modality and 0.4676 and 0.5261 for the imaging modality.
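The fusion step described above can be sketched minimally as an element-wise product of the two encoders' pooled feature vectors, followed by a linear classification head over the three classes. All names and dimensions below (e.g. a 768-d pooled feature, random stand-in weights) are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: both Swin encoders pool to 768-d feature vectors.
feat_dim, num_classes = 768, 3  # classes: normal, pneumonia, congestive heart failure

# Stand-ins for pooled encoder outputs for one example.
img_feat = rng.standard_normal(feat_dim)    # from the imaging (CXR) Swin encoder
audio_feat = rng.standard_normal(feat_dim)  # from the audio Swin encoder

# Cross-correlational feature multiplication: element-wise product of the two vectors.
fused = img_feat * audio_feat

# Linear classification head (random weights here; learned during training in practice).
W = rng.standard_normal((num_classes, feat_dim)) * 0.01
b = np.zeros(num_classes)
logits = W @ fused + b

# Softmax over the three thoracic disease classes.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
pred = int(np.argmax(probs))
```

The element-wise product keeps the fused representation the same size as each encoder's output while letting agreement between modalities amplify a feature and disagreement suppress it.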