ISCA Archive Interspeech 2020

Air-Tissue Boundary Segmentation in Real Time Magnetic Resonance Imaging Video Using 3-D Convolutional Neural Network

Renuka Mannem, Navaneetha Gaddam, Prasanta Kumar Ghosh

Real-time magnetic resonance imaging (rtMRI) is often used for speech production research because it captures a complete view of the vocal tract during speech. Air-tissue boundaries (ATBs) are the contours that trace the transition between the high-intensity tissue region and the low-intensity airway cavity region in an rtMRI video. ATBs are used in several speech-related applications. However, ATB segmentation is a challenging task because rtMRI frames have low resolution and a low signal-to-noise ratio. Several methods have been proposed in the past for ATB segmentation. Among these, supervised algorithms have been shown to perform well compared to unsupervised algorithms. However, supervised algorithms generalize poorly to subjects not involved in training. In this work, we propose a 3-dimensional convolutional neural network (3D-CNN) that utilizes both spatial and temporal information from the rtMRI video for accurate ATB segmentation. The 3D-CNN model captures the vocal tract dynamics in an rtMRI video independently of the morphology of the subject, leading to accurate ATB segmentation for unseen subjects. In a leave-one-subject-out experimental setup, the proposed approach provides ~32% relative improvement in performance over the best (SegNet-based) baseline approach.
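
The leave-one-subject-out setup mentioned above can be illustrated with a minimal sketch. It is not the authors' code: the function name `leave_one_subject_out` and the subject labels are hypothetical, and the sketch only assumes that each video carries a subject identifier. In each fold, every video from one subject is held out for testing and the model is trained on the remaining subjects, so the test subject is always unseen.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per fold.

    subject_ids: one subject label per video, in dataset order.
    Hypothetical helper for illustration only.
    """
    for held_out in sorted(set(subject_ids)):
        # Videos from all other subjects form the training set.
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        # Videos from the held-out subject form the test set.
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical labels: five videos recorded from three subjects.
video_subjects = ["S1", "S1", "S2", "S3", "S2"]
for subject, train_idx, test_idx in leave_one_subject_out(video_subjects):
    print(subject, train_idx, test_idx)
```

Splitting by subject rather than by video keeps every frame of the test subject out of training, which is what makes the reported result a measure of generalization to unseen subjects.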


doi: 10.21437/Interspeech.2020-2241

Cite as: Mannem, R., Gaddam, N., Ghosh, P.K. (2020) Air-Tissue Boundary Segmentation in Real Time Magnetic Resonance Imaging Video Using 3-D Convolutional Neural Network. Proc. Interspeech 2020, 1396-1400, doi: 10.21437/Interspeech.2020-2241

@inproceedings{mannem20_interspeech,
  author={Renuka Mannem and Navaneetha Gaddam and Prasanta Kumar Ghosh},
  title={{Air-Tissue Boundary Segmentation in Real Time Magnetic Resonance Imaging Video Using 3-D Convolutional Neural Network}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1396--1400},
  doi={10.21437/Interspeech.2020-2241}
}