Semi-Supervised Self-Produced Speech Enhancement and Suppression Based on Joint Source Modeling of Air- and Body-Conducted Signals Using Variational Autoencoder

Seki, Shogo; Takada, Moe; Toda, Tomoki

doi:10.21437/Interspeech.2020-2055

This paper proposes a semi-supervised method for enhancing and suppressing self-produced speech, using a variational autoencoder (VAE) to jointly model self-produced speech recorded with air- and body-conductive microphones. In enhancement and suppression of self-produced speech, body-conducted signals can serve as an acoustic cue, since they are robust against external noise and consist predominantly of self-produced speech. We previously developed a semi-supervised method based on an improved source modeling approach called joint source modeling, which captures a nonlinear correspondence between air- and body-conducted signals using non-negative matrix factorization (NMF). This prevents the enhanced and suppressed air-conducted self-produced speech from being contaminated by the characteristics of body-conducted signals. However, our previous method employs a rank-1 spatial model, which is effective but difficult to apply in more practical situations. Furthermore, joint source modeling depends on the representation capability of NMF. As a result, enhancement and suppression performance is limited. To overcome these limitations, this paper employs a full-rank spatial model and proposes joint source modeling of air- and body-conducted signals using a VAE, which has been shown to represent source signals more accurately than NMF. Experimental results show that the proposed method outperforms the baseline methods.

Cite as: Seki, S., Takada, M., Toda, T. (2020) Semi-Supervised Self-Produced Speech Enhancement and Suppression Based on Joint Source Modeling of Air- and Body-Conducted Signals Using Variational Autoencoder. Proc. Interspeech 2020, 4039-4043, doi: 10.21437/Interspeech.2020-2055

@inproceedings{seki20_interspeech,
  author={Shogo Seki and Moe Takada and Tomoki Toda},
  title={{Semi-Supervised Self-Produced Speech Enhancement and Suppression Based on Joint Source Modeling of Air- and Body-Conducted Signals Using Variational Autoencoder}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={4039--4043},
  doi={10.21437/Interspeech.2020-2055}
}