Empirical Analysis of Generalized Iterative Speech Separation Networks

Luo, Yi; Han, Cong; Mesgarani, Nima

doi:10.21437/Interspeech.2021-1161

Although most existing speech separation networks are designed as a one-pass pipeline in which the sources are estimated directly from the mixture, multi-pass or iterative pipelines have been shown to be effective by performing multiple rounds of separation and using the separation outputs from the previous iteration as additional inputs for the next iteration. Moreover, such an iterative separation pipeline can be extended to a more general framework in which a training objective designed to minimize the discrepancy between the estimated and target sources is applied to different parts of the network. In this paper, we empirically investigate the effect of such a generalized iterative separation pipeline by adjusting its configuration in multiple aspects in both the training and inference phases. For the training phase, we compare the separation performance of both time-domain and frequency-domain networks with different numbers of iterations, following the recent discussions on the model architecture organizations. We also evaluate the effect of parameter sharing across iterations and the necessity of additional training objectives. For the inference phase, we measure the separation performance with various numbers of iterations. Our results show that iterative speech separation is a promising direction and deserves more in-depth analysis and exploration.
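The iterative pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_separator`, `iterative_separate`, and `multi_iteration_loss` are hypothetical names, the "network" is a trivial stand-in that redistributes the mixture residual, the loss is plain mean squared error chosen for illustration, and permutation handling is omitted. The sketch only shows the control flow: each iteration receives the mixture together with the previous estimates, a single separator may be shared across iterations or a distinct one used per iteration, and an objective can be applied to the output of every iteration.

```python
from typing import Callable, List

Signal = List[float]
Separator = Callable[[Signal, List[Signal]], List[Signal]]


def toy_separator(mixture: Signal, prev: List[Signal]) -> List[Signal]:
    """Hypothetical stand-in for a separation network (not the paper's model).

    Splits the residual (mixture minus the sum of the previous estimates)
    evenly across the sources and adds it to each previous estimate.
    """
    n_src = len(prev)
    refined = []
    for src in prev:
        refined.append([
            s + (m - sum(p[t] for p in prev)) / n_src
            for t, (s, m) in enumerate(zip(src, mixture))
        ])
    return refined


def iterative_separate(mixture: Signal, separators: List[Separator],
                       num_sources: int, num_iterations: int) -> List[List[Signal]]:
    """Run several separation passes and return the estimates of every pass.

    Passing one separator shares it across all iterations; passing one
    separator per iteration corresponds to unshared parameters.
    """
    # Assumption: the first pass starts from all-zero source estimates.
    estimates = [[0.0] * len(mixture) for _ in range(num_sources)]
    history = []
    for i in range(num_iterations):
        separator = separators[i % len(separators)]
        # Previous outputs are fed back as additional inputs.
        estimates = separator(mixture, estimates)
        history.append(estimates)
    return history


def mse(estimate: Signal, target: Signal) -> float:
    """Mean squared error between one estimated and one target source."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def multi_iteration_loss(history: List[List[Signal]], targets: List[Signal]) -> float:
    """Sum an illustrative MSE objective over the outputs of all iterations."""
    return sum(mse(est, tgt)
               for estimates in history
               for est, tgt in zip(estimates, targets))


mixture = [1.0, 2.0, 3.0]
shared = iterative_separate(mixture, [toy_separator], num_sources=2, num_iterations=3)
unshared = iterative_separate(mixture, [toy_separator] * 3, num_sources=2, num_iterations=3)
```

Because `iterative_separate` returns the estimates of every pass, the number of iterations used at inference can be varied simply by reading a different entry of the returned list, which mirrors the inference-phase comparison mentioned in the abstract.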

Cite as: Luo, Y., Han, C., Mesgarani, N. (2021) Empirical Analysis of Generalized Iterative Speech Separation Networks. Proc. Interspeech 2021, 3485-3489, doi: 10.21437/Interspeech.2021-1161

@inproceedings{luo21e_interspeech,
  author={Yi Luo and Cong Han and Nima Mesgarani},
  title={{Empirical Analysis of Generalized Iterative Speech Separation Networks}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={3485--3489},
  doi={10.21437/Interspeech.2021-1161}
}