Harmonic-plus-Noise Network with Linear Prediction and Perceptual Weighting Filters for Filipino Speech Synthesis | IEEE Conference Publication | IEEE Xplore

Abstract:

Vocoders in Text-To-Speech (TTS) systems are responsible for converting acoustic feature representations, such as the Mel spectrogram, into the sound waveform. Recent developments in vocoders, such as WaveRNN [1], Parallel WaveGAN [2], HiFi-GAN [3], and diffusion models [4], [5], have mostly introduced neural architectures that outperform traditional approaches such as those based on the Griffin-Lim algorithm (GLA) [6]. To improve TTS quality for the Filipino language, this paper implements and trains the Harmonic-plus-Noise (H+N) vocoder, a multi-band Parallel WaveGAN (PWG) architecture, and combines it with two types of filters: a) a Linear Prediction (LP) filter and b) a Perceptual Weighting (PW) filter. HN-PWG obtained the highest total mean opinion score (MOS), 4.59 ± 0.10, closely followed by HN-PWG-PW at 4.58 ± 0.07, with no statistically significant difference between the two. All of the implemented H+N systems outperformed, in terms of MOS, a Tacotron2-based Filipino TTS that uses a WaveGlow vocoder.
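The abstract names the LP and PW filters but does not say how they are configured or where they attach to the H+N vocoder (for example, in synthesis or in a training loss). As a point of reference only, the sketch below shows the classical textbook forms of both: LP analysis by the autocorrelation method with the Levinson-Durbin recursion, and the CELP-style perceptual weighting filter W(z) = A(z/γ1) / A(z/γ2) built from the LP inverse filter A(z). This is not the authors' implementation. The function names, the LP order, the values γ1 = 0.9 and γ2 = 0.6, and the toy AR(2) signal standing in for a speech frame are all illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[k] = sum_n x[n] * x[n + k], k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Solve the LP normal equations. Returns [1, a1, ..., ap] for the
    inverse filter A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    if r[0] <= 0.0:
        raise ValueError("signal has zero energy")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # remaining prediction-error energy
    return a


def bandwidth_expand(a: Sequence[float], gamma: float) -> List[float]:
    """Coefficients of A(z / gamma): a_k -> a_k * gamma**k."""
    return [ak * gamma ** k for k, ak in enumerate(a)]


def iir_filter(b: Sequence[float], a: Sequence[float], x: Sequence[float]) -> List[float]:
    """Difference equation a0*y[n] = sum_k b_k x[n-k] - sum_{k>=1} a_k y[n-k]."""
    y: List[float] = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


def lp_residual(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """Apply the FIR inverse filter A(z) to obtain the LP prediction residual."""
    return iir_filter(a, [1.0], x)


def perceptual_weighting(x: Sequence[float], a: Sequence[float],
                         gamma1: float = 0.9, gamma2: float = 0.6) -> List[float]:
    """CELP-style weighting W(z) = A(z/gamma1) / A(z/gamma2); gammas are assumed."""
    return iir_filter(bandwidth_expand(a, gamma1), bandwidth_expand(a, gamma2), x)


if __name__ == "__main__":
    # Toy AR(2) stand-in for a speech frame: x[n] = 1.6 x[n-1] - 0.8 x[n-2] + noise.
    rng = random.Random(0)
    x = [0.0, 0.0]
    for _ in range(4000):
        x.append(1.6 * x[-1] - 0.8 * x[-2] + rng.gauss(0.0, 1.0))
    a = levinson_durbin(autocorrelation(x, 2), order=2)
    print("LP coefficients:", [round(c, 3) for c in a])  # should be close to [1, -1.6, 0.8]
    e = lp_residual(x, a)
    gain_db = 10.0 * math.log10(sum(v * v for v in x) / sum(v * v for v in e))
    print("prediction gain (dB):", round(gain_db, 2))
    w = perceptual_weighting(x, a)
    print("weighted samples:", len(w))
```

Choosing γ1 > γ2 makes W(z) de-emphasize the spectral peaks (formants) of the LP envelope relative to the valleys, which is why such a filter is commonly used to shape where an error or noise is penalized; setting γ1 = γ2 collapses W(z) to an identity filter.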
Date of Conference: 25-27 October 2023
Date Added to IEEE Xplore: 15 November 2023
Conference Location: Bucharest, Romania
