Abstract:
While recent advances in automatic speech recognition (ASR) systems involve neural architectures and large acoustic models with latent feature representations, problems arise for low-resource languages, especially for children’s speech, for which very little data is available. In this paper, recent data augmentation (DA) techniques, namely spectral warping (SW), vocal tract length perturbation (VTLP), spectrogram augmentation (SpecAug), and MaskCycleGAN-VC, were applied to the Tagalog ISIP06 corpus. Experiments were designed to determine the optimal parameters for each technique and to evaluate systems using different combinations of DA techniques in terms of word error rate (WER) and relative improvement (RI) when tested on actual children’s speech data. The combined augmented data yielded the best-performing system, which recorded the lowest WER of 11.72%, corresponding to a 43.55% relative improvement with respect to the baseline system.
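The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions: WER as word-level edit distance divided by the number of reference words, and RI as the WER reduction divided by the baseline WER. The function names are hypothetical and this is not the authors' evaluation code. The abstract does not report the baseline WER; the value used below is back-calculated from the reported 11.72% WER and 43.55% RI (roughly 20.76%) and is shown only to check the arithmetic.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words (assumed definition)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, system_wer: float) -> float:
    """RI = (baseline WER - system WER) / baseline WER (assumed definition)."""
    return (baseline_wer - system_wer) / baseline_wer


# Baseline implied by the reported figures; NOT a value stated in the abstract.
implied_baseline = 11.72 / (1 - 0.4355)
ri = relative_improvement(implied_baseline, 11.72)
```

Under these assumptions, `ri` recovers the reported 0.4355, and `word_error_rate` can be applied to any reference/hypothesis transcript pair.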
Date of Conference: 25-27 October 2023
Date Added to IEEE Xplore: 15 November 2023