Low-Resourced Phonetic and Prosodic Feature Estimation With Self-Supervised-Learning-based Acoustic Modeling | IEEE Conference Publication | IEEE Xplore



Abstract:

We propose a method for estimating phonetic and prosodic features from speech that uses self-supervised-learning (SSL)-based acoustic modeling (AM). Because only a small amount of prosodic feature data is available, we use SSL for few-shot-learning-based speech recognition. Prosodic features make it possible to symbolize accent information in pitch-accent languages, which is important for pronunciation. The method automatically generates labeled text-to-speech data for pitch-accent languages from speech alone. In contrast, conventional methods can recognize only the pitch accents among the phonetic and prosodic features and often yield poor character error rates. Our method combines wav2vec 2.0, an SSL-based AM method, with the Transformer architecture commonly used in natural language processing, which corrects phonetic-confusion errors. Experiments indicate that the proposed method achieves a character error rate of 4.7% using SSL-based acoustic modeling with 5.69 hours of fine-tuning data and a phoneme-error-correction Transformer.
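
The abstract reports its result as a character error rate (CER). As a minimal sketch of how such a figure is typically computed, assuming the standard definition (Levenshtein edit distance between reference and hypothesis, divided by the reference length), the snippet below may help. The function names and the example strings are hypothetical and are not taken from the paper; the paper's actual symbol set and scoring tool are not stated here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two symbol sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # drop a reference symbol
            insertion = curr[j - 1] + 1  # add a hypothesis symbol
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = edit distance / reference length (assumed standard definition)."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical strings, used only to illustrate the metric.
    print(character_error_rate("abcd", "abxd"))
```

Because both functions accept any sequence, the same code can score lists of phonetic and prosodic symbols as well as plain character strings.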
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 15 August 2024
Conference Location: Seoul, Republic of Korea
