
Low-Resourced Phonetic and Prosodic Feature Estimation With Self-Supervised-Learning-based Acoustic Modeling


Abstract:

We propose a method for estimating phonetic and prosodic features from speech that uses self-supervised-learning (SSL)-based acoustic modeling (AM). Because only a small amount of prosodic feature data is available, we use SSL for few-shot learning-based speech recognition. Prosodic features allow accent information to be symbolized in pitch-accent languages, where it is important for pronunciation. The method automatically generates labeled text-to-speech data for pitch-accent languages from speech alone. In contrast, conventional methods can recognize only the pitch accents among the phonetic and prosodic features and often have poor character error rates. Our method combines wav2vec 2.0, an SSL-based AM method, with the Transformer architecture commonly used in natural language processing, which corrects phonetic-confusion errors. Experiments indicate that the proposed method achieves a 4.7% character error rate using an SSL-based acoustic model fine-tuned on 5.69 hours of data together with a phoneme-error-correction Transformer.
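
The abstract reports its result as a character error rate (CER). As a minimal sketch of how such a figure is commonly computed, assuming the standard definition (character-level edit distance divided by reference length), the Python below may help. The function names, the sample strings, and the use of `]` as a pitch-accent marker are hypothetical illustrations and are not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = (substitutions + deletions + insertions) / reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical label strings: "]" stands in for an accent symbol.
reference = "a]me ga"
hypothesis = "ame ga"   # the accent symbol was dropped
cer = character_error_rate(reference, hypothesis)
print(f"CER = {cer:.3f}")  # one edit over seven reference characters
```

Because the prosodic symbols are counted as ordinary characters in this sketch, a missed accent marker costs as much as a wrong phoneme. Whether the paper weights them the same way is not stated in the abstract.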

Published in: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)

Date of Conference: 14-19 April 2024

Date Added to IEEE Xplore: 15 August 2024


DOI: 10.1109/ICASSPW62465.2024.10626112

Conference Location: Seoul, Republic of Korea
