Research Article
DOI: 10.1145/3640824.3640861

Exploring Accent Similarity for Cross-Accented Speech Recognition

Published: 08 March 2024

Abstract

In recent years, speech recognition has made significant progress, but recognizing accented speech remains a challenge. Although multi-accent speech recognition models exhibit remarkable capabilities across different accents, their performance still degrades on low-resource accents. In this paper, we propose AccentFusion, a framework that leverages accent similarity to improve cross-accented speech recognition. AccentFusion employs an interaction-augmented module to capture fine-grained accent similarities between source and target accents. Additionally, we use a fusion-guided loss to supervise the weights of the target accent while learning accent similarity, encouraging the model to focus its primary attention on the target accent. During inference, we fuse source- and target-accent features according to their similarity. We evaluate AccentFusion on the CommonVoice corpus. Experiments demonstrate that fusing accent information improves over a fine-tuning baseline, significantly reducing the word error rate (WER) on low-resource accents.
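The inference-time fusion described above, where source-accent features are weighted by their similarity to the target accent and blended into the target representation, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the use of cosine similarity, the softmax temperature, and the fixed blending coefficient `alpha` are all assumptions, since the abstract does not specify how AccentFusion computes similarity or fuses features.

```python
import numpy as np

def softmax(x, temperature=1.0):
    """Numerically stable softmax over a 1-D array."""
    z = x / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse_accent_features(target, sources, temperature=1.0, alpha=0.7):
    """Blend source-accent features into a target-accent feature,
    weighting each source by its cosine similarity to the target.

    target:  (d,)   feature vector for the (low-resource) target accent
    sources: (k, d) feature vectors for k source accents
    alpha:   weight kept on the target accent (a stand-in for the
             learned weighting the fusion-guided loss would supervise)
    """
    # Cosine similarity between the target and each source feature.
    t = target / np.linalg.norm(target)
    s = sources / np.linalg.norm(sources, axis=1, keepdims=True)
    sims = s @ t                                  # shape (k,)

    # Similarity-weighted mixture of source-accent features.
    weights = softmax(sims, temperature)          # sums to 1
    source_mix = weights @ sources                # shape (d,)

    # Keep primary emphasis on the target accent, as the fusion-guided
    # loss in the paper is described as encouraging.
    fused = alpha * target + (1.0 - alpha) * source_mix
    return fused, weights
```

In this sketch, `alpha` fixes the target accent's dominance at inference time; in the paper the balance between target and source accents is learned under the fusion-guided loss rather than hand-set.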




    Published In

    CCEAI '24: Proceedings of the 2024 8th International Conference on Control Engineering and Artificial Intelligence
    January 2024
    297 pages
    ISBN:9798400707971
    DOI:10.1145/3640824

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Accented speech recognition
    2. Cross-accent speech recognition
    3. Domain adaptation

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Zhejiang Electric Power Co., Ltd.


