ABSTRACT
In pseudo-label filtering for semi-supervised Mongolian speech recognition, no single filtering criterion can simultaneously guarantee both the correctness of word combinations in the self-training set and the correspondence between speech and words. To address this problem, we propose a pseudo-label filtering strategy that fuses perplexity and confidence, called sentence perplexity confidence. The strategy jointly evaluates the semantic coherence of pseudo-labels and the correspondence between pseudo-labels and the acoustic features of unlabeled speech, improving the accuracy of the self-training set and, in turn, the performance of the target speech recognition model produced by semi-supervised training. We conducted ablation and comparison experiments with sentence perplexity confidence on the Mongolian datasets IMUT-MC and IMUT-MC-SMI. The results show that sentence perplexity confidence surpasses both sentence-level confidence and perplexity in its ability to improve the accuracy of the self-training set, and the resulting target speech recognition models achieve a WER of 14.7% and an SER of 16.1%.
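The abstract does not give the paper's exact fusion formula, but the sketch below illustrates one plausible way to combine sentence-level confidence (measuring speech-to-word correspondence) with language-model perplexity (measuring word-combination plausibility) when filtering pseudo-labels. All names (`PseudoLabeled`, `fused_score`, `lm_log_prob`), the fusion weight `alpha`, and the threshold `tau` are illustrative assumptions, not the authors' formulation.

```python
# A minimal sketch of perplexity-confidence fusion for pseudo-label filtering.
# The fusion rule and all hyperparameters here are assumptions for illustration.
import math
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PseudoLabeled:
    utt_id: str
    hypothesis: List[str]          # decoded token sequence (the pseudo-label)
    token_log_probs: List[float]   # per-token log posteriors from the ASR decoder

def sentence_confidence(sample: PseudoLabeled) -> float:
    """Sentence-level confidence: geometric-mean per-token posterior.
    High values suggest the pseudo-label matches the acoustic features."""
    avg_log_prob = sum(sample.token_log_probs) / len(sample.token_log_probs)
    return math.exp(avg_log_prob)  # in (0, 1]

def sentence_perplexity(hypothesis: List[str],
                        lm_log_prob: Callable[[List[str]], float]) -> float:
    """Perplexity of the pseudo-label under an external language model.
    Low values suggest the word combinations are plausible sentences."""
    return math.exp(-lm_log_prob(hypothesis) / len(hypothesis))

def fused_score(sample: PseudoLabeled,
                lm_log_prob: Callable[[List[str]], float],
                alpha: float = 0.5) -> float:
    """One possible fusion: interpolate confidence with a squashed
    inverse-perplexity term, so both criteria must be met for a high score."""
    conf = sentence_confidence(sample)
    ppl = sentence_perplexity(sample.hypothesis, lm_log_prob)
    lm_term = 1.0 / (1.0 + math.log(ppl))  # maps ppl in [1, inf) to (0, 1]
    return alpha * conf + (1.0 - alpha) * lm_term

def filter_pseudo_labels(samples: List[PseudoLabeled],
                         lm_log_prob: Callable[[List[str]], float],
                         alpha: float = 0.5, tau: float = 0.7) -> List[PseudoLabeled]:
    """Keep only utterances whose fused score clears the threshold tau;
    the survivors form the self-training set."""
    return [s for s in samples if fused_score(s, lm_log_prob, alpha) >= tau]
```

In this sketch, `lm_log_prob` would wrap any Mongolian language model that returns the total log-probability of a token sequence, and `tau` and `alpha` would be tuned on a held-out labeled set rather than fixed in advance.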
Index Terms
- SPCPFS: a pseudo-label filtering strategy with fusion of perplexity and confidence