Application of end-to-end speech recognition in Chinese chess software

Authors: Yangting Qiu, Hongwei Cao, Lin Cheng

ICIEAI '23: Proceedings of the 2023 International Conference on Information Education and Artificial Intelligence

Pages 460 - 465

https://doi.org/10.1145/3660043.3660125

Published: 30 May 2024

Abstract

This paper proposes integrating end-to-end speech recognition into Chinese chess software. We collected and present a dataset of spoken Chinese chess terminology instructions, and we extracted Mel Frequency Cepstral Coefficients (MFCC) from the speech signals to serve as features. We then used the WeNet platform to build an end-to-end speech recognition model and developed a Chinese chess software system with speech recognition and control. Experiments demonstrate that the system can accurately recognize spoken instructions and use them to control play in Chinese chess.
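
As a rough illustration of the MFCC feature-extraction step named in the abstract, the following NumPy sketch walks through the textbook pipeline: pre-emphasis, framing with a Hamming window, power spectrum, triangular mel filterbank, logarithm, and a DCT-II. It is not the authors' implementation. The function names are hypothetical, and the parameter values (16 kHz sampling rate, 25 ms frames, 10 ms hop, 512-point FFT, 26 mel filters, 13 coefficients) are common defaults assumed here, not settings reported in the paper.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def mfcc(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13, pre_emphasis=0.97):
    """Return an array of shape (n_frames, n_ceps). All defaults are assumptions."""
    signal = np.asarray(signal, dtype=float)

    # 1. Pre-emphasis boosts high frequencies: y[t] = x[t] - a * x[t-1].
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])

    # 2. Split into overlapping frames and apply a Hamming window.
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    if len(emphasized) < flen:             # zero-pad very short inputs
        emphasized = np.pad(emphasized, (0, flen - len(emphasized)))
    n_frames = 1 + (len(emphasized) - flen) // fstep
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)

    # 3. Power spectrum of each frame (rfft zero-pads frames up to n_fft).
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft

    # 4. Mel filterbank energies, then log compression (clamped to avoid log 0).
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, np.finfo(float).eps))

    # 5. Unnormalized DCT-II across the filter axis; keep the first n_ceps terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1)
                   / (2 * n_filters))
    return log_energies @ basis.T


# Usage: one second of a 440 Hz tone stands in for a recorded instruction.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
print(features.shape)  # (98, 13)
```

Each row of the result is the feature vector for one 25 ms frame. In practice a speech toolkit would usually compute these features internally, so a hand-written version like this is mainly useful for understanding what the model receives as input.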

Index Terms

Application of end-to-end speech recognition in Chinese chess software
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition

Published In

ICIEAI '23: Proceedings of the 2023 International Conference on Information Education and Artificial Intelligence

December 2023

1132 pages

ISBN:9798400716157

DOI:10.1145/3660043

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICIEAI 2023

ICIEAI 2023: 2023 International Conference on Information Education and Artificial Intelligence

December 22 - 24, 2023

Xiamen, China
