ABSTRACT
This paper proposes a novel processing architecture for extracting Mel-Frequency Cepstrum Coefficients (MFCC) for automatic speech recognition. Inspired by the human ear, the architecture adopts energy-efficient analog-domain information processing to replace the energy-intensive Fourier Transform used in conventional digital-domain processing. Moreover, it extracts the acoustic features in the mixed-signal domain, which significantly reduces both the cost of the Analog-to-Digital Converter (ADC) and the computational complexity. Circuit-level simulation based on 180 nm CMOS technology shows an energy consumption of 2.4 nJ/frame and a processing time of 45.79 μs/frame. Compared with the state of the art, the proposed architecture achieves a 97.2% energy saving and a speedup of about 6.4x. In speech recognition simulation, the proposed MFCC features reach a classification accuracy of 99%.
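For context, the conventional digital-domain pipeline that the abstract contrasts against can be sketched as follows. This is a minimal illustration of textbook FFT-based MFCC extraction, not the proposed mixed-signal architecture. The function names and all parameter values (16 kHz sampling, 400-sample frames, 26 mel filters, 13 coefficients) are assumptions chosen for the example and do not come from the paper.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale mapping (one of several conventions in use).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    """Frame -> window -> FFT power spectrum -> mel energies -> log -> DCT."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # The FFT below is the step the abstract describes as energy-intensive.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)  # small offset avoids log(0)
    return dct_ii(log_energies, n_ceps)

if __name__ == "__main__":
    t = np.arange(16000) / 16000.0
    tone = np.sin(2 * np.pi * 440.0 * t)   # one second of a 440 Hz test tone
    print(mfcc(tone).shape)
```

Each frame here requires a full FFT on digitized samples, which is the per-frame cost that an analog or mixed-signal front end aims to avoid.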
Index Terms
Energy-efficient MFCC extraction architecture in mixed-signal domain for automatic speech recognition