Speech Feature Enhancement based on Time-frequency Analysis

Authors:

Thanh-Duc Chau,

Thai-Son Tran

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 8

Article No.: 219, Pages 1 - 14

https://doi.org/10.1145/3605549

Published: 23 August 2023

Abstract

Time-frequency analysis (TFA) is a powerful way to reveal hidden information in signals, including speech. Many TFA techniques have been developed to capture the most important stationary features. However, human speech is not stationary: it contains non-stationary components. This work designs a new TFA-based algorithm to extract the trends and changes of the speech signal in the time-frequency (TF) plane. The algorithm, the Poly-Linear Chirplet Transform (PLCT), builds a set of atoms for the signal transform that analyze the signal along many different directions in the TF plane. PLCT returns a multichannel output in which each channel is the result of one particular Linear Chirplet Transform (LCT). This feature is then combined with the Mel-frequency cepstral coefficient (MFCC) feature to form the final representation. Although the representation becomes larger, the extracted feature carries richer information and improves recognition results over other features in gender recognition, dialect recognition, and speaker recognition.
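The abstract describes PLCT only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each channel is the magnitude of a discrete LCT computed with a Hann window and a de-chirping kernel exp(-jπcu²) (u measured from the frame centre, c in cycles/sample²). It also assumes that the "set of atoms" is simply a list of evenly spaced chirp rates, because the paper's own atom-generation algorithm is not given on this page, and that "combined with the MFCC feature" means frame-wise concatenation. All function names (`linear_chirplet_transform`, `poly_linear_chirplet_transform`, `evenly_spaced_rates`, `fuse_with_mfcc`) are hypothetical, and the MFCC frames are placeholders.

```python
import cmath
import math


def hann(length):
    """Periodic Hann window; its peak sits at index length // 2."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * m / length) for m in range(length)]


def linear_chirplet_transform(x, chirp_rate, win_len=128, hop=64):
    """Magnitude of a discrete linear chirplet transform (LCT) of a real signal.

    chirp_rate is in cycles/sample^2 (an assumption of this sketch). Each frame
    is windowed, multiplied by exp(-j*pi*chirp_rate*u^2) with u measured from
    the frame centre, and then DFT-ed. chirp_rate = 0 gives a plain STFT magnitude.
    Returns frames x (win_len // 2 + 1) non-negative-frequency magnitudes.
    """
    w = hann(win_len)
    half = win_len // 2
    n_bins = half + 1
    # The windowed chirp kernel and DFT twiddles do not depend on the frame.
    kernel = [w[m] * cmath.exp(-1j * math.pi * chirp_rate * (m - half) ** 2)
              for m in range(win_len)]
    twiddle = [[cmath.exp(-2j * math.pi * k * m / win_len) for m in range(win_len)]
               for k in range(n_bins)]
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + m] * kernel[m] for m in range(win_len)]
        frames.append([abs(sum(s * t for s, t in zip(seg, row))) for row in twiddle])
    return frames


def evenly_spaced_rates(n_channels, max_rate):
    """Hypothetical atom set: n_channels chirp rates spread over [-max_rate, max_rate]."""
    if n_channels == 1:
        return [0.0]
    return [max_rate * (2 * i / (n_channels - 1) - 1) for i in range(n_channels)]


def poly_linear_chirplet_transform(x, chirp_rates, win_len=128, hop=64):
    """One LCT magnitude map per chirp rate: channels x frames x bins."""
    return [linear_chirplet_transform(x, c, win_len, hop) for c in chirp_rates]


def fuse_with_mfcc(plct, mfcc_frames):
    """Concatenate, frame by frame, every PLCT channel with the MFCC vector.

    Assumes at least one PLCT channel and MFCC frames aligned with PLCT frames.
    """
    n_frames = min(len(plct[0]), len(mfcc_frames))
    fused = []
    for t in range(n_frames):
        vec = []
        for channel in plct:
            vec.extend(channel[t])
        vec.extend(mfcc_frames[t])
        fused.append(vec)
    return fused


if __name__ == "__main__":
    rate = 3 / 8192  # chirp rate of a synthetic test signal (cycles/sample^2)
    sig = [math.cos(2 * math.pi * (n / 32 + 0.5 * rate * n * n)) for n in range(1024)]
    rates = evenly_spaced_rates(3, rate)
    plct = poly_linear_chirplet_transform(sig, rates)
    for c, channel in zip(rates, plct):
        mean_peak = sum(max(frame) for frame in channel) / len(channel)
        print(f"rate={c:+.6f}  mean peak magnitude={mean_peak:.2f}")
    # Placeholder MFCC frames; a real front end would supply these.
    mfcc = [[0.0] * 13 for _ in plct[0]]
    fused = fuse_with_mfcc(plct, mfcc)
    print(len(fused), "frames of", len(fused[0]), "values")
```

On a synthetic linear chirp, the channel whose rate matches the signal de-chirps each frame into a single spectral peak, so in this sketch its peak magnitudes are the largest; this is one way to picture the direction-selective multichannel output the abstract describes. The fused vector holds n_channels × (win_len / 2 + 1) + n_mfcc values per frame, which illustrates why the representation size grows.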

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing Volume 22, Issue 8

August 2023

373 pages

ISSN: 2375-4699

EISSN: 2375-4702

DOI: 10.1145/3615980

Editor:
Imed Zitouni
Google, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 August 2023

Accepted: 09 June 2023

Revised: 26 March 2023

Received: 14 August 2022

Published in TALLIP Volume 22, Issue 8

Qualifiers

Note

Funding Sources

Vingroup JSC
PhD Scholarship Programme of Vingroup Innovation Foundation (VINIF), Institute of Big Data

Bibliometrics

Total citations: 0
Total downloads: 85
Downloads (last 12 months): 39
Downloads (last 6 weeks): 3

Reflects downloads up to 13 Feb 2025
