
Speech Feature Enhancement based on Time-frequency Analysis

Published: 23 August 2023

Abstract

Time-frequency analysis (TFA) is a powerful way to reveal hidden information in signals, including speech. Many TFA techniques have been developed to capture the most important stationary features. However, human speech is not stationary; it contains non-stationary components. This work designs a new TFA-based algorithm that extracts the trends and changes inside the speech signal in the time-frequency (TF) plane. The algorithm creates a set of atoms for the signal transform, allowing the Poly-Linear Chirplet Transform (PLCT) to analyze the signal from many different directions. The method returns a multichannel output in which each channel is the result of one particular Linear Chirplet Transform (LCT). This feature is then combined with the MFCC feature to form the final representation. Although the resulting speech representation is larger, the extracted feature carries richer information and improves recognition results over other features in gender recognition, dialect recognition, and speaker recognition.
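The abstract describes PLCT only at a high level: a bank of linear chirplet transforms, each looking at the TF plane from a different direction (chirp rate), whose outputs are stacked as channels. The NumPy sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. The Gaussian window, the frame, hop and FFT sizes, the chirp-rate grid, and the function names `analytic_signal`, `linear_chirplet_transform` and `poly_linear_chirplet_features` are all hypothetical choices made for illustration.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal (same construction as scipy.signal.hilbert)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)


def linear_chirplet_transform(z, fs, chirp_rate, win_len=400, hop=160, n_fft=512):
    """Magnitude LCT of analytic signal z for one chirp rate in Hz/s (hypothetical helper).

    Returns an array of shape (n_fft // 2 + 1, n_frames).
    """
    if len(z) < win_len:
        raise ValueError("signal shorter than one analysis window")
    n = np.arange(win_len) - (win_len - 1) / 2.0   # samples relative to frame centre t0
    tau = n / fs                                   # seconds relative to t0
    sigma = win_len / 6.0                          # window spans about +/-3 sigma
    window = np.exp(-0.5 * (n / sigma) ** 2)       # assumed Gaussian analysis window
    # Demodulating chirp: cancels a frequency drift of `chirp_rate` Hz/s around t0,
    # so a component sweeping at this rate collapses into a narrow ridge.
    kernel = window * np.exp(-1j * np.pi * chirp_rate * tau ** 2)
    n_frames = 1 + (len(z) - win_len) // hop
    frames = np.stack([z[i * hop:i * hop + win_len] for i in range(n_frames)])
    spec = np.fft.fft(frames * kernel, n=n_fft, axis=1)[:, : n_fft // 2 + 1]
    return np.abs(spec).T


def poly_linear_chirplet_features(x, fs, chirp_rates, **kwargs):
    """Stack one LCT per chirp rate into a (channels, freq, time) array (hypothetical helper)."""
    z = analytic_signal(np.asarray(x, dtype=float))
    return np.stack([linear_chirplet_transform(z, fs, c, **kwargs) for c in chirp_rates])


if __name__ == "__main__":
    fs = 16000
    t = np.arange(3200) / fs  # 0.2 s of audio
    # Synthetic linear chirp starting at 500 Hz and rising at 30000 Hz/s
    x = np.cos(2 * np.pi * (500 * t + 0.5 * 30000 * t ** 2))
    feats = poly_linear_chirplet_features(x, fs, chirp_rates=[-30000.0, 0.0, 30000.0])
    print(feats.shape)  # (3, 257, 18)
```

Channel order follows `chirp_rates`, and a rate of 0 reduces the LCT to an ordinary Gaussian-window STFT. On the synthetic chirp, the channel whose rate matches the sweep shows the sharpest ridge. The abstract does not say how the channels are combined with MFCCs, so that step is left out of this sketch.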



Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 8
August 2023
373 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3615980

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 August 2023
Accepted: 09 June 2023
Revised: 26 March 2023
Received: 14 August 2022
Published in TALLIP Volume 22, Issue 8


Author Tags

  1. Speech feature
  2. multichannel representation
  3. time-frequency analysis
  4. Chirplet Transform
  5. poly-linear chirplet transform
  6. instantaneous frequency

Qualifiers

  • Note

Funding Sources

  • Vingroup JSC
  • PhD Scholarship Programme of Vingroup Innovation Foundation (VINIF), Institute of Big Data
