ISCA Archive Interspeech 2021

Time Delay Estimation for Speaker Localization Using CNN-Based Parametrized GCC-PHAT Features

Daniele Salvati, Carlo Drioli, Gian Luca Foresti

We propose a time delay estimation (TDE) method for speaker localization based on parametrized generalized cross-correlation phase transform (PGCC-PHAT) functions and convolutional neural networks (CNNs). The PGCC-PHAT is used to build a feature matrix that provides TDE information from two microphone signals at different normalization levels of the cross-correlation functions. The feature matrix is processed by a CNN composed of several convolutional and fully connected layers, with a regression output for the direct estimation of the time difference of arrival (TDOA). Simulations in adverse noisy and reverberant conditions show that the proposed method improves TDOA estimation performance compared with GCC-PHAT.
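
To make the feature-matrix idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the common parametrized form in which the cross-spectrum is divided by its magnitude raised to an exponent beta (beta = 0 gives the plain cross-correlation, beta = 1 gives GCC-PHAT); the abstract does not state the exact formula, the beta values, or the lag range. The names `pgcc_phat_matrix`, `tdoa_from_row`, `betas`, and `max_lag` are hypothetical, and the CNN regression stage is not sketched here.

```python
import numpy as np

def pgcc_phat_matrix(x1, x2, betas, max_lag):
    """Stack one cross-correlation function per normalization level beta.

    Assumed form (not confirmed by the abstract): cross-spectrum / |cross-spectrum|**beta.
    Returns an array of shape (len(betas), 2 * max_lag + 1).
    """
    n = len(x1) + len(x2)              # zero-pad so the correlation is linear, not circular
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)           # cross-power spectrum
    mag = np.abs(cross) + 1e-12        # small constant avoids division by zero
    rows = []
    for beta in betas:
        cc = np.fft.irfft(cross / mag**beta, n=n)
        # Reorder so the columns run from lag -max_lag to lag +max_lag.
        rows.append(np.concatenate((cc[-max_lag:], cc[:max_lag + 1])))
    return np.array(rows)

def tdoa_from_row(row, max_lag):
    """Peak-picking baseline: lag (in samples) of the maximum of one row.

    With the sign convention above, a positive lag means x1 is delayed
    relative to x2.
    """
    return int(np.argmax(row)) - max_lag
```

Each row of the returned matrix is a cross-correlation function at one normalization level, so the matrix can be treated as a 2-D input to a network. Peak picking on the beta = 1 row corresponds to the conventional GCC-PHAT estimate that the paper uses as its point of comparison.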


doi: 10.21437/Interspeech.2021-988

Cite as: Salvati, D., Drioli, C., Foresti, G.L. (2021) Time Delay Estimation for Speaker Localization Using CNN-Based Parametrized GCC-PHAT Features. Proc. Interspeech 2021, 1479-1483, doi: 10.21437/Interspeech.2021-988

@inproceedings{salvati21_interspeech,
  author={Daniele Salvati and Carlo Drioli and Gian Luca Foresti},
  title={{Time Delay Estimation for Speaker Localization Using CNN-Based Parametrized GCC-PHAT Features}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1479--1483},
  doi={10.21437/Interspeech.2021-988}
}