GTTS-EHU Systems for the Albayzin 2018 Search on Speech Evaluation

Rodríguez-Fuentes, Luis J.; Peñagarikano, Mikel; Varona, Amparo; Bordel, Germán

doi:10.21437/IberSPEECH.2018-52

This paper describes the systems developed by GTTS-EHU for the QbE-STD and STD tasks of the Albayzin 2018 Search on Speech Evaluation. Stacked bottleneck features (sBNF) are used as the frame-level acoustic representation for both audio documents and spoken queries. In QbE-STD, the search is performed with a flavour of segmental DTW (originally developed for MediaEval 2013), which iteratively finds the match that minimizes the average distance between two test-normalized sBNF vectors, until either a maximum number of hits is obtained or the score does not attain a given threshold. The STD task is performed by synthesizing spoken queries (using publicly available TTS APIs), then averaging their sBNF representations and using the average query for QbE-STD. A publicly available toolkit (developed by BUT/Phonexia) has been used to extract three sBNF sets, trained for English monophone and triphone state posteriors (contrastive systems 3 and 4) and for multilingual triphone posteriors (contrastive system 2), respectively. The concatenation of the three sBNF sets has also been tested (contrastive system 1). The primary system consists of a discriminative fusion of the four contrastive systems. Detection scores are normalized on a query-by-query basis (qnorm), calibrated and, if two or more systems are considered, fused with other scores. Calibration and fusion parameters are discriminatively estimated using the ground truth of development data. Finally, due to a lack of robustness in calibration, Yes/No decisions are made by applying the MTWV thresholds obtained for the development sets, except for the COREMAH test set. In this case, calibration is based on the MAVIR corpus, and the 15% highest scores are taken as positive (Yes) detections.
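The two score-handling steps mentioned in the abstract, query-by-query normalization and taking the 15% highest scores as Yes detections, can be illustrated with a minimal sketch. The abstract does not define qnorm, so the zero-mean, unit-variance form used here is an assumption, and the function names `qnorm` and `top_fraction_decisions` are hypothetical, not taken from the authors' software.

```python
from collections import defaultdict
from statistics import mean, pstdev


def qnorm(detections):
    """Normalize scores per query to zero mean and unit variance.

    `detections` is a list of (query_id, score) pairs. The z-score form
    is an assumption; the abstract only says "query-by-query basis".
    """
    by_query = defaultdict(list)
    for query_id, score in detections:
        by_query[query_id].append(score)
    stats = {q: (mean(s), pstdev(s)) for q, s in by_query.items()}

    normalized = []
    for query_id, score in detections:
        mu, sigma = stats[query_id]
        # Queries with one detection or constant scores have sigma == 0.
        z = (score - mu) / sigma if sigma > 0 else 0.0
        normalized.append((query_id, z))
    return normalized


def top_fraction_decisions(scores, fraction=0.15):
    """Return True (Yes) for the highest `fraction` of scores, else False."""
    n_yes = round(len(scores) * fraction)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    yes = set(ranked[:n_yes])
    return [i in yes for i in range(len(scores))]
```

With `fraction=0.15`, the second function keeps roughly the top 15% of all scores as positive detections without relying on a calibrated threshold, which is the kind of fallback the abstract describes for the COREMAH test set.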

Cite as: Rodríguez-Fuentes, L.J., Peñagarikano, M., Varona, A., Bordel, G. (2018) GTTS-EHU Systems for the Albayzin 2018 Search on Speech Evaluation. Proc. IberSPEECH 2018, 249-253, doi: 10.21437/IberSPEECH.2018-52

@inproceedings{rodriguezfuentes18_iberspeech,
  author={Luis J. Rodríguez-Fuentes and Mikel Peñagarikano and Amparo Varona and Germán Bordel},
  title={{GTTS-EHU Systems for the Albayzin 2018 Search on Speech Evaluation}},
  year=2018,
  booktitle={Proc. IberSPEECH 2018},
  pages={249--253},
  doi={10.21437/IberSPEECH.2018-52}
}