Abstract:
With the growing complexity of text-to-speech systems, it is becoming increasingly important to understand the underlying perceptual and judgement processes that drive user Quality-of-Experience (QoE) perception. Typical QoE assessment techniques, such as listening tests with self-report ratings, are useful but provide limited insight into these underlying processes. Recent advances in neuroimaging and physiological monitoring technologies, however, have opened new doors and allowed us to better understand and measure QoE perception. In this paper, we explore the use of two neuroimaging techniques, namely electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS), to better understand the neuronal and cerebral haemodynamic changes resulting from synthesized speech of varying quality. Neural correlates of several QoE dimensions were derived and validated on the publicly available PhySyQX database. Fusion of EEG, fNIRS, and fNIRS-derived physiological parameters, combined with conventional features extracted from the synthesized speech signal, was shown to accurately represent several QoE dimensions, including those related to listener affective states. It is hoped that these findings will help researchers build better instrumental QoE models that incorporate technological, contextual, and human influence factors.
Published in: IEEE Journal of Selected Topics in Signal Processing (Volume: 11, Issue: 1, February 2017)