Abstract:
Extracting speaker sentiment from natural audio streams such as YouTube is challenging. A number of factors contribute to the task difficulty, namely, Automatic Speech Recognition (ASR) of spontaneous speech, unknown background environments, variable source and channel characteristics, accents, diverse topics, etc. In this study, we build upon our previous work [5], where we proposed a system for detecting sentiment in YouTube videos. In particular, we propose several enhancements, including (i) a better text-based sentiment model, trained on a larger and more diverse dataset, (ii) an iterative scheme to reduce sentiment model complexity with minimal impact on accuracy, (iii) better speech recognition due to superior acoustic modeling and focused (domain-dependent) vocabulary/language models, and (iv) a larger evaluation dataset. Collectively, our enhancements provide an absolute 10% improvement over our previous system in terms of sentiment detection accuracy. Additionally, we present an analysis that helps to understand the impact of word error rate (WER) on sentiment detection accuracy. Finally, we investigate the relative importance of different Parts-of-Speech (POS) tag features for sentiment detection. Our analysis reveals the practicality of this technology and also suggests several potential directions for future work.
Date of Conference: 08-12 December 2013
Date Added to IEEE Xplore: 09 January 2014
Electronic ISBN: 978-1-4799-2756-2