An Improved Speech / Nonspeech Classification Based on Feature Combination for Audio Indexing

Ji-Soo KEUM; Hyon-Soo LEE; Masafumi HAGIWARA

doi:10.1587/transfun.E93.A.830

Abstract

In this letter, we propose an improved speech/nonspeech classification method for effectively classifying multimedia sources. To improve performance, we introduce a feature based on spectral duration analysis and combine it with recently proposed features such as the high zero-crossing rate ratio (HZCRR), the low short-time energy ratio (LSTER), and the pitch ratio (PR). In our experiments on speech, music, and environmental sounds, the proposed method achieved higher classification performance than conventional approaches.
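As a rough illustration of two of the features named in the abstract, the following sketch computes HZCRR and LSTER over one analysis window. It is not the authors' implementation. It assumes the definitions commonly cited in the literature: HZCRR is the fraction of frames whose zero-crossing rate exceeds 1.5 times the window average, and LSTER is the fraction of frames whose short-time energy falls below 0.5 times the window average. The function names, the non-overlapping framing, and the two threshold factors are assumptions made for this example; the abstract does not specify them.

```python
def split_frames(signal, frame_len):
    """Split samples into non-overlapping frames; a short tail is dropped."""
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")
    out = [signal[i:i + frame_len]
           for i in range(0, len(signal) - frame_len + 1, frame_len)]
    if not out:
        raise ValueError("signal is shorter than one frame")
    return out


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def hzcrr(signal, frame_len, factor=1.5):
    """Fraction of frames whose ZCR is above factor * mean ZCR (assumed factor)."""
    zcrs = [zero_crossing_rate(f) for f in split_frames(signal, frame_len)]
    mean_zcr = sum(zcrs) / len(zcrs)
    return sum(1 for z in zcrs if z > factor * mean_zcr) / len(zcrs)


def lster(signal, frame_len, factor=0.5):
    """Fraction of frames whose energy is below factor * mean energy (assumed factor)."""
    energies = [short_time_energy(f) for f in split_frames(signal, frame_len)]
    mean_energy = sum(energies) / len(energies)
    return sum(1 for e in energies if e < factor * mean_energy) / len(energies)
```

In a full classifier, values like these would be computed per window and combined with the other features before classification. How the letter combines them is not described in the abstract.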
