1st International ICST Conference on Forensic Applications and Techniques in Telecommunications, Information and Multimedia

Research Article

Automatic Voice Activity Detection in Different Speech Applications

  • @INPROCEEDINGS{10.4108/e-forensics.2008.2781,
        author={Marko Tuononen and Rosa Gonzalez Hautamaki and Pasi Fr{\"a}nti},
        title={Automatic Voice Activity Detection in Different Speech Applications},
        proceedings={1st International ICST Conference on Forensic Applications and Techniques in Telecommunications, Information and Multimedia},
        publisher={ACM},
        proceedings_a={E-FORENSICS},
        year={2010},
        month={5},
        keywords={Voice activity detection, speech applications, unsupervised learning, voice biometric and speech recognition},
        doi={10.4108/e-forensics.2008.2781}
    }
    
  • Marko Tuononen
    Rosa Gonzalez Hautamaki
    Pasi Fränti
    Year: 2010
    Automatic Voice Activity Detection in Different Speech Applications
    E-FORENSICS
    ACM
    DOI: 10.4108/e-forensics.2008.2781
Marko Tuononen1,*, Rosa Gonzalez Hautamaki2,*, Pasi Fränti3,*
  • 1: University of Joensuu, P.O. Box 111, FI-80101 Joensuu, Finland. +358 13 251 7963
  • 2: University of Joensuu, P.O. Box 111, FI-80101 Joensuu, Finland. +358 13 251 7902
  • 3: University of Joensuu, P.O. Box 111, FI-80101 Joensuu, Finland. +358 13 251 7931
*Contact email: mtuonon@cs.joensuu.fi, rgonza@cs.joensuu.fi, franti@cs.joensuu.fi

Abstract

This paper presents a performance evaluation of voice activity detectors (VADs) based on long-term spectral divergence and on a simple energy-based scheme. The evaluation is made in terms of false accept (FA) and false reject (FR) errors using four different types of material, recorded under different transfer channels, scenarios and conditions. The performance of the VADs is considered for forensics, speaker recognition and interactive speech dialogue applications. Performance is still far from perfect, but despite the numerous classification errors of the tested methods, especially with noisy data, the methods can still be useful.
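
To make the evaluation terms concrete, the following is a minimal Python sketch of a frame-level energy-based detector and of FA/FR error rates. It is not the authors' implementation: the function names, the fixed frame length, the "within a margin of the loudest frame" threshold rule, and the rate definitions (FA as the fraction of non-speech frames marked as speech, FR as the fraction of speech frames marked as non-speech) are all assumptions made for illustration.

```python
import math


def frame_energies_db(samples, frame_len):
    """Log energy (dB) of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # The small floor avoids log(0) on all-zero frames.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def energy_vad(samples, frame_len=160, margin_db=30.0):
    """Hypothetical rule: a frame is speech if its energy lies within
    margin_db of the loudest frame in the recording."""
    energies = frame_energies_db(samples, frame_len)
    if not energies:
        return []
    threshold = max(energies) - margin_db
    return [e > threshold for e in energies]


def fa_fr_rates(decisions, reference):
    """False accept and false reject rates against frame-level labels.

    Assumed definitions: FA = non-speech frames marked as speech divided
    by all non-speech frames; FR = speech frames marked as non-speech
    divided by all speech frames.
    """
    if len(decisions) != len(reference):
        raise ValueError("decisions and reference must have equal length")
    n_nonspeech = sum(1 for r in reference if not r)
    n_speech = sum(1 for r in reference if r)
    fa = sum(1 for d, r in zip(decisions, reference) if d and not r)
    fr = sum(1 for d, r in zip(decisions, reference) if not d and r)
    fa_rate = fa / n_nonspeech if n_nonspeech else 0.0
    fr_rate = fr / n_speech if n_speech else 0.0
    return fa_rate, fr_rate


# Toy usage: two quiet frames followed by two loud frames.
samples = [0.001] * 320 + [0.5] * 320
decisions = energy_vad(samples, frame_len=160, margin_db=30.0)
fa_rate, fr_rate = fa_fr_rates(decisions, [False, True, True, True])
```

In this toy case the second frame is labelled as speech in the reference but is too quiet for the detector, so it counts as a false reject. A fixed margin of this kind is only one possible thresholding choice and would likely need tuning for noisy recordings.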