Abstract:
In this paper, we address the concurrent detection of multiple infant cries using microphones located in the cribs of a Neonatal Intensive Care Unit (NICU). We term this task infant cry diarization, by analogy with the "speaker diarization" task for speech signals: instead of determining "who spoke when", the problem here is to determine "who cried when". The proposed algorithm consists of a fully-convolutional neural network (Conv-DetNet) that simultaneously processes the audio signals acquired from the microphones in all cribs and detects whether each infant is crying. The network takes Log-Mel coefficients as input and is composed of stacked dilated convolutional blocks with increasing dilation factors. Each block is composed of pointwise and depthwise convolutional layers, which replace a standard convolution with a factorized operation that is more efficient to compute. The architecture is compared with its single-channel equivalent and with the single- and multi-channel architectures presented in a previous work, which are composed of standard convolutional layers and fully-connected layers. The experiments were conducted on a synthetic dataset that simulates the acoustic environment of the NICU of the Salesi Hospital in Ancona (Italy). The results, evaluated in terms of Area Under the Precision-Recall Curve (PRC-AUC), show that the proposed multi-channel Conv-DetNet achieves the highest performance, with a PRC-AUC of 87.58%, outperforming all the compared methods.
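As a rough illustration of why stacked dilated blocks built from depthwise and pointwise convolutions are attractive, the sketch below counts the weights of a standard 1-D convolution against a depthwise-plus-pointwise pair, and computes the receptive field of a stack of dilated layers. The channel counts, kernel size, and dilation schedule are assumptions chosen for the example; they are not taken from the paper, and the function names are hypothetical.

```python
# Illustrative arithmetic only. The channel counts, kernel size and dilation
# schedule are assumed values, not the Conv-DetNet configuration.


def standard_conv1d_params(c_in, c_out, kernel_size):
    """Number of weights in a standard 1-D convolution (bias ignored)."""
    return c_in * c_out * kernel_size


def separable_conv1d_params(c_in, c_out, kernel_size):
    """Number of weights in a depthwise convolution (one kernel per input
    channel) followed by a pointwise 1x1 convolution (bias ignored)."""
    depthwise = c_in * kernel_size
    pointwise = c_in * c_out
    return depthwise + pointwise


def receptive_field(kernel_size, dilations):
    """Receptive field, in frames, of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


c_in, c_out, k = 64, 64, 3      # assumed layer sizes
dilations = [1, 2, 4, 8]        # assumed increasing dilation factors

print(standard_conv1d_params(c_in, c_out, k))   # 12288
print(separable_conv1d_params(c_in, c_out, k))  # 4288
print(receptive_field(k, dilations))            # 31
```

Under these assumed sizes the factorized pair needs roughly a third of the weights of the standard layer, while four dilated layers with kernel size 3 already cover 31 input frames.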
Date of Conference: 19-24 July 2020
Date Added to IEEE Xplore: 28 September 2020