Abstract
Sound event detection and localization (SDL) uses a microphone array to extract the positions of sound sources in real time. This paper develops an SDL system for intelligent outdoor security cameras so that they can listen and react to surrounding acoustic events. In outdoor environments, this task is challenging because of high-energy, non-stationary noise such as wind noise. This paper proposes new methods for improving both detection and localization, based on a new feature, the cross-channel power difference (XPD). The XPD is estimated from the difference in short-term power between microphones that are sensitive to wind noise. In the detection step, a time frame with high XPD is classified as wind noise, and such wind periods, which would otherwise cause false alarms, are excluded from the localization step. Furthermore, the XPD is used to create a binary mask that separates wind noise from other sound sources, which prevents wind noise from degrading the localization of target sounds. The proposed system is evaluated using a hardware prototype that consists of four microphones attached to the housing of a pan–tilt–zoom camera. Experiments in real environments show that the proposed methods outperform other state-of-the-art SDL methods in windy conditions.
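The two uses of the XPD described in the abstract (frame-level wind detection and a binary time-frequency mask) can be illustrated with a minimal sketch. The abstract does not give the exact XPD formula, so everything below is an assumption for illustration only: the XPD is taken as the spread in dB between the loudest and quietest channel, and the function names, frame length, hop size, and 6 dB threshold are hypothetical rather than the authors' values.

```python
import numpy as np


def _frames(x, frame_len, hop):
    """Slice x (n_channels, n_samples) into frames: (n_channels, n_frames, frame_len)."""
    n_frames = 1 + (x.shape[1] - frame_len) // hop
    return np.stack(
        [x[:, t * hop : t * hop + frame_len] for t in range(n_frames)], axis=1
    )


def frame_xpd_db(x, frame_len=512, hop=256, eps=1e-12):
    """Assumed frame-level XPD: max minus min short-term channel power, in dB."""
    power = np.mean(_frames(x, frame_len, hop) ** 2, axis=-1)  # (n_ch, n_frames)
    power_db = 10.0 * np.log10(power + eps)
    return power_db.max(axis=0) - power_db.min(axis=0)


def wind_frames(x, frame_len=512, hop=256, threshold_db=6.0):
    """Flag frames whose XPD exceeds a (hypothetical) threshold as wind."""
    return frame_xpd_db(x, frame_len, hop) > threshold_db


def tf_keep_mask(x, frame_len=512, hop=256, threshold_db=6.0, eps=1e-12):
    """Binary mask over (frame, frequency bin); True means 'keep for localization'.

    The per-bin XPD is computed the same way as the frame-level one, but on
    the power spectrum of each windowed frame.
    """
    windowed = _frames(x, frame_len, hop) * np.hanning(frame_len)
    spec_db = 10.0 * np.log10(np.abs(np.fft.rfft(windowed, axis=-1)) ** 2 + eps)
    bin_xpd = spec_db.max(axis=0) - spec_db.min(axis=0)  # (n_frames, n_bins)
    return bin_xpd <= threshold_db
```

The sketch relies on the intuition that an acoustic source reaches closely spaced microphones with similar power, whereas wind turbulence excites each microphone differently. Frames flagged by `wind_frames` would be skipped by the localizer, and bins where `tf_keep_mask` is False would be excluded from the localization features.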
Acknowledgments
This work was supported by the Implementation of Technologies for Identification, Behavior, and Location of Human based on Sensor Network Fusion Program through the Ministry of Knowledge Economy (Grant No. 10041629).
Cite this article
Nguyen, Q., Shen, G. & Choi, J. Sound Detection and Localization in Windy Conditions for Intelligent Outdoor Security Cameras. Circuits Syst Signal Process 35, 233–251 (2016). https://doi.org/10.1007/s00034-015-0058-9