ABSTRACT
When we think of audio data, we usually think of music and speech, yet audio encompasses a far wider variety of sounds. The human brain can identify sounds such as two vehicles colliding, a person crying, or a bomb exploding; on hearing them, we recognize both the source of the sound and the event that produced it. Artificial systems can be built to detect acoustic events in much the same way. Acoustic event detection (AED) is the technology for doing so: it can not only identify an acoustic event but also determine when it occurred and for how long. This paper applies convolutional neural networks (CNNs) to the classification of environmental sounds associated with specific acoustic events. Classification and detection of acoustic events has numerous real-world applications, including anomaly detection in industrial instruments and machinery, smart home systems, security applications, audio tagging, and assistive systems for hearing-impaired individuals. Although environmental sound covers a large variety of signals, this study focuses on a set of urban sounds and uses CNNs, which have traditionally been used to classify image data, for our analysis of audio data. Given a sample audio file, the model must assign a classification label together with a corresponding confidence score.
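The pipeline described above (audio in, class label and score out) can be sketched end to end with NumPy alone: frame the waveform into a log-magnitude spectrogram, pass it through a single convolutional layer with ReLU and global average pooling, and apply a softmax over class logits. This is a minimal, untrained illustration, not the paper's actual architecture; the class names, filter count, and random weights are assumptions made purely for demonstration.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Frame the signal, apply a Hann window, and take a log-magnitude FFT."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))   # (frames, freq_bins)
    return np.log1p(mag).T                      # (freq_bins, frames)

def conv2d_valid(x, kernel):
    """Naive 'valid' 2-D cross-correlation, as used in CNN layers."""
    kh, kw = kernel.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

CLASSES = ["siren", "drilling", "dog_bark"]     # illustrative label set

def classify(signal, kernels, weights):
    spec = log_spectrogram(signal)
    # One conv layer + ReLU, then global average pooling per filter.
    feats = np.array([np.maximum(conv2d_valid(spec, k), 0.0).mean()
                      for k in kernels])
    logits = weights @ feats
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                        # softmax over classes
    idx = int(np.argmax(probs))
    return CLASSES[idx], float(probs[idx])

rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)  # 1 s, 440 Hz tone
kernels = [rng.standard_normal((3, 3)) for _ in range(4)]  # untrained filters
weights = rng.standard_normal((len(CLASSES), len(kernels)))
label, score = classify(signal, kernels, weights)
print(label, round(score, 3))
```

In practice the random kernels and weights would be replaced by parameters learned from a labeled dataset such as UrbanSound8K, and the plain spectrogram would typically be replaced by a mel-scaled one, but the spectrogram-to-CNN-to-softmax flow is the same.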