ABSTRACT
The proliferation of smart connected devices with voice-activated digital assistants (e.g., Apple Siri, Google Assistant, Amazon Alexa) is raising interest in algorithms that localize and recognize audio sources. Among these, deep neural networks (DNNs) are seen as a promising approach to this task. Unlike other approaches, DNNs can categorize received events and thus discriminate between events of interest and other events, even in the presence of noise. Despite these advantages, DNNs require large datasets for training. Tools for generating datasets are therefore of great value, as they can accelerate the development of advanced learning models.
This paper presents SoundFactory, a framework for simulating the propagation of sound waves (accounting for noise, reverberation, reflection, attenuation, and other interfering waves) and the response of a microphone array to those waves. SoundFactory thus makes it easy to generate datasets for training the deep neural networks that underpin modern applications. It is flexible enough to simulate many different microphone array configurations, covering a large set of use cases. To demonstrate its capabilities, we generated a dataset and trained two different (rather simple) learning models on it, achieving up to 97% accuracy. The quality of the generated dataset was also assessed by comparing the responses of the microphone array model with real microphone array responses.
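To make the idea of simulating a microphone array response concrete, the following is a minimal sketch and not SoundFactory's actual implementation or API. It assumes a single point source in free field, models only propagation delay, 1/distance attenuation, and additive white noise, and leaves out reverberation and reflection entirely. The function name `simulate_array`, its parameters, and the speed-of-sound constant are illustrative assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def simulate_array(source_pos, mic_positions, signal, fs, noise_std=0.0, seed=0):
    """Hypothetical free-field model of a microphone array (illustration only).

    Each channel is the source signal delayed by distance / c (rounded to the
    nearest sample), scaled by 1 / distance, plus white Gaussian noise.
    Assumes the source is not located exactly at a microphone position.
    """
    rng = np.random.default_rng(seed)
    source_pos = np.asarray(source_pos, dtype=float)
    mic_positions = np.asarray(mic_positions, dtype=float)
    signal = np.asarray(signal, dtype=float)

    # Source-to-microphone distances and the matching delays in samples.
    distances = np.linalg.norm(mic_positions - source_pos, axis=1)
    delays = np.rint(distances / SPEED_OF_SOUND * fs).astype(int)

    # One row per microphone, long enough to hold the most delayed copy.
    out = np.zeros((len(mic_positions), len(signal) + int(delays.max())))
    for m, (dist, delay) in enumerate(zip(distances, delays)):
        out[m, delay:delay + len(signal)] = signal / dist

    # Additive sensor noise, identical variance on every channel.
    out += noise_std * rng.standard_normal(out.shape)
    return out, delays


# Example: a 440 Hz tone received by a 4-element linear array (5 cm spacing).
fs = 16000
t = np.arange(fs // 10) / fs
tone = np.sin(2 * np.pi * 440 * t)
mics = [[0.00, 0.0, 0.0], [0.05, 0.0, 0.0], [0.10, 0.0, 0.0], [0.15, 0.0, 0.0]]
channels, delays = simulate_array([2.0, 1.0, 0.0], mics, tone, fs, noise_std=0.01)
```

Labelled examples for a learning model could then be produced by repeating such a call over many source positions and signals; a realistic generator would also need a room model, for instance the image method, to account for reverberation.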
Index Terms
- SoundFactory: a framework for generating datasets for deep learning SELD algorithms