DOI: 10.1145/3387902.3394036
Extended abstract

SoundFactory: a framework for generating datasets for deep learning SELD algorithms

Published: 23 May 2020

ABSTRACT

The proliferation of smart connected devices with voice-activated digital assistants (e.g., Apple Siri, Google Assistant, Amazon Alexa) is raising interest in algorithms that localize and recognize audio sources. Among them, deep neural networks (DNNs) are seen as a promising approach to this task. Unlike other approaches, DNNs can categorize received events, discriminating between events of interest and other sounds even in the presence of noise. Despite these advantages, DNNs require large datasets for training. Tools for generating such datasets are therefore of great value, as they can accelerate the development of advanced learning models.

This paper presents SoundFactory, a framework for simulating the propagation of sound waves (accounting for noise, reverberation, reflection, attenuation, and other interfering waves) and the microphone array response to those waves. SoundFactory thus makes it easy to generate datasets for training the deep neural networks that underpin modern applications. It is flexible enough to simulate many different microphone array configurations, covering a large set of use cases. To demonstrate its capabilities, we generated a dataset and trained two different (rather simple) learning models on it, achieving up to 97% accuracy. The quality of the generated dataset was also assessed by comparing the simulated microphone array responses with real ones.
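Room reverberation of the kind described above is commonly simulated with the classical image-source method of Allen and Berkley (1979), which replaces wall reflections with mirrored copies of the source. The sketch below is a toy illustration of that general technique, not the authors' implementation: the function name, the uniform wall reflection coefficient `beta`, the reflection `order`, and the room geometry are all assumptions made for the example.

```python
import numpy as np

def image_source_rir(room, src, mic, fs=16000, c=343.0, beta=0.9,
                     order=2, length=4096):
    """Toy image-source room impulse response for a shoebox room.

    room : (Lx, Ly, Lz) room dimensions in meters
    src, mic : source and microphone positions in meters
    beta : wall reflection coefficient (assumed equal for all walls)
    """
    L = np.asarray(room, dtype=float)
    s = np.asarray(src, dtype=float)
    m = np.asarray(mic, dtype=float)
    h = np.zeros(length)
    rng = range(-order, order + 1)
    for nx in rng:
        for ny in rng:
            for nz in rng:
                img = np.empty(3)
                refl = 0
                for axis, n in enumerate((nx, ny, nz)):
                    if n % 2 == 0:   # even index: translated copy of the source
                        img[axis] = n * L[axis] + s[axis]
                    else:            # odd index: copy mirrored across a wall
                        img[axis] = (n + 1) * L[axis] - s[axis]
                    refl += abs(n)   # approximate number of wall bounces
                d = np.linalg.norm(img - m)
                t = int(round(d / c * fs))  # arrival time in samples
                if t < length:
                    # spherical spreading (1 / 4*pi*d) with per-bounce loss
                    h[t] += beta ** refl / (4.0 * np.pi * d)
    return h

# Example: 5 m x 4 m x 3 m room, source and microphone inside it
rir = image_source_rir((5.0, 4.0, 3.0), (2.0, 3.0, 2.0), (1.0, 1.0, 1.0))
```

A production-grade simulator of the kind the paper describes would additionally model frequency-dependent wall absorption, fractional-sample delays, and the individual responses of each capsule in the microphone array.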


Published in

CF '20: Proceedings of the 17th ACM International Conference on Computing Frontiers
May 2020, 298 pages
ISBN: 9781450379564
DOI: 10.1145/3387902

Copyright © 2020 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 240 of 680 submissions, 35%
