DOI: 10.1145/3387902.3394036
Extended abstract

SoundFactory: a framework for generating datasets for deep learning SELD algorithms

Published: 23 May 2020

ABSTRACT

The proliferation of smart connected devices with voice-activated digital assistants (e.g., Apple Siri, Google Assistant, Amazon Alexa) is raising interest in algorithms that localize and recognize audio sources. Among them, deep neural networks (DNNs) are seen as a promising approach to this task. Unlike other approaches, DNNs can categorize received events, discriminating between events of interest and other sounds even in the presence of noise. Despite these advantages, DNNs require large datasets for training. Tools for generating such datasets are therefore of great value, as they can accelerate the development of advanced learning models.

This paper presents SoundFactory, a framework for simulating the propagation of sound waves (accounting for noise, reverberation, reflection, attenuation, and other interfering waves) and the microphone array response to those waves. SoundFactory thus makes it easy to generate datasets for training the deep neural networks that underpin modern applications. It is flexible enough to simulate many different microphone array configurations, covering a large set of use cases. To demonstrate its capabilities, we generated a dataset and trained two different (rather simple) learning models on it, achieving up to 97% accuracy. The quality of the generated dataset was also assessed by comparing the simulated microphone array responses with real ones.
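Room reverberation of the kind described above is commonly simulated with the classical image-source method of Allen and Berkley (1979), which replaces wall reflections with mirrored copies of the source. The sketch below is a toy illustration of that general technique, not the authors' implementation: the function name, the uniform wall reflection coefficient `beta`, the reflection `order`, and the room geometry are all assumptions made for the example.

```python
import numpy as np

def image_source_rir(room, src, mic, fs=16000, c=343.0, beta=0.9,
                     order=2, length=4096):
    """Toy image-source room impulse response for a shoebox room.

    room : (Lx, Ly, Lz) room dimensions in meters
    src, mic : source and microphone positions in meters
    beta : wall reflection coefficient (assumed equal for all walls)
    """
    L = np.asarray(room, dtype=float)
    s = np.asarray(src, dtype=float)
    m = np.asarray(mic, dtype=float)
    h = np.zeros(length)
    rng = range(-order, order + 1)
    for nx in rng:
        for ny in rng:
            for nz in rng:
                img = np.empty(3)
                refl = 0
                for axis, n in enumerate((nx, ny, nz)):
                    if n % 2 == 0:   # even index: translated copy of the source
                        img[axis] = n * L[axis] + s[axis]
                    else:            # odd index: copy mirrored across a wall
                        img[axis] = (n + 1) * L[axis] - s[axis]
                    refl += abs(n)   # approximate number of wall bounces
                d = np.linalg.norm(img - m)
                t = int(round(d / c * fs))  # arrival time in samples
                if t < length:
                    # spherical spreading (1 / 4*pi*d) with per-bounce loss
                    h[t] += beta ** refl / (4.0 * np.pi * d)
    return h

# Example: 5 m x 4 m x 3 m room, source and microphone inside it
rir = image_source_rir((5.0, 4.0, 3.0), (2.0, 3.0, 2.0), (1.0, 1.0, 1.0))
```

A production-grade simulator of the kind the paper describes would additionally model frequency-dependent wall absorption, fractional-sample delays, and the individual responses of each capsule in the microphone array.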


Published in

CF '20: Proceedings of the 17th ACM International Conference on Computing Frontiers
May 2020, 298 pages
ISBN: 9781450379564
DOI: 10.1145/3387902

Copyright © 2020 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 240 of 680 submissions, 35%
