DOI: 10.1145/3424978.3425026
research-article

A Novel Multi-class Classification Framework Based on Local OVR Deep Neural Network

Published: 20 October 2020

Abstract

Environmental sound classification (ESC) based on a single multi-class deep neural network classifier is attracting growing attention in audio classification research. Neural architecture search (NAS) can generally be used to improve the performance of multi-class deep neural networks, but such methods often incur very high computational costs. This paper presents a novel multi-class framework based on a local one-versus-rest (OVR) deep neural network, which can improve the classification performance of a pre-trained deep neural network at an affordable computational cost. The main idea is first to identify, for an actual sample and using only the training data, the weak classification category group (WeakClass group) of the pre-trained network. This is done with the training average confidence matrix of the pre-trained deep neural classification network. Next, a sparse array of OVR subnetwork classifiers is built according to the WeakClass group. Finally, the OVR subnetwork classifiers are integrated into the original pre-trained network to form the final multi-class classifier. We apply the framework to the ESC problem, and the experimental results show that it achieves a classification accuracy of 86.8% on the ESC-50 dataset, which is higher than that of the other related algorithms.
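The abstract does not define the average confidence matrix or the rule that selects the WeakClass group, so the following is only a minimal sketch under stated assumptions. It assumes that row i of the matrix is the mean softmax output over training samples whose true class is i, and that a class counts as weak when its mean self-confidence falls below a threshold. The function names, both thresholds, and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def average_confidence_matrix(probs, labels, num_classes):
    """Mean softmax output per true class, computed on training data.

    Row i is the average predicted probability vector over all training
    samples whose true label is i (an assumed definition, not the paper's).
    """
    matrix = np.zeros((num_classes, num_classes))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            matrix[c] = probs[mask].mean(axis=0)
    return matrix


def weak_class_groups(conf_matrix, self_threshold=0.6, rival_threshold=0.2):
    """Hypothetical selection rule for weak classes.

    A class is treated as weak if its mean confidence in itself (the
    diagonal entry) is below self_threshold. Its rivals are the other
    classes that receive at least rival_threshold of its mean confidence.
    """
    groups = {}
    num_classes = conf_matrix.shape[0]
    for c in range(num_classes):
        if conf_matrix[c, c] < self_threshold:
            groups[c] = [
                j for j in range(num_classes)
                if j != c and conf_matrix[c, j] >= rival_threshold
            ]
    return groups


# Toy training outputs for 3 classes (made-up numbers, not ESC-50 results).
labels = np.array([0, 0, 1, 1, 2, 2])
probs = np.array([
    [0.9, 0.05, 0.05],
    [0.8, 0.10, 0.10],
    [0.1, 0.50, 0.40],
    [0.2, 0.40, 0.40],
    [0.0, 0.10, 0.90],
    [0.1, 0.10, 0.80],
])

matrix = average_confidence_matrix(probs, labels, num_classes=3)
groups = weak_class_groups(matrix)
print(groups)  # {1: [2]}: class 1 is weak and is mostly confused with class 2
```

Under these assumptions, each key of `groups` would receive one OVR subnetwork, which keeps the array of extra classifiers sparse because confident classes get none. How the paper trains those subnetworks and merges their outputs with the pre-trained network is not stated in the abstract, so that step is left out here.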


Cited By

  • (2023) Risevi: A Disease Risk Prediction Model Based on Vision Transformer Applied to Nursing Homes. Electronics 12:15 (3206). DOI: 10.3390/electronics12153206. Online publication date: 25-Jul-2023


    Published In

    CSAE '20: Proceedings of the 4th International Conference on Computer Science and Application Engineering
    October 2020
    1038 pages
    ISBN:9781450377720
    DOI:10.1145/3424978

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. CNN
    2. Environmental sound classification (ESC)
    3. Local OVR
    4. WeakClass Group

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CSAE 2020

    Acceptance Rates

    CSAE '20 Paper Acceptance Rate 179 of 387 submissions, 46%;
    Overall Acceptance Rate 368 of 770 submissions, 48%
