Large scale data based audio scene classification

Sophiya, E.; Jothilakshmi, S.

doi:10.1007/s10772-018-9552-3

Published: 04 September 2018

Volume 21, pages 825–836, (2018)

International Journal of Speech Technology

E. Sophiya¹ &
S. Jothilakshmi²

Abstract

Artificial intelligence and machine learning have been used by many research groups to process large-scale data, known as big data. Machine learning techniques that handle large-scale, complex datasets are computationally expensive. MLlib, the machine learning library of the Apache Spark framework, is becoming a popular platform for big data analysis and is used for many machine learning problems such as classification, regression, and clustering. In this work, Apache Spark combined with a deep multilayer perceptron (MLP) architecture is proposed for audio scene classification. Log mel band features are used to represent the characteristics of the input audio scenes. The parameters of the deep neural network (DNN) are set according to the DNN baseline of the DCASE 2017 challenge. The system is evaluated on the TUT dataset (2017), and the result is compared with the provided baseline.
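The abstract names log mel band features but gives no extraction parameters. As a rough illustration only, the sketch below computes log mel band energies from a single power spectrum using the Python standard library. The band count (40), FFT size (2048), sample rate (44.1 kHz), the HTK-style mel formula, and all function names are assumptions chosen for the example, not values or code taken from the article.

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel scale (an assumed convention): 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_bands, n_fft, sample_rate):
    """Build triangular filters over the n_fft // 2 + 1 non-negative FFT bins."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_bands + 2 edge frequencies, equally spaced on the mel scale
    edges_hz = [mel_to_hz(max_mel * i / (n_bands + 1)) for i in range(n_bands + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for b in range(n_bands):
        lo, mid, hi = edges_hz[b], edges_hz[b + 1], edges_hz[b + 2]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel_energies(power_spectrum, bank, eps=1e-10):
    """Weight the power spectrum by each filter, sum, and take the natural log."""
    return [
        math.log(sum(w * p for w, p in zip(row, power_spectrum)) + eps)
        for row in bank
    ]

# Illustrative settings (assumptions, not taken from the article).
N_BANDS, N_FFT, SAMPLE_RATE = 40, 2048, 44100
bank = mel_filterbank(N_BANDS, N_FFT, SAMPLE_RATE)

# A flat power spectrum stands in for one analysis frame of real audio.
flat_spectrum = [1.0] * (N_FFT // 2 + 1)
features = log_mel_energies(flat_spectrum, bank)
```

In a pipeline like the one the abstract describes, one such feature vector per audio frame would form the input to the MLP classifier; in practice the power spectrum would come from a windowed FFT of the recording rather than the flat placeholder used here.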

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Annamalai University, Annamalainagar, Chidambaram, India
E. Sophiya
Department of Information Technology, Annamalai University, Annamalainagar, Chidambaram, India
S. Jothilakshmi

Corresponding author

Correspondence to E. Sophiya.

About this article

Cite this article

Sophiya, E., Jothilakshmi, S. Large scale data based audio scene classification. Int J Speech Technol 21, 825–836 (2018). https://doi.org/10.1007/s10772-018-9552-3

Received: 27 February 2018
Accepted: 26 August 2018
Published: 04 September 2018
Issue Date: 15 December 2018
DOI: https://doi.org/10.1007/s10772-018-9552-3
