Large scale data based audio scene classification

International Journal of Speech Technology

Abstract

Artificial intelligence and machine learning have been used by many research groups to process large-scale data, known as big data. Machine learning techniques for large, complex datasets are computationally expensive. Apache Spark, with its machine learning library Spark MLlib, is becoming a popular platform for big data analysis and is used for many machine learning problems such as classification, regression, and clustering. In this work, Apache Spark is combined with a deep multilayer perceptron (MLP) architecture for audio scene classification. Log mel band features represent the characteristics of the input audio scenes. The parameters of the deep neural network (DNN) are set according to the DNN baseline of the DCASE 2017 challenge. The system is evaluated on the TUT dataset (2017), and the result is compared with the provided baseline.
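As an illustration of the log mel band features mentioned in the abstract, the following is a minimal sketch of how one frame of audio could be turned into log mel band energies. It is not the authors' implementation: the sample rate, frame length, number of bands, mel formula variant, and all function names are assumptions chosen for the example, and the naive DFT is used only to keep the sketch free of third-party dependencies.

```python
import cmath
import math


def hz_to_mel(hz):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Build n_mels triangular filters, evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies: each filter spans three consecutive edges.
    edges_hz = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Hann-windowed power spectrum via a naive O(N^2) DFT (illustration only)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_mel_frame(frame, bank, eps=1e-10):
    """Log of the mel-weighted band energies for a single frame."""
    spectrum = power_spectrum(frame)
    return [math.log(sum(w * p for w, p in zip(row, spectrum)) + eps)
            for row in bank]


# Hypothetical settings for the demo: 8 kHz audio, 256-sample frame, 10 bands.
SAMPLE_RATE = 8000
N_FFT = 256
N_MELS = 10

bank = mel_filterbank(N_MELS, N_FFT, SAMPLE_RATE)
# A synthetic 1 kHz tone stands in for a frame of a recorded audio scene.
frame = [math.sin(2.0 * math.pi * 1000.0 * i / SAMPLE_RATE) for i in range(N_FFT)]
features = log_mel_frame(frame, bank)
peak_band = max(range(N_MELS), key=lambda m: features[m])
```

In a full pipeline, vectors like `features` would be computed for every frame of every recording and passed to the classifier. In practice, an FFT routine from a numerical library would replace the naive DFT.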

Corresponding author

Correspondence to E. Sophiya.

Cite this article

Sophiya, E., Jothilakshmi, S. Large scale data based audio scene classification. Int J Speech Technol 21, 825–836 (2018). https://doi.org/10.1007/s10772-018-9552-3

  • DOI: https://doi.org/10.1007/s10772-018-9552-3
