Research on sound classification based on SVM

  • Deep Learning for Big Data Analytics
Neural Computing and Applications

Abstract

Sound is a ubiquitous natural phenomenon that carries a wealth of information about the objective world. With the continuous development of computer network and communication technology, audio has become an important part of the information available online. Audio is a non-semantic symbolic representation stored as an unstructured binary stream. Because it lacks both a description of its content semantics and a structured organization, audio is difficult to classify. As the number of digital audio resources on the network grows, research on digital audio classification becomes increasingly important. Digital audio classification is a key technology for recovering the structure of audio and for extracting structured information and content semantics from it. It is a research hot spot in the field of audio analysis and has important application value in fields such as audio retrieval, video summarization and auxiliary video analysis. This paper studies the structure of audio, the analysis and extraction of audio features, a digital audio classifier based on support vector machines (SVM) and an audio segmentation technique based on the Bayesian information criterion (BIC). SVM is an important achievement of machine learning research in recent years. Because it copes with practical problems involving small samples, nonlinearity and high dimensionality, it has become a research hot spot following the study of neural networks. Experiments show that the SVM-based audio classification algorithm classifies well and that the smoothed audio segmentation results are more accurate. With further development, the research results are expected to be applied in practice.
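The BIC-based segmentation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's features, window sizes or penalty weight, so everything below is an assumption: the code applies the generic ΔBIC test to a one-dimensional feature sequence modelled as a single Gaussian, and the names `delta_bic`, `best_change_point`, `margin` and `lam` are hypothetical. A positive ΔBIC at index `i` means that two Gaussians, split at `i`, describe the window better than one.

```python
import math
import random


def _log_var(xs):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return math.log(max(var, 1e-12))


def delta_bic(xs, i, lam=1.0):
    """Delta-BIC for splitting the 1-D sequence xs at index i.

    For d = 1 the model-complexity term d + d(d+1)/2 equals 2, so the
    penalty is lam * 0.5 * 2 * log(n). lam is an assumed tuning weight.
    """
    n = len(xs)
    penalty = lam * 0.5 * 2 * math.log(n)
    return (0.5 * n * _log_var(xs)
            - 0.5 * i * _log_var(xs[:i])
            - 0.5 * (n - i) * _log_var(xs[i:])
            - penalty)


def best_change_point(xs, margin=10, lam=1.0):
    """Return (index, score) of the best split, or (None, 0.0) if no
    candidate has a positive Delta-BIC. margin keeps both segments long
    enough for a usable variance estimate."""
    best_i, best_v = None, 0.0
    for i in range(margin, len(xs) - margin):
        v = delta_bic(xs, i, lam)
        if v > best_v:
            best_i, best_v = i, v
    return best_i, best_v


# Toy data (not from the paper): a feature whose mean jumps at sample 200.
rng = random.Random(0)
signal = ([rng.gauss(0.0, 1.0) for _ in range(200)]
          + [rng.gauss(5.0, 1.0) for _ in range(200)])
change_index, score = best_change_point(signal)
print(change_index, score)
```

In a real system the sequence would hold multi-dimensional frame features and the variances would become covariance determinants; the one-dimensional case is used here only to keep the criterion readable.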

Acknowledgements

This work was supported by the Chongqing Big Data Engineering Laboratory for Children, the Chongqing Electronics Engineering Technology Research Center for Interactive Learning, the Science and Technology Research Project of Chongqing Municipal Education Commission of China (No. KJ1601401), the Science and Technology Research Project of Chongqing University of Education (No. KY201725C), Basic Research and Frontier Exploration of Chongqing Science and Technology Commission (CSTC2014jcyjA40019) and the Project of Science and Technology Research Program of Chongqing Education Commission of China (No. KJZD-K201801601).

Author information

Corresponding author

Correspondence to Pengcheng Wei.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wei, P., He, F., Li, L. et al. Research on sound classification based on SVM. Neural Comput & Applic 32, 1593–1607 (2020). https://doi.org/10.1007/s00521-019-04182-0

  • DOI: https://doi.org/10.1007/s00521-019-04182-0
