Abstract
Sound is a ubiquitous natural phenomenon that carries a wealth of information and continually deepens our understanding of the objective world. With the continuous development of computer network and communication technology, audio has become a very important type of information. Audio is a non-semantic symbolic representation: an unstructured binary stream. Because audio itself carries no description of its content semantics and has no structured organization, classifying it is difficult. As the number of digital audio resources on the network grows, research on digital audio classification becomes increasingly important, and digital audio classification technology is the key to this problem: it recovers the structure of audio and extracts structured information and content semantics from it. It is a research hot spot in audio analysis and has important application value in many fields, such as audio retrieval, video summarization and auxiliary video analysis. This paper studies the structure of audio, the analysis and extraction of audio features, a digital audio classifier based on support vector machines (SVM), and audio segmentation based on the Bayesian information criterion (BIC). SVM is an important recent achievement of machine learning research. As a relatively new machine learning method, SVM can handle practical problems involving small samples, nonlinearity and high dimensionality, and it has become a research hot spot following neural networks. Experiments show that the SVM-based audio classification algorithm achieves good classification results and that smoothing makes the audio segmentation results more accurate. As the research develops further, the results are expected to find practical application.
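The abstract names BIC-based audio segmentation without stating the criterion. As a hedged illustration only, the sketch below implements the widely used delta-BIC test for a single change point inside a window of feature frames (for example MFCC vectors), modelling each side of a candidate split with a full-covariance Gaussian and penalising the extra parameters of the two-model hypothesis. The function names, the penalty weight `lam` and the minimum segment length `min_seg` are illustrative assumptions, not taken from the paper, and the authors' exact variant may differ.

```python
import numpy as np


def _logdet_cov(x: np.ndarray) -> float:
    """Log-determinant of the maximum-likelihood covariance of the rows of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance is singular; use more frames or fewer features")
    return float(logdet)


def delta_bic(x: np.ndarray, i: int, lam: float = 1.0) -> float:
    """Delta-BIC for splitting feature frames x (N x d) at frame i.

    Positive values favour modelling x[:i] and x[i:] with two Gaussians
    instead of one, i.e. an acoustic change point at frame i.
    lam is an assumed tuning weight on the model-complexity penalty.
    """
    n, d = x.shape
    n1, n2 = i, n - i
    # Penalty: half the number of extra parameters (mean + full covariance) times log N.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * _logdet_cov(x)
            - 0.5 * n1 * _logdet_cov(x[:i])
            - 0.5 * n2 * _logdet_cov(x[i:])
            - lam * penalty)


def detect_change(x: np.ndarray, min_seg: int = 20, lam: float = 1.0):
    """Return the frame with the largest delta-BIC if it is positive, else None.

    min_seg (hypothetical default) keeps each side long enough for a stable
    covariance estimate.
    """
    n = x.shape[0]
    if n < 2 * min_seg:
        return None
    candidates = range(min_seg, n - min_seg + 1)
    scores = [delta_bic(x, i, lam) for i in candidates]
    best = int(np.argmax(scores))
    return candidates[best] if scores[best] > 0 else None
```

For example, stacking 200 frames drawn around one mean on top of 200 frames drawn around a clearly different mean should yield a change frame close to 200, while a split of homogeneous data should give a negative delta-BIC. A full segmenter would slide or grow this window along the stream; that scheduling is outside this sketch.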




Acknowledgements
This work was supported by the Chongqing Big Data Engineering Laboratory for Children, the Chongqing Electronics Engineering Technology Research Center for Interactive Learning, the Science and Technology Research Project of Chongqing Municipal Education Commission of China (No. KJ1601401), the Science and Technology Research Project of Chongqing University of Education (No. KY201725C), Basic Research and Frontier Exploration of Chongqing Science and Technology Commission (CSTC2014jcyjA40019), and the Project of Science and Technology Research Program of Chongqing Education Commission of China (No. KJZD-K201801601).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wei, P., He, F., Li, L. et al. Research on sound classification based on SVM. Neural Comput & Applic 32, 1593–1607 (2020). https://doi.org/10.1007/s00521-019-04182-0
DOI: https://doi.org/10.1007/s00521-019-04182-0