Abstract
Automated audio segmentation and classification play important roles in multimedia content analysis. In this paper, we propose an enhanced approach to audio segmentation and classification, called the correlation intensive fuzzy c-means (CIFCM) algorithm, that is based on audio content analysis. Whereas conventional methods consider the attributes of only the current frame or segment, the proposed CIFCM algorithm also incorporates the influence of neighboring frames or segments in the audio stream. With this method, audio cuts can be detected efficiently even when the signal contains audio effects such as fade-in, fade-out, and cross-fade. A number of audio features are analyzed in this paper to explore the differences between various types of audio data. The proposed CIFCM algorithm detects the boundaries between different kinds of sounds and classifies the resulting segments into clusters such as silence, speech, music, speech with music, and speech with noise. Our experimental results indicate that the proposed method outperforms the state-of-the-art FCM approach in both audio segmentation and classification.
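The abstract does not give the CIFCM update equations, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It runs the standard fuzzy c-means membership and center updates on per-frame feature vectors and, as an assumed stand-in for the neighbor influence, averages each frame's memberships with those of adjacent frames. The function names, the `radius` parameter, the evenly spaced center initialization, and the use of label changes as segment boundaries are all hypothetical choices made for this example.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update: U[i, k] for frame i and cluster k."""
    # Squared Euclidean distance from every frame to every cluster center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def smooth_over_neighbors(U, radius=1):
    """Assumed neighbor step: average memberships over adjacent frames."""
    n = U.shape[0]
    out = np.empty_like(U)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out[i] = U[lo:hi].mean(axis=0)  # rows still sum to 1
    return out

def neighbor_aware_fcm(X, c, m=2.0, radius=1, iters=50):
    """Cluster frames X (n_frames x n_features) into c fuzzy clusters."""
    # Assumption: initialize centers from evenly spaced frames.
    centers = X[np.linspace(0, len(X) - 1, c).astype(int)].copy()
    for _ in range(iters):
        U = smooth_over_neighbors(fcm_memberships(X, centers, m), radius)
        W = U ** m
        # Standard FCM center update: membership-weighted mean of the frames.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    U = smooth_over_neighbors(fcm_memberships(X, centers, m), radius)
    labels = U.argmax(axis=1)
    # A boundary is reported at each frame whose label differs from the previous one.
    boundaries = np.flatnonzero(np.diff(labels)) + 1
    return labels, boundaries, U
```

In this sketch, calling `neighbor_aware_fcm(X, c=5)` on a feature matrix `X` would return one label per frame, the frame indices where the label changes, and the smoothed membership matrix. Mapping the five clusters to silence, speech, music, speech with music, and speech with noise would still require labeled data or inspection.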
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2011-0017941).
Cite this article
Haque, M.A., Kim, JM. An enhanced fuzzy c-means algorithm for audio segmentation and classification. Multimed Tools Appl 63, 485–500 (2013). https://doi.org/10.1007/s11042-011-0921-z
DOI: https://doi.org/10.1007/s11042-011-0921-z