Abstract
This paper presents a new method for video soundtrack retrieval based on environmental sounds. A set of 26 semantic audio concepts is defined, chosen for its usefulness to users during video browsing, and a collection of 2000 videos has been annotated with these concepts. The processing chain begins with audio source separation. Representing the audio signal as a sequence of Mel-frequency cepstral coefficients (MFCCs), we then compare three classification models: Support Vector Machines, Gaussian Mixture Models, and Hidden Markov Models. Based on the experimental results, we retain the Gaussian Mixture Model classifier combined with the Kullback–Leibler divergence measure and integrate this audio concept classification into a video retrieval system. The results obtained demonstrate the effectiveness of our approach for environmental sound recognition and video retrieval.
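As a concrete illustration only (not the authors' implementation, and omitting the source-separation step), the following Python sketch shows how such a pipeline might look: MFCC frames are extracted with librosa, a Gaussian mixture is fitted per concept with scikit-learn, and a query soundtrack is matched to the nearest concept using a Monte Carlo approximation of the symmetric Kullback–Leibler divergence. All paths, concept names, and parameter values are hypothetical.

# Hypothetical sketch (not the paper's code): GMM-based audio concept
# classification over MFCC frames with a symmetric KL distance.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, n_mfcc=13):
    """Load an audio file and return its MFCC frames (frames x coefficients)."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def fit_gmm(frames, n_components=8):
    """Fit a diagonal-covariance GMM to a set of MFCC frames."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="diag",
                           random_state=0).fit(frames)

def kl_mc(p, q, n_samples=2000):
    """Monte Carlo estimate of KL(p || q) between two fitted GMMs."""
    x, _ = p.sample(n_samples)
    return np.mean(p.score_samples(x) - q.score_samples(x))

def classify(query_path, concept_gmms):
    """Return the concept whose GMM is closest to the query soundtrack's GMM
    under the symmetric KL distance."""
    q = fit_gmm(mfcc_frames(query_path))
    dist = {c: kl_mc(q, g) + kl_mc(g, q) for c, g in concept_gmms.items()}
    return min(dist, key=dist.get)

# Usage (illustrative): train one GMM per annotated concept, then classify.
# concept_gmms = {c: fit_gmm(np.vstack([mfcc_frames(f) for f in files]))
#                 for c, files in training_sets.items()}
# print(classify("query_video_soundtrack.wav", concept_gmms))

Since the KL divergence between two Gaussian mixtures has no closed form, the sketch uses sampling from one model and scoring under both; the paper's exact distance computation may differ.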
Acknowledgments
The authors would like to acknowledge the financial support of this work by grants from the General Direction of Scientific Research and Technological Renovation (DGRSRT), Tunisia, under the ARUB program 01/UR/11/02.
Cite this article
Feki, I., Ben Ammar, A. & Alimi, A.M. Automatic environmental sound concepts discovery for video retrieval. Int J Multimed Info Retr 5, 105–115 (2016). https://doi.org/10.1007/s13735-016-0096-5