Automatic environmental sound concepts discovery for video retrieval

  • Regular Paper
International Journal of Multimedia Information Retrieval

Abstract

This paper presents a new method for video retrieval based on the environmental sounds in video soundtracks. A set of 26 semantic audio concepts, chosen for their usefulness to users during video browsing, is employed, and a collection of 2000 videos has been annotated with these concepts. The processing chain begins with the separation of the audio sources. Representing the audio signal as a sequence of Mel Frequency Cepstral Coefficients, we then compare three classifiers: Support Vector Machines, the Gaussian Mixture Model and the Hidden Markov Model. Based on the experimental results, we retain the Gaussian Mixture Model classifier combined with the Kullback–Leibler distance measure, and this audio concept classifier is integrated into a video retrieval system. The results obtained demonstrate the effectiveness of our approach for both environmental sound recognition and video retrieval.
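As a concrete illustration of the classification stage described above, the following Python sketch extracts MFCC frames, fits one Gaussian Mixture Model per audio concept, and scores a query soundtrack with a symmetric Kullback–Leibler approximation. This is a minimal sketch, not the authors' implementation: the librosa and scikit-learn dependencies, the diagonal-covariance GMMs, the frame-based Monte Carlo KL estimate, and all function names are assumptions made for illustration only.

```python
# Illustrative sketch only (not the paper's code): MFCC features,
# one GMM per environmental sound concept, KL-based matching.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture


def extract_mfcc(path, n_mfcc=13, sr=16000):
    """Represent an audio file as a sequence of MFCC frames (frames x coefficients)."""
    signal, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T


def train_concept_models(training_sets, n_components=8):
    """Fit one GMM per audio concept; keep the training frames for the KL estimate."""
    models = {}
    for concept, files in training_sets.items():
        frames = np.vstack([extract_mfcc(f) for f in files])
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        models[concept] = (gmm.fit(frames), frames)
    return models


def symmetric_kl(gmm_a, gmm_b, frames_a, frames_b):
    """Symmetric Kullback-Leibler divergence between two GMMs, approximated
    Monte Carlo style with the observed frame sequences."""
    d_ab = np.mean(gmm_a.score_samples(frames_a) - gmm_b.score_samples(frames_a))
    d_ba = np.mean(gmm_b.score_samples(frames_b) - gmm_a.score_samples(frames_b))
    return d_ab + d_ba


def classify(path, models, n_components=8):
    """Assign a query soundtrack to the concept whose GMM is closest in the KL sense."""
    q_frames = extract_mfcc(path)
    q_gmm = GaussianMixture(n_components=n_components, covariance_type="diag").fit(q_frames)
    distances = {
        concept: symmetric_kl(q_gmm, gmm, q_frames, frames)
        for concept, (gmm, frames) in models.items()
    }
    return min(distances, key=distances.get)
```

For example, train_concept_models({"rain": [...], "crowd": [...]}) would fit one model per concept, and classify("query.wav", models) would return the closest concept label; in a retrieval setting such per-concept labels could then index video soundtracks, in the spirit of the system described in the abstract.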



Acknowledgments

The authors would like to acknowledge the financial support of this work by grants from the General Direction of Scientific Research and Technological Renovation (DGRSRT), Tunisia, under the ARUB program 01/UR/11/02.

Author information


Corresponding author

Correspondence to Issam Feki.


About this article


Cite this article

Feki, I., Ben Ammar, A. & Alimi, A.M. Automatic environmental sound concepts discovery for video retrieval. Int J Multimed Info Retr 5, 105–115 (2016). https://doi.org/10.1007/s13735-016-0096-5


  • DOI: https://doi.org/10.1007/s13735-016-0096-5
