Automatic environmental sound concepts discovery for video retrieval

  • Regular Paper
International Journal of Multimedia Information Retrieval

Abstract

This paper presents a new method for video retrieval based on the environmental sounds in video soundtracks. A set of 26 semantic audio concepts, chosen for their usefulness to users during video browsing, is employed, and a collection of 2000 videos has been annotated with these concepts. The processing chain begins with the separation of the audio sources. Representing the audio signal as a sequence of Mel Frequency Cepstral Coefficients, we then compare three classifiers: Support Vector Machines, the Gaussian Mixture Model and the Hidden Markov Model. Based on the experimental results, we retain the Gaussian Mixture Model classifier combined with the Kullback–Leibler distance measure, and this audio concept classifier is integrated into a video retrieval system. The results obtained demonstrate the effectiveness of our approach for both environmental sound recognition and video retrieval.
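As a concrete illustration of the classification stage described above, the following Python sketch extracts MFCC frames, fits one Gaussian Mixture Model per audio concept, and scores a query soundtrack with a symmetric Kullback–Leibler approximation. This is a minimal sketch, not the authors' implementation: the librosa and scikit-learn dependencies, the diagonal-covariance GMMs, the frame-based Monte Carlo KL estimate, and all function names are assumptions made for illustration only.

```python
# Illustrative sketch only (not the paper's code): MFCC features,
# one GMM per environmental sound concept, KL-based matching.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture


def extract_mfcc(path, n_mfcc=13, sr=16000):
    """Represent an audio file as a sequence of MFCC frames (frames x coefficients)."""
    signal, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T


def train_concept_models(training_sets, n_components=8):
    """Fit one GMM per audio concept; keep the training frames for the KL estimate."""
    models = {}
    for concept, files in training_sets.items():
        frames = np.vstack([extract_mfcc(f) for f in files])
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        models[concept] = (gmm.fit(frames), frames)
    return models


def symmetric_kl(gmm_a, gmm_b, frames_a, frames_b):
    """Symmetric Kullback-Leibler divergence between two GMMs, approximated
    Monte Carlo style with the observed frame sequences."""
    d_ab = np.mean(gmm_a.score_samples(frames_a) - gmm_b.score_samples(frames_a))
    d_ba = np.mean(gmm_b.score_samples(frames_b) - gmm_a.score_samples(frames_b))
    return d_ab + d_ba


def classify(path, models, n_components=8):
    """Assign a query soundtrack to the concept whose GMM is closest in the KL sense."""
    q_frames = extract_mfcc(path)
    q_gmm = GaussianMixture(n_components=n_components, covariance_type="diag").fit(q_frames)
    distances = {
        concept: symmetric_kl(q_gmm, gmm, q_frames, frames)
        for concept, (gmm, frames) in models.items()
    }
    return min(distances, key=distances.get)
```

For example, train_concept_models({"rain": [...], "crowd": [...]}) would fit one model per concept, and classify("query.wav", models) would return the closest concept label; in a retrieval setting such per-concept labels could then index video soundtracks, in the spirit of the system described in the abstract.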



Acknowledgments

The authors would like to acknowledge the financial support of this work by grants from the General Direction of Scientific Research and Technological Renovation (DGRSRT), Tunisia, under the ARUB program 01/UR/11/02.

Author information


Corresponding author

Correspondence to Issam Feki.


About this article


Cite this article

Feki, I., Ben Ammar, A. & Alimi, A.M. Automatic environmental sound concepts discovery for video retrieval. Int J Multimed Info Retr 5, 105–115 (2016). https://doi.org/10.1007/s13735-016-0096-5


  • DOI: https://doi.org/10.1007/s13735-016-0096-5
