Semantic Concept Annotation of Consumer Videos at Frame-Level Using Audio

  • Conference paper
Advances in Multimedia Information Processing – PCM 2014 (PCM 2014)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8879))

Abstract

With the increasing use of audio sensors in the collection of user-generated content (UGC), semantic concept annotation using audio streams has become an important research problem. Huawei initiated a grand challenge at the International Conference on Multimedia & Expo (ICME) 2014: the Huawei Accurate and Fast Mobile Video Annotation Challenge. In this paper, we present our semantic concept annotation system for the Huawei challenge, which uses the audio stream only. The system extracts the audio stream from the video data and then extracts low-level acoustic features from that stream. A bag-of-features representation is generated from the low-level features and used as the input to train a support vector machine (SVM) concept classifier. The experimental results show that our audio-only concept annotation system detects semantic concepts significantly better than random guessing. It can also provide important complementary information to a visual-based concept annotation system, improving overall performance.
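As an illustration of the bag-of-features step mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a codebook of acoustic codewords already exists (for example, from clustering training frames) and that each frame is a short feature vector. The names `nearest_codeword` and `bag_of_features`, the toy codebook, and the toy frames are all hypothetical. Each frame is assigned to its closest codeword, and the segment is summarised as a normalised histogram of those assignments, which is the kind of fixed-length vector an SVM classifier could take as input.

```python
from collections import Counter


def nearest_codeword(frame, codebook):
    """Return the index of the codeword closest to `frame` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))


def bag_of_features(frames, codebook):
    """Return an L1-normalised histogram of codeword assignments for one segment."""
    if not frames:
        # An empty segment maps to an all-zero histogram.
        return [0.0] * len(codebook)
    counts = Counter(nearest_codeword(f, codebook) for f in frames)
    return [counts[k] / len(frames) for k in range(len(codebook))]


# Toy data (assumed for illustration): three 2-D codewords and four 2-D frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [3.7, 0.2]]

histogram = bag_of_features(frames, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

In this toy case two of the four frames fall nearest the second codeword, so its bin holds half of the mass. The histogram length equals the codebook size regardless of how many frames the segment contains, which is what makes the representation convenient as classifier input.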


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Liang, J., Jin, Q., He, X., Yang, G., Xu, J., Li, X. (2014). Semantic Concept Annotation of Consumer Videos at Frame-Level Using Audio. In: Ooi, W.T., Snoek, C.G.M., Tan, H.K., Ho, CK., Huet, B., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2014. PCM 2014. Lecture Notes in Computer Science, vol 8879. Springer, Cham. https://doi.org/10.1007/978-3-319-13168-9_12

  • DOI: https://doi.org/10.1007/978-3-319-13168-9_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13167-2

  • Online ISBN: 978-3-319-13168-9

  • eBook Packages: Computer Science, Computer Science (R0)
