DOI: 10.1145/2671188.2749388
Short paper

Semantic Concept Annotation For User Generated Videos Using Soundtracks

Published: 22 June 2015 Publication History

Abstract

With the increasing use of audio sensors in user-generated content (UGC) collections, semantic concept annotation from video soundtracks has become an important research problem. In this paper, we investigate reducing the semantic gap of the traditional data-driven bag-of-audio-words audio annotation approach by exploiting the large amount of wild audio data and its rich user tags, from which we propose a new feature representation based on semantic class model distance. We conduct experiments on the data collection from the HUAWEI Accurate and Fast Mobile Video Annotation Grand Challenge 2014, and we also fuse the audio-only annotation system with a visual-only system. The experimental results show that our audio-only concept annotation system detects semantic concepts significantly better than random guessing. The new feature representation achieves annotation performance comparable to the bag-of-audio-words feature while providing more semantic interpretation in its output. The results also show that the audio-only system provides significant complementary information to the visual-only concept annotation system, both boosting performance and enabling better interpretation of semantic concepts visually and acoustically.
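The two audio representations named in the abstract can be sketched as follows: a bag-of-audio-words histogram (quantize each audio frame against a learned codebook, then count codeword occurrences) and a semantic class model distance feature (represent a clip by its distance to a model per tagged class). This is a minimal illustrative sketch, not the authors' actual pipeline; the random frames, the 8-word codebook, and the per-class mean-vector "models" are all stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: 200 MFCC-like frames (13-dim) for one clip, and a
# "learned" codebook of 8 audio words (random stand-ins here).
frames = rng.normal(size=(200, 13))
codebook = rng.normal(size=(8, 13))

def bag_of_audio_words(frames, codebook):
    """Quantize each frame to its nearest codeword and return the
    normalized histogram of codeword counts (the BoAW feature)."""
    # Pairwise Euclidean distances, shape (n_frames, n_codewords).
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def class_model_distance(clip_feature, class_means):
    """Represent a clip by its distance to each semantic class model
    (here simply a per-class mean vector); each output dimension is
    now interpretable as closeness to a named tag."""
    return np.linalg.norm(class_means - clip_feature, axis=1)

boaw = bag_of_audio_words(frames, codebook)

# Hypothetical class models for three tags, e.g. "music", "speech", "crowd".
class_means = rng.normal(size=(3, 8))
semantic_feature = class_model_distance(boaw, class_means)

print(boaw.shape, semantic_feature.shape)  # (8,) (3,)
```

Either feature vector could then feed a per-concept classifier; audio and visual scores would be fused (e.g. by weighted averaging) for the combined system described in the abstract.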



      Published In

      ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval
      June 2015
      700 pages
      ISBN:9781450332743
      DOI:10.1145/2671188


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. audio analysis
      2. consumer videos
      3. semantic concept annotation

      Qualifiers

      • Short-paper

      Funding Sources

      • SRFDP
      • The Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China
      • Beijing Natural Science Foundation
• National Science Foundation of China

      Conference

      ICMR '15

      Acceptance Rates

ICMR '15 paper acceptance rate: 48 of 127 submissions (38%).
Overall acceptance rate: 254 of 830 submissions (31%).
