Semantic Concept Detection for User-Generated Video Content Using a Refined Image Folksonomy

Min, Hyun-seok; Lee, Sihyoung; De Neve, Wesley; Ro, Yong Man

doi:10.1007/978-3-642-11301-7_51

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5916)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

The automatic detection of semantic concepts is a key technology for enabling efficient and effective video content management. Conventional techniques for semantic concept detection in video content still suffer from several interrelated issues: the semantic gap, the imbalanced data set problem, and a limited concept vocabulary size. In this paper, we propose to perform semantic concept detection for user-created video content using an image folksonomy in order to overcome the aforementioned problems. First, an image folksonomy contains a vast amount of user-contributed images. Second, a significant portion of these images has been manually annotated by users using a wide variety of tags. However, user-supplied annotations in an image folksonomy are often characterized by a high level of noise. Therefore, we also discuss a method that allows reducing the number of noisy tags in an image folksonomy. This tag refinement method makes use of tag co-occurrence statistics. To verify the effectiveness of the proposed video content annotation system, experiments were performed with user-created image and video content available on a number of social media applications. For the datasets used, video annotation with tag refinement has an average recall rate of 84% and an average precision of 75%, while video annotation without tag refinement shows an average recall rate of 78% and an average precision of 62%.
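The abstract states only that tag refinement relies on tag co-occurrence statistics; it does not give the scoring rule. The following is a minimal sketch of one way such a filter could work, not the authors' method. It assumes a Jaccard-style relatedness measure and a fixed threshold, and the function names, the toy folksonomy, and the threshold value are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tagged_images):
    """Count how often each tag, and each unordered tag pair, appears."""
    tag_count = Counter()
    pair_count = Counter()
    for tags in tagged_images:
        unique = sorted(set(tags))
        tag_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return tag_count, pair_count


def relatedness(a, b, tag_count, pair_count):
    """Jaccard-style co-occurrence (an assumption, not the paper's measure)."""
    both = pair_count[tuple(sorted((a, b)))]
    either = tag_count[a] + tag_count[b] - both
    return both / either if either else 0.0


def refine_tags(tags, tag_count, pair_count, threshold=0.3):
    """Keep tags whose mean relatedness to the image's other tags meets the threshold."""
    unique = sorted(set(tags))
    if len(unique) < 2:
        return unique  # nothing to compare against
    kept = []
    for tag in unique:
        others = [o for o in unique if o != tag]
        score = sum(relatedness(tag, o, tag_count, pair_count) for o in others) / len(others)
        if score >= threshold:
            kept.append(tag)
    return kept


# Toy folksonomy (hypothetical data): each inner list is one image's tags.
images = [
    ["beach", "sea", "sand"],
    ["beach", "sea", "sunset"],
    ["beach", "sand", "sea"],
    ["sea", "beach", "canon"],
    ["city", "street", "canon"],
    ["city", "street", "night"],
]
tag_count, pair_count = cooccurrence_counts(images)
refined = refine_tags(["beach", "sea", "canon"], tag_count, pair_count, threshold=0.3)
print(refined)
```

In this toy data, "canon" co-occurs only weakly with "beach" and "sea", so it falls below the threshold and is dropped, while the two mutually supporting tags are kept. A real system would need to choose the relatedness measure and threshold empirically.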

Author information

Authors and Affiliations

Image and Video Systems Lab, Korea Advanced Institute of Science and Technology (KAIST), Yuseong-gu, Daejeon, 305-732, Republic of Korea
Hyun-seok Min, Sihyoung Lee, Wesley De Neve & Yong Man Ro

Editor information

Editors and Affiliations

University of Oldenburg, Germany
Susanne Boll
University of Texas at San Antonio, San Antonio, TX, USA
Qi Tian
Microsoft Research Asia, Beijing, P.R. China
Lei Zhang
Southwest University, Beibei, Chongqing, China
Zili Zhang
School of Engineering and Information Technology, Deakin University, 221 Burwood Highway, Vic, 3125, Australia
Yi-Ping Phoebe Chen

About this paper

Cite this paper

Min, Hs., Lee, S., De Neve, W., Ro, Y.M. (2010). Semantic Concept Detection for User-Generated Video Content Using a Refined Image Folksonomy. In: Boll, S., Tian, Q., Zhang, L., Zhang, Z., Chen, YP.P. (eds) Advances in Multimedia Modeling. MMM 2010. Lecture Notes in Computer Science, vol 5916. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11301-7_51

DOI: https://doi.org/10.1007/978-3-642-11301-7_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11300-0
Online ISBN: 978-3-642-11301-7
eBook Packages: Computer Science, Computer Science (R0)
