Abstract
The automatic detection of semantic concepts is a key technology for efficient and effective video content management. Conventional techniques for semantic concept detection in video content still suffer from several interrelated issues: the semantic gap, the imbalanced data set problem, and a limited concept vocabulary. In this paper, we propose to perform semantic concept detection for user-created video content using an image folksonomy in order to overcome these problems. An image folksonomy offers two advantages: first, it contains a vast number of user-contributed images; second, a significant portion of these images has been manually annotated by users with a wide variety of tags. However, user-supplied annotations in an image folksonomy are often highly noisy. Therefore, we also discuss a method for reducing the number of noisy tags in an image folksonomy. This tag refinement method makes use of tag co-occurrence statistics. To verify the effectiveness of the proposed video content annotation system, experiments were performed with user-created image and video content available on several social media applications. For the datasets used, video annotation with tag refinement achieves an average recall of 84% and an average precision of 75%, while video annotation without tag refinement achieves an average recall of 78% and an average precision of 62%.
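The abstract states only that tag refinement relies on tag co-occurrence statistics; it does not give the scoring rule. The following Python sketch illustrates one plausible form under an explicit assumption: a tag t on an image is kept when, averaged over the image's other tags u, the estimated conditional probability P(t | u) reaches a threshold. The function names (`cooccurrence_stats`, `cond_prob`, `refine_tags`), the toy tags, and the 0.3 threshold are hypothetical illustrations, not the authors' method.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_stats(collection):
    """Count per-tag frequencies and per-pair co-occurrences over a tagged image collection."""
    tag_count = Counter()
    pair_count = Counter()
    for tags in collection:
        unique = sorted(set(tags))
        tag_count.update(unique)
        # Pairs are stored in lexicographic order, e.g. ("beach", "sea").
        pair_count.update(combinations(unique, 2))
    return tag_count, pair_count


def cond_prob(t, u, tag_count, pair_count):
    """Estimate P(t | u): fraction of images tagged u that are also tagged t."""
    if tag_count[u] == 0:
        return 0.0
    pair = (t, u) if t < u else (u, t)
    return pair_count[pair] / tag_count[u]


def refine_tags(tags, tag_count, pair_count, threshold=0.3):
    """Hypothetical rule: keep a tag if the image's other tags predict it well on average."""
    unique = sorted(set(tags))
    if len(unique) < 2:
        return unique  # no other tags to compare against; keep as-is
    kept = []
    for t in unique:
        others = [u for u in unique if u != t]
        score = sum(cond_prob(t, u, tag_count, pair_count) for u in others) / len(others)
        if score >= threshold:
            kept.append(t)
    return kept


if __name__ == "__main__":
    # Toy folksonomy: "myphone" is a noisy, camera-related tag.
    collection = [
        ["beach", "sea", "sand"],
        ["beach", "sea", "sunset"],
        ["sea", "sand", "beach"],
        ["beach", "sea", "myphone"],
    ]
    tag_count, pair_count = cooccurrence_stats(collection)
    print(refine_tags(["beach", "sea", "myphone"], tag_count, pair_count))  # ['beach', 'sea']
```

In this toy data, "myphone" co-occurs with "beach" and "sea" only once out of four images each, so its score is 0.25 and it is removed. The same rule would also drop the rare but plausible tag "sunset", which illustrates why a real refinement method needs a more careful relevance model than a single global threshold.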
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Min, Hs., Lee, S., De Neve, W., Ro, Y.M. (2010). Semantic Concept Detection for User-Generated Video Content Using a Refined Image Folksonomy. In: Boll, S., Tian, Q., Zhang, L., Zhang, Z., Chen, YP.P. (eds) Advances in Multimedia Modeling. MMM 2010. Lecture Notes in Computer Science, vol 5916. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11301-7_51
DOI: https://doi.org/10.1007/978-3-642-11301-7_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11300-0
Online ISBN: 978-3-642-11301-7
eBook Packages: Computer Science, Computer Science (R0)