Tag refinement of micro-videos by learning from multiple data sources

Huang, Lei; Luo, Bin

doi:10.1007/s11042-017-4781-z

Published: 25 May 2017

Volume 76, pages 20341–20358 (2017)

Multimedia Tools and Applications

Abstract

Micro-video is an increasingly prevalent form of social media that attracts attention for its convenient acquisition and expressive power. However, because the user-generated hashtags of micro-videos are severely imbalanced in distribution and low in quality, managing micro-videos is challenging. In this paper, we propose a novel tag refinement approach for micro-videos that learns from multiple public data sources with manually labelled tags. This overcomes the difficulty of directly refining imprecise hashtags and addresses the lack of manually labelled micro-video datasets for training. We define a set of target tags by referring to widely used datasets for object, activity and scene detection. In tag refinement, we first transfer tags from the images in NUS-WIDE to the micro-video keyframes by similarity measurement. Meanwhile, we complement the tags by detecting the objects, activities and scenes in micro-videos based on appearance and motion features, with the assistance of the datasets ImageNet, PASCAL VOC, HMDB51, UCF50 and SUN. We also denoise the hashtags by constructing mapping relationships between hashtags and target tags based on statistics from NUS-WIDE. The results of tag transfer, complement and denoising are finally linearly combined to generate the tag refinement results for micro-videos. To validate the performance, we construct a dataset of 600 micro-videos from Vine and manually label them with target tags. The experimental results show that our approach performs well in tag refinement of micro-videos by learning from multiple data sources.
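The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the features, the similarity measure, the mapping statistics or the combination weights, so everything concrete below is an assumption made for illustration: cosine similarity over toy two-dimensional features, a k-nearest-neighbour vote for tag transfer, fixed detector confidences standing in for the complement step, a hypothetical hashtag-to-tag weight table for denoising, and the weights `(0.4, 0.4, 0.2)` and threshold `0.5` for the linear combination. It is not the authors' implementation.

```python
import math
from typing import Dict, Iterable, List, Sequence, Set, Tuple

# Hypothetical target tag set; the paper derives its own from public datasets.
TARGET_TAGS = ["beach", "dog", "running"]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def transfer_scores(
    keyframe: Sequence[float],
    labelled: List[Tuple[Sequence[float], Set[str]]],
    k: int = 2,
) -> Dict[str, float]:
    """Tag transfer (assumed form): similarity-weighted vote of the k nearest labelled images."""
    scored = [(cosine(keyframe, feat), tags) for feat, tags in labelled]
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in nearest) or 1.0
    return {
        tag: sum(sim for sim, tags in nearest if tag in tags) / total
        for tag in TARGET_TAGS
    }


def denoise_scores(
    hashtags: Iterable[str], mapping: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """Hashtag denoising (assumed form): look up each hashtag in a weight table, keep the max weight per tag."""
    scores = {tag: 0.0 for tag in TARGET_TAGS}
    for hashtag in hashtags:
        for tag, weight in mapping.get(hashtag.lower(), {}).items():
            if tag in scores:
                scores[tag] = max(scores[tag], weight)
    return scores


def refine(
    transfer: Dict[str, float],
    complement: Dict[str, float],
    denoise: Dict[str, float],
    weights: Tuple[float, float, float] = (0.4, 0.4, 0.2),  # assumed weights
    threshold: float = 0.5,  # assumed cut-off
) -> List[str]:
    """Linearly combine the three score sets and keep tags at or above the threshold."""
    w_t, w_c, w_d = weights
    fused = {
        tag: w_t * transfer[tag] + w_c * complement[tag] + w_d * denoise[tag]
        for tag in TARGET_TAGS
    }
    return sorted(tag for tag, score in fused.items() if score >= threshold)


# Toy data: labelled images stand in for NUS-WIDE, detector scores for the complement step.
labelled_images = [
    ([1.0, 0.0], {"dog"}),
    ([0.0, 1.0], {"beach"}),
    ([0.9, 0.1], {"dog", "running"}),
]
keyframe_feature = [1.0, 0.05]
detector_scores = {"beach": 0.1, "dog": 0.9, "running": 0.9}
hashtag_mapping = {"#puppy": {"dog": 0.8}, "#run": {"running": 0.6}}

transfer = transfer_scores(keyframe_feature, labelled_images)
denoise = denoise_scores(["#Puppy", "#lol"], hashtag_mapping)
refined_tags = refine(transfer, detector_scores, denoise)
print(refined_tags)
```

In this toy run the uninformative hashtag `#lol` contributes nothing, while `#Puppy` reinforces the `dog` tag that the transfer and detection steps already support. Changing the assumed weights or threshold changes which tags survive, which is why the combination step would need tuning on labelled data.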

Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful suggestions. This work is supported by the National Science Foundation of China (61202320) and the Research Project of Excellent State Key Laboratory (61223003).

Author information

Authors and Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Lei Huang & Bin Luo

Corresponding author

Correspondence to Bin Luo.

About this article

Cite this article

Huang, L., Luo, B. Tag refinement of micro-videos by learning from multiple data sources. Multimed Tools Appl 76, 20341–20358 (2017). https://doi.org/10.1007/s11042-017-4781-z

Received: 15 November 2016
Revised: 26 March 2017
Accepted: 28 April 2017
Published: 25 May 2017
Issue Date: October 2017
DOI: https://doi.org/10.1007/s11042-017-4781-z
