Abstract
Organizing and retrieving multimedia data relies heavily on relevant textual descriptions. Automatic multimedia annotation, which assigns text labels to multimedia samples, has therefore been widely studied. Search-based annotation methods in particular are well suited to annotation tasks on large-scale datasets and have been studied in depth because of their simplicity and scalability. However, classical search-based annotation methods treat each label independently and thus ignore the correlation between different labels in the assigned label set. This paper aims to integrate two sources of information into a joint learning framework: the relevance of the label set to the multimedia content, and the correlation among the labels within the set. We evaluate the proposed method on the MIRFLICKR-25000 and NUS-WIDE datasets. Experimental results show that the proposed annotation method achieves excellent performance.
Acknowledgements
Special thanks go to the collaborators in the Lab for Media Search of the National University of Singapore for their instructive advice and useful suggestions on this work. This work is supported by the Natural Science Foundation of China (Nos. 61502094, 61402099) and the Natural Science Foundation of Heilongjiang Province of China (Nos. F2016002, F2015020).
About this article
Cite this article
Tian, F., Shen, X. & Liu, X. Multimedia automatic annotation by mining label set correlation. Multimed Tools Appl 77, 3473–3491 (2018). https://doi.org/10.1007/s11042-017-5170-3
DOI: https://doi.org/10.1007/s11042-017-5170-3