Multimedia automatic annotation by mining label set correlation

Multimedia Tools and Applications

Abstract

Organizing and retrieving multimedia data relies heavily on relevant textual descriptions. Multimedia automatic annotation, which assigns text labels to multimedia samples, has been widely studied. Search-based annotation methods in particular are well suited to annotation tasks on large-scale datasets and have been studied in depth because of their simplicity and scalability. However, classical search-based annotation methods treat each label independently, which ignores the correlation between the labels in the assigned label set. This paper integrates two kinds of information into a joint learning framework: the relevance of the label set to the multimedia content, and the correlation among the labels within the set. We evaluate the proposed method on the MIRFLICKR-25000 and NUS-WIDE datasets. Experimental results show that the proposed annotation method achieves excellent performance.
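
To make the contrast in the abstract concrete, the following is a minimal illustrative sketch, not the authors' joint learning framework. It assumes the visual nearest neighbors of a query have already been retrieved, scores each label independently by neighbor voting, and then blends in support from co-occurring labels. The function names, the co-occurrence estimate, and the blending weight alpha are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def neighbor_votes(neighbor_label_sets):
    """Independent scoring: fraction of neighbors that carry each label."""
    counts = Counter(
        label for labels in neighbor_label_sets for label in set(labels)
    )
    n = len(neighbor_label_sets)
    return {label: c / n for label, c in counts.items()}


def cooccurrence(label_sets):
    """Estimate how often label b appears given label a, from raw counts."""
    single = Counter()
    pair = Counter()
    for labels in label_sets:
        uniq = sorted(set(labels))
        single.update(uniq)
        for a, b in combinations(uniq, 2):
            pair[(a, b)] += 1
            pair[(b, a)] += 1
    return {(a, b): c / single[a] for (a, b), c in pair.items()}


def rescore(votes, cooc, alpha=0.5):
    """Blend each label's own vote with vote-weighted support from the
    other candidate labels (alpha is an assumed mixing weight)."""
    rescored = {}
    for label, vote in votes.items():
        others = [(o, w) for o, w in votes.items() if o != label]
        total = sum(w for _, w in others)
        if total:
            support = sum(w * cooc.get((o, label), 0.0) for o, w in others) / total
        else:
            support = 0.0
        rescored[label] = (1 - alpha) * vote + alpha * support
    return rescored


if __name__ == "__main__":
    # Toy label sets of four retrieved neighbors (made-up data).
    neighbors = [["beach", "sea"], ["beach", "sea", "sky"], ["car"], ["sea", "sky"]]
    votes = neighbor_votes(neighbors)
    # For brevity the same toy sets double as the co-occurrence corpus.
    scores = rescore(votes, cooccurrence(neighbors))
    for label, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(label, round(score, 3))
```

In this toy setting, a label that never co-occurs with the other candidates (here "car") is pushed down relative to its independent vote, while mutually consistent labels keep or gain score. That is the intuition behind exploiting label set correlation, shown here in its simplest form.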

Acknowledgements

Special thanks go to the collaborators in the Lab for Media Search of the National University of Singapore for their instructive advice and useful suggestions on this work. This work is supported by the Natural Science Foundation of China (Nos. 61502094, 61402099) and the Natural Science Foundation of Heilongjiang Province of China (Nos. F2016002, F2015020).

Author information

Corresponding author

Correspondence to Feng Tian.

About this article

Cite this article

Tian, F., Shen, X. & Liu, X. Multimedia automatic annotation by mining label set correlation. Multimed Tools Appl 77, 3473–3491 (2018). https://doi.org/10.1007/s11042-017-5170-3

  • DOI: https://doi.org/10.1007/s11042-017-5170-3
