Typicality ranking: beyond accuracy for video semantic annotation

Multimedia Tools and Applications

Abstract

In video annotation, the samples relevant to a given concept generally differ in their typicality, or degree of relevance. We therefore argue that it is more reasonable to rank typical relevant samples higher than non-typical ones. However, the labels of the training data usually distinguish only relevant from irrelevant; that is, typical and non-typical training samples contribute equally to the learning process. As a result, the learned scores of the unlabeled data cannot measure typicality well. To address this, we propose three pre-processing approaches that relax the labels of the training data to real-valued typicality scores. The typicality scores of the training data are then propagated to the unlabeled data using manifold ranking. We also propose a novel criterion, Average Typicality Precision (ATP), to replace the frequently used Average Precision (AP) for evaluating video typicality ranking algorithms. Although AP accounts for the number of relevant samples near the top of the annotation rank list, it ignores the typicality order of these samples, which ATP takes into account. Experiments conducted on the TRECVID data set demonstrate that this typicality ranking scheme is more consistent with human perception than conventional accuracy-based ranking schemes.
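
The propagation step and the AP criterion mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the commonly used closed-form manifold ranking on a fully connected Gaussian affinity graph, and the function names, the parameters alpha and sigma, the toy data, and the convention that unlabeled samples start at score 0 are all assumptions made for illustration. The three pre-processing approaches and the definition of ATP are not given in the abstract, so they are not reproduced here.

```python
import numpy as np


def manifold_rank(features, initial_scores, alpha=0.9, sigma=1.0):
    """Propagate real-valued scores over a Gaussian affinity graph.

    features:       (n, d) array, one row per sample, n >= 2.
    initial_scores: (n,) array; a typicality score for each labeled sample
                    and 0 for each unlabeled one (assumed convention).
    """
    x = np.asarray(features, dtype=float)
    y = np.asarray(initial_scores, dtype=float)

    # Pairwise squared Euclidean distances between all samples.
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)

    # Gaussian affinities with self-loops removed.
    w = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)

    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    inv_sqrt_deg = 1.0 / np.sqrt(w.sum(axis=1))
    s = w * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]

    # Limit of the iteration f <- alpha * S f + (1 - alpha) * y, up to a
    # positive constant factor that does not change the ranking.
    return np.linalg.solve(np.eye(len(y)) - alpha * s, y)


def average_precision(relevance_in_rank_order):
    """Non-interpolated AP over a ranked list of binary relevance flags."""
    rel = np.asarray(relevance_in_rank_order, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())


# Toy data (hypothetical): three nearby samples and two distant ones.
# Only the first sample is labeled, with typicality 1.0.
toy_features = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
                         [5.0, 5.0], [5.1, 5.0]])
toy_initial = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
toy_scores = manifold_rank(toy_features, toy_initial)
print(np.argsort(-toy_scores))             # sample indices, best first
print(average_precision([1, 0, 1, 0]))     # AP of one ranked list
```

Because average_precision receives only binary relevance flags, swapping a highly typical relevant sample with a barely typical one leaves its value unchanged, which is the limitation of AP that the abstract describes.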


Acknowledgement

The work presented in this paper was partially supported by the National Natural Science Foundation of China (NSFC) under grants 61103059 and 61173104, and by the Jiangsu Natural Science Foundation under grant BK2011700.

Author information

Corresponding author

Correspondence to Jinhui Tang.

About this article

Cite this article

Tang, J., Hua, XS. Typicality ranking: beyond accuracy for video semantic annotation. Multimed Tools Appl 70, 647–660 (2014). https://doi.org/10.1007/s11042-011-0892-0

  • DOI: https://doi.org/10.1007/s11042-011-0892-0
