
Max-margin adaptive model for complex video pattern recognition

Multimedia Tools and Applications

Abstract

Pattern recognition models are used in a wide range of applications, from video concept annotation to event detection. In this paper we propose a new framework, the max-margin adaptive (MMA) model, for complex video pattern recognition. It can exploit a large number of unlabeled videos to assist model training. From a statistical perspective, the MMA model learns an optimal mapping function that makes the data distribution of the labeled training videos consistent with that of the unlabeled auxiliary videos. The same mapping also widens the margin between positive and negative labeled videos, which improves the robustness of the model. Experiments are conducted on two public datasets: CCV for video object/event detection and HMDB for action recognition. Our results demonstrate that the proposed MMA model is effective on complex video pattern recognition tasks and outperforms the state-of-the-art algorithms it is compared against.
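The abstract names two ingredients but does not give the objective. The first is a mapping that makes the feature distributions of labeled and unlabeled videos consistent. The second is a wide margin between positive and negative labeled videos. As an illustration only, the sketch below shows how such terms are commonly combined. It assumes a linear mapping W. It measures distribution consistency with the squared maximum mean discrepancy (MMD) under a linear kernel, which reduces to the distance between the projected feature means. The margin is encoded with an SVM-style hinge loss plus a norm penalty. All function names and the weights lam and reg are hypothetical, and this is not the authors' formulation.

```python
# Hypothetical sketch, NOT the MMA paper's objective: the linear map W,
# linear-kernel MMD and hinge loss are illustrative assumptions.
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def project(W: Matrix, x: Sequence[float]) -> Vector:
    """Apply the (assumed) linear mapping W to one feature vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def linear_mmd2(W: Matrix, labeled: Sequence[Vector], unlabeled: Sequence[Vector]) -> float:
    """Squared MMD with a linear kernel after mapping by W:
    ||mean(W x_labeled) - mean(W x_unlabeled)||^2 (0 means the means coincide)."""
    mu_l = mean_vector([project(W, x) for x in labeled])
    mu_u = mean_vector([project(W, x) for x in unlabeled])
    return sum((a - b) ** 2 for a, b in zip(mu_l, mu_u))


def mean_hinge(W: Matrix, w: Vector, b: float,
               labeled: Sequence[Vector], labels: Sequence[int]) -> float:
    """Average hinge loss max(0, 1 - y * f(x)) with f(x) = w . (W x) + b.
    Labels are +1 (positive video) or -1 (negative video)."""
    losses = []
    for x, y in zip(labeled, labels):
        score = sum(wi * zi for wi, zi in zip(w, project(W, x))) + b
        losses.append(max(0.0, 1.0 - y * score))
    return sum(losses) / len(losses)


def mma_style_objective(W: Matrix, w: Vector, b: float,
                        labeled: Sequence[Vector], labels: Sequence[int],
                        unlabeled: Sequence[Vector],
                        lam: float = 1.0, reg: float = 0.1) -> float:
    """Hypothetical combined objective to be minimized:
    margin (hinge) term + lam * distribution-consistency (MMD) term
    + reg * ||w||^2, whose shrinkage of w widens the geometric margin."""
    return (mean_hinge(W, w, b, labeled, labels)
            + lam * linear_mmd2(W, labeled, unlabeled)
            + reg * sum(wi * wi for wi in w))


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    labeled_x = [[1.0, 0.0], [0.0, 1.0]]   # toy "labeled video" features
    labeled_y = [1, -1]
    unlabeled_x = [[1.5, 0.5]]             # toy "unlabeled auxiliary video"
    print(mma_style_objective(identity, [1.0, -1.0], 0.0,
                              labeled_x, labeled_y, unlabeled_x))
```

In this toy run the hinge term is zero because both labeled points sit exactly on the margin. The whole cost therefore comes from the mismatch between the labeled and unlabeled means and from the norm penalty. An optimizer over W, w and b, which is not shown, would trade these terms off against each other.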




Notes

  1. http://www.nist.gov/itl/iad/mig/med.cfm

  2. http://www.ee.columbia.edu/dvmm/CCV/

  3. http://serre-lab.clps.brown.edu/resources/HMDB/


Author information

Corresponding author

Correspondence to Litao Yu.


About this article


Cite this article

Yu, L., Shao, J., Xu, XS. et al. Max-margin adaptive model for complex video pattern recognition. Multimed Tools Appl 74, 505–521 (2015). https://doi.org/10.1007/s11042-014-2010-6



  • DOI: https://doi.org/10.1007/s11042-014-2010-6
