Realtime and robust object matching with a large number of templates

Multimedia Tools and Applications

Abstract

Most conventional object matching methods are based on comparing local features, which is too computationally demanding. Recently, Dominant Orientation Templates (DOT) were proposed to address this efficiency issue. Although DOT obtains promising results, it wastes bits in its representation and is fragile under partial occlusion, and its performance degrades as the number of templates increases. We therefore propose a compact DOT representation (C-DOT) together with a fast approach to handling partial occlusion. Instead of the seven orientations used in the original implementation, C-DOT employs only the single orientation with the strongest gradient. Consequently, the size of each feature vector is reduced from 8 bits to 3 bits. To tackle partial occlusion efficiently, we introduce the C-DOT similarity map, which stores the matching scores of the individual grids in each sliding window and is then used to infer the occlusion map. The experimental results demonstrate that the proposed method outperforms DOT.
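
The compact encoding and the similarity map described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the polarity-insensitive binning, the `min_magnitude` threshold, the use of an eighth code for cells without a significant gradient, and the binary per-grid score are all assumptions made for illustration. The abstract does not specify how the occlusion map is inferred from the similarity map, so that step is left out.

```python
import math

NUM_BINS = 7       # number of orientations, as stated in the abstract
NO_GRADIENT = 7    # hypothetical eighth code for cells with no strong gradient


def quantize_orientation(gx, gy):
    """Map a gradient to one of NUM_BINS bins, ignoring polarity (assumption)."""
    angle = math.atan2(gy, gx) % math.pi  # fold the direction into [0, pi)
    return min(int(angle / math.pi * NUM_BINS), NUM_BINS - 1)


def cdot_code(cell_gradients, min_magnitude=1.0):
    """Return a 3-bit code (0..7): the bin of the strongest gradient in a cell."""
    gx, gy = max(cell_gradients, key=lambda g: g[0] * g[0] + g[1] * g[1])
    if math.hypot(gx, gy) < min_magnitude:
        return NO_GRADIENT
    return quantize_orientation(gx, gy)


def similarity_map(template_codes, window_codes):
    """Per-grid matching scores: 1 where the codes agree, 0 elsewhere."""
    return [
        [int(t == w) for t, w in zip(t_row, w_row)]
        for t_row, w_row in zip(template_codes, window_codes)
    ]


def matching_score(sim_map):
    """Fraction of grids in the window that match the template."""
    cells = [s for row in sim_map for s in row]
    return sum(cells) / len(cells)


# Usage: a 2x2 template compared with one sliding-window position.
template = [[0, 3], [NO_GRADIENT, 2]]
window = [[0, 1], [NO_GRADIENT, 2]]
sim = similarity_map(template, window)
print(sim, matching_score(sim))
```

Because every code lies in the range 0 to 7, each grid fits in 3 bits, whereas a representation that sets one bit per orientation plus one bit for "no gradient" needs 8 bits per grid.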

Acknowledgments

This work is supported by the Natural Science Foundation of China (61472110, 61202145, 61100104 and 61065007), the Program for New Century Excellent Talents in University (No. NECT-12-0323) and the Natural Science Foundation of Fujian Province of China (2012J01287 and 2014J01256).

Author information

Corresponding author

Correspondence to Jun Yu.

Cite this article

Hong, C., Zhu, J., Yu, J. et al. Realtime and robust object matching with a large number of templates. Multimed Tools Appl 75, 1459–1480 (2016). https://doi.org/10.1007/s11042-014-2305-7

  • DOI: https://doi.org/10.1007/s11042-014-2305-7
