Centroid-based graph matching networks for planar object tracking

  • Original Paper
  • Published in: Machine Vision and Applications

Abstract

Recently, keypoint-based methods have received increasing attention in planar object tracking owing to their ability to handle partial degradations such as occlusion and out-of-view targets. However, robust tracking remains difficult under fast movement, large transformations, and motion blur: such perturbations leave too few matching inliers to reconstruct the homography reliably. To this end, we propose novel centroid-based graph matching networks (CGN) consisting of two components: a centroid localization network (CLN) and a graph matching network (GMN). The CLN narrows the tracker's search range from the entire image to the target region by locating the centroid of the target; this initial position estimate safeguards the proportion of inliers matched to the template. The keypoints in the template and the target region are then modeled as two graphs connected by cross-edges, and their correspondences are established by the GMN, which exploits the stability of the graph structure to withstand large transformations. Finally, the transformation from the template to the current frame is estimated from the matched keypoint pairs by the RANSAC algorithm. In addition, because the number of labeled points in previous datasets is too small to train matching models for complex transformations, we synthesize a large-scale labeled dataset to train the GMN. Experimental results on the POT-210, POIC, and TMT datasets show that the proposed method generally outperforms state-of-the-art baseline methods, with significant improvements under fast movement and motion blur.
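To make the pipeline concrete, the sketch below (Python, using OpenCV) walks through the match-then-fit stages the abstract describes. It is illustrative only: classical ORB features and brute-force descriptor matching stand in for the learned CLN and GMN components, and the function track_planar_target and its min_inliers threshold are hypothetical names of ours; only the final RANSAC fitting step (cv2.findHomography) mirrors the abstract directly.

    import cv2
    import numpy as np

    def track_planar_target(template_gray, frame_gray, min_inliers=10):
        """Estimate the template-to-frame homography from keypoint matches.

        Illustrative stand-in for the paper's pipeline: ORB + brute-force
        matching replace the learned CLN/GMN; RANSAC follows the abstract.
        """
        orb = cv2.ORB_create(nfeatures=500)
        kp_t, des_t = orb.detectAndCompute(template_gray, None)
        kp_f, des_f = orb.detectAndCompute(frame_gray, None)
        if des_t is None or des_f is None:
            return None  # no keypoints detected in one of the images

        # Tentative correspondences; the paper's GMN instead matches two
        # cross-connected keypoint graphs to survive large transformations.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(des_t, des_f)
        if len(matches) < 4:
            return None  # a homography needs at least four point pairs

        src = np.float32([kp_t[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp_f[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

        # RANSAC rejects outlier matches while fitting the 3x3 homography,
        # as in the final stage described in the abstract.
        H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        if H is None or inlier_mask.sum() < min_inliers:
            return None  # too few inliers to trust the estimate
        return H

In the paper's method, the CLN would first crop frame_gray to the region around the predicted centroid before matching, which is what keeps the inlier ratio high under fast motion; the crop size and thresholds above are placeholders.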



Author information

Corresponding author

Correspondence to Tao Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, K., Liu, H. & Wang, T. Centroid-based graph matching networks for planar object tracking. Machine Vision and Applications 34, 31 (2023). https://doi.org/10.1007/s00138-023-01382-6
