Abstract
Recently, keypoint-based methods have received increasing attention in planar object tracking owing to their ability to handle partial perturbations such as occlusion and out-of-view. However, robust tracking remains difficult under fast movement, large transformation, and motion blur, mainly because such perturbations leave too few matching inliers to reconstruct the homography. To this end, we propose a novel framework, centroid-based graph matching networks (CGN), which consists of two components: a centroid localization network (CLN) and a graph matching network (GMN). The CLN narrows the tracker's search range from the entire image to the target region by locating the centroid of the target; this initial position estimate guarantees a sufficient proportion of inliers matching the template. The keypoints in the template and the target region are then modeled as two graphs connected by cross-edges, and their correspondences are established by the GMN, which exploits the stability of the graph structure to overcome large transformations. Finally, the transformation from the template to the current frame is estimated from the matched keypoint pairs with the RANSAC algorithm. In addition, because the labeled points in previous datasets are too few to train matching models that cope with complex transformations, we synthesize a large-scale labeled dataset to train the GMN. Experimental results on the POT-210, POIC, and TMT datasets show that the proposed method generally outperforms state-of-the-art baselines, with significant improvements under fast movement and motion blur.
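The final step of the pipeline described above, estimating the template-to-frame homography from matched keypoint pairs with RANSAC, can be sketched in plain NumPy. This is an illustrative sketch, not the paper's implementation: the DLT solver, iteration count, and inlier threshold here are assumptions chosen for clarity.

```python
import numpy as np

def estimate_homography_dlt(src, dst):
    """Direct Linear Transform: fit a 3x3 homography from >= 4 point pairs."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the homogeneous system A h = 0.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def ransac_homography(src, dst, iters=500, thresh=3.0, rng=None):
    """Robustly estimate a homography from noisy matches via RANSAC."""
    rng = np.random.default_rng(rng)
    src_h = np.hstack([src, np.ones((len(src), 1))])  # homogeneous coordinates
    best_H, best_inliers = None, np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        # Fit a candidate homography on a minimal random sample of 4 matches.
        idx = rng.choice(len(src), 4, replace=False)
        H = estimate_homography_dlt(src[idx], dst[idx])
        # Score by counting matches whose reprojection error is below the threshold.
        proj = src_h @ H.T
        proj = proj[:, :2] / proj[:, 2:3]
        inliers = np.linalg.norm(proj - dst, axis=1) < thresh
        if inliers.sum() > best_inliers.sum():
            best_H, best_inliers = H, inliers
    # Refit on the full consensus set for a more accurate final estimate.
    if best_inliers.sum() >= 4:
        best_H = estimate_homography_dlt(src[best_inliers], dst[best_inliers])
    return best_H, best_inliers
```

In practice, a library routine such as OpenCV's `cv2.findHomography(src, dst, cv2.RANSAC)` performs the same role; the sketch makes explicit why a minimum fraction of inliers among the matches, which the CLN's localization step is meant to secure, is necessary for the estimate to succeed.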
Li, K., Liu, H. & Wang, T. Centroid-based graph matching networks for planar object tracking. Machine Vision and Applications 34, 31 (2023). https://doi.org/10.1007/s00138-023-01382-6