Abstract
Object detection has substantial practical value and is widely used in industry, transportation, security, aerospace, and other fields. In practice, objects are often occluded or blurred, and multi-object detection datasets typically exhibit data bias and long-tailed class distributions. Both factors prevent the model from extracting complete visual features and seriously degrade detection performance. We therefore propose an object detection algorithm that embeds internal and external knowledge (EIEK) into the detection network to enrich the feature representation and guide the model toward better detection performance. First, the base detection framework is built on Faster R-CNN with Swin Transformer. Then, internal and external knowledge embedding modules are introduced to give the model spatial awareness and semantic understanding. Experiments on multiple object detection datasets show the superiority of EIEK: it improves mAP by 1.2%, 2.2%, and 3.1% on the PASCAL VOC 2007, MS COCO 2017, and ADE20K datasets, respectively.
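The improvements above are reported in mAP (mean average precision). The abstract does not state which evaluation protocol was used for each dataset, so the following is only an illustrative sketch of one common convention, the 11-point interpolated AP from the PASCAL VOC 2007 benchmark. The function names and the input format (a list of confidence and true-positive flags, with IoU matching assumed to be done beforehand) are assumptions made for this example, not part of the paper.

```python
from typing import List, Tuple


def average_precision_11pt(detections: List[Tuple[float, bool]], num_gt: int) -> float:
    """11-point interpolated AP for one class (PASCAL VOC 2007 convention).

    detections: (confidence, is_true_positive) pairs. Matching detections to
        ground-truth boxes by IoU is assumed to have happened already.
    num_gt: number of ground-truth objects of this class.
    """
    if num_gt == 0:
        return 0.0

    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    tp = 0
    fp = 0
    curve = []  # (recall, precision) after each ranked detection
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        curve.append((tp / num_gt, tp / (tp + fp)))

    total = 0.0
    for i in range(11):
        threshold = i / 10
        # Interpolated precision: best precision at any recall >= threshold.
        candidates = [p for r, p in curve if r >= threshold]
        total += max(candidates) if candidates else 0.0
    return total / 11


def mean_average_precision(per_class_ap: List[float]) -> float:
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


if __name__ == "__main__":
    # Hypothetical toy input: three detections for a class with two objects.
    toy = [(0.9, True), (0.8, False), (0.7, True)]
    print(average_precision_11pt(toy, num_gt=2))
```

Because mAP averages over classes without weighting by frequency, rare classes in a long-tailed dataset count as much as common ones, which is one reason the metric is sensitive to the data bias the abstract describes. MS COCO uses a different convention (AP averaged over several IoU thresholds), which this sketch does not cover.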
Acknowledgments
This work is supported by the National Key R&D Program of China (No. 2021YYF0900701) and the National Natural Science Foundation of China (No. 61801441).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, Q., Wu, X. (2022). Object Detection Based on Embedding Internal and External Knowledge. In: Yu, S., et al. Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13537. Springer, Cham. https://doi.org/10.1007/978-3-031-18916-6_29
DOI: https://doi.org/10.1007/978-3-031-18916-6_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18915-9
Online ISBN: 978-3-031-18916-6
eBook Packages: Computer Science, Computer Science (R0)