Object Detection Based on Embedding Internal and External Knowledge

Liu, Qian; Wu, Xiaoyu

doi:10.1007/978-3-031-18916-6_29

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13537)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Object detection has broad application value and is widely used in fields such as industry, transportation, security, and aerospace. In practice, objects are often occluded or blurred, and multi-object detection datasets exhibit data bias and long-tailed distributions. Both factors prevent the model from extracting complete visual features and seriously hinder detection. We therefore propose an object detection algorithm that embeds internal and external knowledge (EIEK) into the detection network to enrich the feature representation and guide the model toward better detection performance. First, the basic object detection framework is built on Faster R-CNN with Swin Transformer. Then, internal and external knowledge embedding modules are introduced to give the model spatial awareness and semantic understanding. Experiments on multiple object detection datasets show the superiority of EIEK: it improves mAP by 1.2%, 2.2%, and 3.1% on the PASCAL VOC 2007, MS COCO 2017, and ADE20K datasets, respectively.
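The abstract does not say how the knowledge embedding modules are built. As a rough illustration of the general idea only, the sketch below shows one common way that knowledge-aware detectors enrich region features with a class-relation prior: class embeddings are propagated over a relation graph, mapped to each region through its soft class scores, and concatenated to the visual feature. This is not the authors' EIEK module. The function name `embed_external_knowledge`, the toy graph, and all numbers are assumptions made up for this example.

```python
def normalize_rows(matrix):
    """Scale each row so it sums to 1 (all-zero rows stay zero)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def embed_external_knowledge(region_feats, class_probs, class_graph, class_embeds):
    """Hypothetical helper: append graph-propagated class context to each region.

    region_feats: N x D visual features, one row per region proposal
    class_probs:  N x C soft class scores for each region
    class_graph:  C x C prior relation weights between classes (assumed given)
    class_embeds: C x E semantic embedding per class (assumed given)
    """
    # Each class gathers the embeddings of the classes it is related to.
    propagated = matmul(normalize_rows(class_graph), class_embeds)
    # Each region receives class context weighted by its soft class scores.
    context = matmul(class_probs, propagated)
    # Enrich the visual feature by concatenating the knowledge context.
    return [feat + ctx for feat, ctx in zip(region_feats, context)]


# Toy data (invented): 2 regions, 3 classes, 2-d features and embeddings.
region_feats = [[0.5, 0.1], [0.2, 0.9]]
class_probs = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]
class_graph = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
class_embeds = [[1, 0], [0, 1], [0, 2]]

enhanced = embed_external_knowledge(region_feats, class_probs,
                                    class_graph, class_embeds)
print(enhanced)
```

Each enhanced row has length D + E, so a downstream classification or box-regression head would need to accept the wider input. In a real detector the graph and embeddings would come from dataset statistics or an external source rather than being hand-written.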

Acknowledgments

This work is supported by the National Key R&D Program of China (No. 2021YYF0900701) and the National Natural Science Foundation of China (No. 61801441).

Author information

Authors and Affiliations

Communication University of China, Beijing, China
Qian Liu & Xiaoyu Wu

Corresponding author

Correspondence to Xiaoyu Wu.

Editor information

Editors and Affiliations

Southern University of Science and Technology, Shenzhen, China
Shiqi Yu
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Zhaoxiang Zhang
Hong Kong Baptist University, Hong Kong, China
Pong C. Yuen
Northwestern Polytechnical University, Xi'an, China
Junwei Han
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Hong Kong Baptist University, Hong Kong, China
Yike Guo
Sun Yat-sen University, Guangzhou, China
Jianhuang Lai
Southern University of Science and Technology, Shenzhen, China
Jianguo Zhang

About this paper

Cite this paper

Liu, Q., Wu, X. (2022). Object Detection Based on Embedding Internal and External Knowledge. In: Yu, S., et al. Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13537. Springer, Cham. https://doi.org/10.1007/978-3-031-18916-6_29

DOI: https://doi.org/10.1007/978-3-031-18916-6_29
Published: 27 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18915-9
Online ISBN: 978-3-031-18916-6
eBook Packages: Computer Science, Computer Science (R0)