Object Detection Based on Embedding Internal and External Knowledge

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2022)

Abstract

Object detection has great application value and is widely used in fields such as industry, transportation, security, and aerospace. In practice, objects are often occluded or blurred, and multi-object detection datasets tend to exhibit data bias and long-tailed class distributions. Both factors prevent the model from extracting complete visual features and seriously hinder detection. We therefore propose an object detection algorithm that embeds internal and external knowledge (EIEK) into the detection network to enrich the feature representation and guide the model toward better detection performance. First, the basic object detection framework is built on Faster R-CNN with Swin Transformer. Then, internal and external knowledge embedding modules are introduced to give the model spatial awareness and semantic understanding. Experiments on multiple object detection datasets show the superiority of EIEK: it improves mAP by 1.2%, 2.2%, and 3.1% on the PASCAL VOC 2007, MS COCO 2017, and ADE20K datasets, respectively.
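The abstract does not specify how the knowledge embedding modules are built. As a rough illustration only, the sketch below shows one generic way that external, class-level knowledge could enrich per-region features: each region's class probabilities are propagated through a class-relation matrix and mapped to a semantic vector that is appended to the visual feature. The function name `embed_external_knowledge`, the relation matrix, and the class embeddings are all hypothetical and are not taken from the paper; this is not the EIEK implementation.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two dense matrices given as nested lists."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def row_normalize(m: Matrix) -> Matrix:
    """Scale each row to sum to 1; all-zero rows are left unchanged."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total for v in row] if total else list(row))
    return out


def embed_external_knowledge(region_feats: Matrix,
                             class_probs: Matrix,
                             class_relations: Matrix,
                             class_embeds: Matrix) -> Matrix:
    """Append a knowledge vector to each region feature (hypothetical sketch).

    region_feats:    N x D visual features, one row per region proposal
    class_probs:     N x C soft class predictions per region
    class_relations: C x C prior over class relations (assumed, e.g. co-occurrence counts)
    class_embeds:    C x K semantic embedding per class (assumed)
    """
    relations = row_normalize(class_relations)
    # (N x C) @ (C x C) @ (C x K) -> N x K knowledge vector per region
    knowledge = matmul(matmul(class_probs, relations), class_embeds)
    # Concatenate visual and knowledge features: N x (D + K)
    return [feat + know for feat, know in zip(region_feats, knowledge)]


# Toy example: 2 regions, 2 classes, 3-dimensional class embeddings.
enriched = embed_external_knowledge(
    region_feats=[[1.0, 2.0], [3.0, 4.0]],
    class_probs=[[1.0, 0.0], [0.0, 1.0]],
    class_relations=[[2.0, 2.0], [0.0, 4.0]],
    class_embeds=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
)
print(enriched)
```

In a real detector the concatenated features would feed the classification and box regression heads; whether EIEK concatenates, adds, or gates the knowledge vector is not stated in the abstract.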


Acknowledgments

This work is supported by the National Key R&D Program of China (No. 2021YYF0900701) and the National Natural Science Foundation of China (No. 61801441).

Author information

Corresponding author

Correspondence to Xiaoyu Wu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, Q., Wu, X. (2022). Object Detection Based on Embedding Internal and External Knowledge. In: Yu, S., et al. Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13537. Springer, Cham. https://doi.org/10.1007/978-3-031-18916-6_29

  • DOI: https://doi.org/10.1007/978-3-031-18916-6_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-18915-9

  • Online ISBN: 978-3-031-18916-6

  • eBook Packages: Computer Science, Computer Science (R0)
