Abstract
In this paper, we incorporate the attention module and the lambda layer into the existing object detection method, Faster R-CNN, to improve its detection accuracy. We propose three methods that incorporate either a mechanism based on the attention module to capture the relationships between object candidate regions within an input frame, or a mechanism based on the lambda layer to improve the feature representation within each candidate region. We evaluated the performance of the proposed methods on BDD100K, which includes diverse scene types, weather conditions, and times of day. The results show that the detection accuracy of the proposed methods is improved compared to that of Faster R-CNN.
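The inter-region mechanism described above can be sketched as scaled dot-product self-attention applied over the pooled RoI feature vectors of the candidate regions. This is a minimal illustration under assumed shapes (N regions, D-dimensional pooled features), not the authors' exact architecture; a real model would additionally use learned query/key/value projections.

```python
import numpy as np

def inter_region_attention(roi_feats):
    """Scaled dot-product self-attention over RoI feature vectors.

    roi_feats: (N, D) array, one pooled feature vector per candidate region.
    Returns an (N, D) array of attended features in which each region's
    representation is a weighted mix of all regions' features.
    """
    d_k = roi_feats.shape[1]
    # Identity projections for simplicity; a trained model would apply
    # learned weight matrices W_q, W_k, W_v here (assumption, not the
    # paper's exact formulation).
    scores = roi_feats @ roi_feats.T / np.sqrt(d_k)     # (N, N) pairwise scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # softmax over regions
    return weights @ roi_feats                          # (N, D)

# Example: 5 candidate regions with 256-dimensional pooled features
feats = np.random.randn(5, 256).astype(np.float32)
out = inter_region_attention(feats)
print(out.shape)  # (5, 256)
```

Each output row is thus informed by every other candidate region in the frame, which is the relationship-capturing behavior the abstract refers to.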
Acknowledgement
This work was supported by Research Institute for Science and Technology of Tokyo Denki University Grant Number Q20J-02.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ikeda, R., Hidaka, A. (2022). Improvement of On-Road Object Detection Using Inter-region and Intra-region Attention for Faster R-CNN. In: Sumi, K., Na, I.S., Kaneko, N. (eds) Frontiers of Computer Vision. IW-FCV 2022. Communications in Computer and Information Science, vol 1578. Springer, Cham. https://doi.org/10.1007/978-3-031-06381-7_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06380-0
Online ISBN: 978-3-031-06381-7
eBook Packages: Computer Science (R0)