ABSTRACT
Detecting densely arranged objects is challenging because object density lacks a generic definition and the features of nearby objects are coupled. This paper proposes mathematical definitions of object density at the instance, image, and dataset levels based on information theory, collectively called the Density Index (DI). The DI is highly consistent with human perception and serves as a guide for aerial object detection, including data assessment and detector customization. Guided by the DI, we design DeDet to improve the detection of densely arranged objects. DeDet pursues accurate localization of densely arranged objects through Density-aware Label Assignment (DLA) and Density-aware Feature Extraction (DFE), overcoming the common practice of performing sample assignment and feature extraction independently for each object. Experiments on DOTA-v1.0 and DOTA-v2.0 show that DeDet brings a significant improvement over the baseline detector.
Index Terms
- Density-aware Object Detection in Aerial Images