Abstract
State-of-the-art object detectors exploit a multi-branch structure and predict objects at several scales. Although this substantially boosts accuracy, it inevitably lowers efficiency, because the fragmented structure is hardware-unfriendly. To solve this issue, we propose a packing operator (PackOp) that combines all head branches spatially. Packed features are computationally more efficient and make cross-head group normalization (GN) readily available, leading to a notable accuracy improvement over the common head-separate GN. These gains come at the cost of a relative increase of less than 5.7% in runtime memory and the introduction of a few noisy training samples, whose side effects can be diminished by well-designed packing patterns. With PackOp, we propose a new anchor-free one-stage detector, PackDet, which features a single deeper (longer) but narrower head, in contrast to the multiple shallow but wide heads of existing methods. Our best models on COCO test-dev achieve a better speed-accuracy balance: 35.1%, 42.3%, 44.0%, and 47.4% AP at 22.6, 16.9, 12.4, and 4.7 FPS using MobileNet-v2, ResNet-50, ResNet-101, and ResNeXt-101-DCN backbones, respectively. Code will be released (https://github.com/kding1225/PackDet).
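The idea of combining head branches spatially can be illustrated with a small sketch. It is not the released PackDet code: the function names (`column_layout`, `pack`, `unpack`) are hypothetical, the feature maps are single-channel nested lists rather than tensors, and the layout (largest map on the left, remaining maps stacked in a column to its right) is only one assumed packing pattern. The sketch shows how several pyramid levels might be placed on one zero-padded canvas, so that a single head could process them in one pass, and then split apart again.

```python
from typing import List, Tuple

Grid = List[List[float]]
Size = Tuple[int, int]      # (height, width)
Offset = Tuple[int, int]    # (row, column) of a map's top-left corner


def column_layout(sizes: List[Size]) -> Tuple[List[Offset], int, int]:
    """Assumed layout: first map at the top-left, the rest stacked to its right."""
    (h0, w0), rest = sizes[0], sizes[1:]
    offsets: List[Offset] = [(0, 0)]
    y = 0
    for h, _ in rest:
        offsets.append((y, w0))
        y += h
    canvas_h = max(h0, y)
    canvas_w = w0 + max((w for _, w in rest), default=0)
    return offsets, canvas_h, canvas_w


def pack(maps: List[Grid]) -> Tuple[Grid, List[Offset], List[Size]]:
    """Copy every map onto one zero-initialised canvas."""
    sizes = [(len(m), len(m[0])) for m in maps]
    offsets, canvas_h, canvas_w = column_layout(sizes)
    canvas = [[0.0] * canvas_w for _ in range(canvas_h)]
    for m, (oy, ox) in zip(maps, offsets):
        for r, row in enumerate(m):
            canvas[oy + r][ox:ox + len(row)] = row
    return canvas, offsets, sizes


def unpack(canvas: Grid, offsets: List[Offset], sizes: List[Size]) -> List[Grid]:
    """Cut the per-level maps back out of the packed canvas."""
    return [
        [row[ox:ox + w] for row in canvas[oy:oy + h]]
        for (oy, ox), (h, w) in zip(offsets, sizes)
    ]


# Usage: four toy pyramid levels, each filled with its level index.
levels = [[[float(i + 1)] * s for _ in range(s)] for i, s in enumerate([8, 4, 2, 1])]
packed, offs, szs = pack(levels)
used = sum(h * w for h, w in szs)
total = len(packed) * len(packed[0])
print(f"canvas {len(packed)}x{len(packed[0])}, padding fraction {1 - used / total:.3f}")
```

In this toy layout, the padded cells are the extra memory that packing costs, and receptive fields that straddle the seam between two neighbouring maps are one plausible source of noisy samples; both depend on the chosen packing pattern.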
Acknowledgement
This research was financially supported by the National Natural Science Foundation of China (61731022, 91646207) and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA19090300). We would like to thank Rui Yang and Chaoyi Liu from EvaVisdom Tech for the inspiring discussions. We also thank the anonymous reviewers for their valuable suggestions.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Ding, K., He, G., Gu, H., Zhong, Z., Xiang, S., Pan, C. (2020). PackDet: Packed Long-Head Object Detector. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12358. Springer, Cham. https://doi.org/10.1007/978-3-030-58601-0_11
DOI: https://doi.org/10.1007/978-3-030-58601-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58600-3
Online ISBN: 978-3-030-58601-0
eBook Packages: Computer Science, Computer Science (R0)