DOI: 10.1145/3625156.3625184
research-article

FL-YOLO: Lightweight Small Target Detection Algorithm Based on Transformer and CNN Hybrid Structure

Published: 21 November 2023

ABSTRACT

Although object detection has made remarkable progress on large and medium objects, small target detection remains challenging because of the limited size of small objects, occlusion between targets, and the limitations of convolutional networks themselves. This paper proposes FL-YOLO, a lightweight small target detection model with a hybrid Transformer and CNN structure, obtained by improving the YOLOv5s algorithm. The standard convolutions in the original backbone network are replaced with depthwise separable convolutions and the residual connection method is adjusted, making the network more lightweight. A Patch-Attention (PA) module is proposed to extract the context information of small objects; it is inserted into the backbone to enhance feature extraction for tiny targets. A new upsampling module named “Transition” is proposed to replace the nearest-neighbor interpolation in YOLOv5s and to minimize information conflict and redundancy. Finally, experiments on VisDrone2019, a public dataset constructed specifically for small targets, verify that the proposed FL-YOLO is more effective than YOLOv5s.
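
To illustrate why swapping standard convolutions for depthwise separable ones reduces model size, the following minimal sketch compares weight counts for a single layer. It is a generic illustration, not the FL-YOLO configuration: the function names and the channel sizes (128 in, 256 out, 3×3 kernel) are assumptions chosen only for the example, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, not taken from the paper.
    c_in, c_out, k = 128, 256, 3
    std = standard_conv_params(c_in, c_out, k)
    dws = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std}, depthwise separable: {dws}, ratio: {dws / std:.3f}")
```

The ratio of the two counts works out to 1/c_out + 1/k², so for 3×3 kernels and wide layers the separable form needs roughly one ninth of the weights.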

CCS Concepts: Computing methodologies → Visual inspection


Published in

ICISS '23: Proceedings of the 2023 6th International Conference on Information Science and Systems
August 2023, 301 pages
ISBN: 9798400708206
DOI: 10.1145/3625156

Copyright © 2023 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article, Research, Refereed limited