DOI: 10.1145/3487075.3487097
research-article

Multi Scale Attention Network for Crowd Counting

Published: 07 December 2021

ABSTRACT

Managing and controlling extremely crowded scenes has become an active research topic in recent years. Counting people from a density map generated from object location annotations is an effective way to analyze crowd information and to control crowds in severely congested scenes. In this paper, we propose a novel end-to-end crowd counting method called MSANet. MSANet consists of a VGG16 backbone as the front-end and two branches as the back-end: an attention map extractor that predicts the crowd state (whether people are present or not) and a density map branch that regresses the density map. Moreover, to obtain a high-resolution density map, we combine maps of different scales from the front-end with the back-end. We also propose a new loss function for crowd counting, designed to enhance the resolution of the predicted map and its structural similarity to the ground truth. Test results on the public ShanghaiTech dataset and on the Subway Crowd Counting Dataset, supported by the Nanjing Metro, demonstrate the effectiveness of our method.
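To make the idea of counting from a density map concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It places a normalized Gaussian at each annotated head location so that the map sums to the number of people. The function names, the fixed kernel size and sigma, and the thresholding rule used to derive a binary "people or not" mask are all illustrative assumptions that do not come from the paper.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    half = size // 2
    kernel = [
        [math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def density_map(points, height, width, size=7, sigma=1.5):
    """Build a density map from (row, col) head annotations.

    Assumption: a fixed-size Gaussian per person. The kernel is renormalized
    over the cells that fall inside the image, so each annotated person
    contributes a total mass of exactly 1 even near the border.
    """
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in points:
        cells = []
        for ky in range(size):
            for kx in range(size):
                y, x = r + ky - half, c + kx - half
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[ky][kx]))
        mass = sum(w for _, _, w in cells)
        for y, x, w in cells:
            dmap[y][x] += w / mass
    return dmap


def count(dmap):
    """The estimated crowd count is the sum over the density map."""
    return sum(sum(row) for row in dmap)


def crowd_mask(dmap, threshold=1e-3):
    """Hypothetical binary 'people or not' target: 1 where density exceeds threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in dmap]


if __name__ == "__main__":
    annotations = [(2, 3), (10, 10), (0, 0)]  # made-up head positions
    dmap = density_map(annotations, height=16, width=16)
    print(count(dmap))
    print(sum(sum(row) for row in crowd_mask(dmap)))
```

Summing the map recovers the number of annotated points up to floating-point error, which is why regressing a density map can stand in for counting directly. The mask is only one plausible way to obtain a per-pixel target of the kind an attention branch might be trained against.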


        • Published in

          CSAE '21: Proceedings of the 5th International Conference on Computer Science and Application Engineering
          October 2021
          660 pages
          ISBN:9781450389853
          DOI:10.1145/3487075

          Copyright © 2021 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 368 of 770 submissions, 48%