AutoScale: Learning to Scale for Crowd Counting

International Journal of Computer Vision

Abstract

Recent work on crowd counting mainly leverages Convolutional Neural Networks (CNNs) to count by regressing density maps, and has achieved great progress. In the density map, each person is represented by a Gaussian blob, and the final count is obtained by integrating the whole map. However, it is difficult to accurately predict the density map in dense regions. A major issue is that the density map in dense regions accumulates values from many nearby Gaussian blobs, concentrating large and highly variable density values on a small set of pixels. This causes the density map to exhibit significant pattern shifts and gives the pixel-wise density values a long-tailed distribution. In this paper, we aim to address this issue in the density map. Specifically, we propose a simple and effective Learning to Scale (L2S) module, which automatically scales dense regions to a reasonable closeness level (reflecting the image-plane distance between neighboring people). L2S directly normalizes the closeness of different patches so that it dynamically separates overlapping blobs and decomposes the accumulated values in the ground-truth density map, thus alleviating the pattern shifts and the long-tailed distribution of density values. This helps the model learn the density map better. We also explore the effectiveness of L2S in localizing people by finding local minima of the quantized distance map (with respect to the person location map), which suffers from a similar issue to density map regression. To the best of our knowledge, this localization method is also novel for localization-based crowd counting. We further introduce a customized dynamic cross-entropy loss that significantly improves the optimization of the localization-based model. Extensive experiments demonstrate that the proposed framework, termed AutoScale, improves upon several state-of-the-art methods on both regression and localization benchmarks across three crowded datasets, and achieves very competitive performance on two sparse datasets. An implementation of our method is available at https://github.com/dk-liang/AutoScale.git.
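
To make the accumulation issue concrete, here is a minimal NumPy/SciPy sketch, not the authors' implementation: it builds a ground-truth density map from point annotations, shows how overlapping blobs inflate pixel values in a dense patch, shows how enlarging that patch before regenerating the ground truth separates the blobs (the intuition behind L2S, which learns this factor rather than fixing it), and reads head locations off a distance map via local minima, as in the localization variant (quantization is skipped here). All function names, the Gaussian sigma, the scale factor, and the distance threshold are illustrative assumptions.

    import numpy as np
    from scipy.ndimage import gaussian_filter, minimum_filter, distance_transform_edt

    def density_map(points, height, width, sigma=4.0):
        """Place a unit impulse at each annotated head and blur it with a Gaussian.
        Each blob integrates to 1, so summing the map recovers the count; in dense
        regions the blobs overlap and pile large values onto a few pixels."""
        impulses = np.zeros((height, width), dtype=np.float64)
        for x, y in points:
            impulses[int(y), int(x)] += 1.0
        return gaussian_filter(impulses, sigma=sigma)

    def find_local_minima(pred_dist, max_dist=8.0):
        """Read head locations off a (predicted) distance-to-nearest-person map:
        keep pixels that are local minima and sufficiently close to a person."""
        is_min = (pred_dist == minimum_filter(pred_dist, size=3))
        return np.argwhere(is_min & (pred_dist < max_dist))

    # Two people far apart vs. two people almost on top of each other.
    sparse = density_map([(10, 10), (50, 50)], 64, 64)
    dense = density_map([(30, 30), (32, 30)], 64, 64)
    print(sparse.sum(), dense.sum())   # both ~2.0: integrating the map gives the count
    print(sparse.max(), dense.max())   # the dense map piles up a much larger peak value

    # Enlarging the dense patch (factor hand-picked here, learned by L2S in the paper)
    # spreads the points out, so the regenerated blobs barely overlap and the peak
    # value falls back toward the sparse case.
    f = 4
    rescaled = density_map([(30 * f, 30 * f), (32 * f, 30 * f)], 64 * f, 64 * f)
    print(rescaled.max())

    # Localization: use an exact distance transform as a stand-in for a network's
    # predicted distance map and recover the two annotated locations.
    toy = np.ones((64, 64))
    toy[10, 10] = 0
    toy[30, 30] = 0
    print(find_local_minima(distance_transform_edt(toy)))  # [[10 10], [30 30]]

Seen this way, L2S can be read as learning, for each selected dense region, the factor that brings neighboring-person distances into a common range, which is what flattens the long-tailed distribution of density values described above.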

Notes

  1. https://github.com/gjy3035/NWPU-Crowd-Sample-Code-for-Localization.

Acknowledgements

The work of Yongchao Xu was supported by the National Key Research and Development Program of China (2018AAA0100400), the National Natural Science Foundation of China (61936003 and 62176186), and in part by the Young Elite Scientists Sponsorship Program by CAST. The work of Xiang Bai was supported by the National Program for Support of Top-Notch Young Professionals and in part by the Program for HUST Academic Frontier Youth Team.

Author information

Corresponding author

Correspondence to Yongchao Xu.

Additional information

Communicated by Chen Change Loy.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Disclaimer: This work was mostly done when Chenfeng Xu was an undergraduate student at Huazhong University of Science and Technology.

About this article

Cite this article

Xu, C., Liang, D., Xu, Y. et al. AutoScale: Learning to Scale for Crowd Counting. Int J Comput Vis 130, 405–434 (2022). https://doi.org/10.1007/s11263-021-01542-z
