
Dual-focus: person search from Coarse-Grained Focus to Fine-Grained Focus

  • Special Issue Paper
  • Multimedia Systems

Abstract

Person search, which aims to find a target person in whole scene images, has become a crucial application in security surveillance, where images contain various types of noise. Existing methods solve this problem by addressing two subtasks simultaneously: detection, which distinguishes people from background noise; and identification, which separates the target from the noise of other people. However, these two tasks are fundamentally contradictory: detection is designed to be robust to intra-class variations so that it captures general human features, while identification is designed to be sensitive to inter-class variations so that it focuses only on the target. Inspired by the observation that police first narrow the scope to a small group of people similar to the suspect and then locate the target, we propose a Dual-Focus framework consisting of a Coarse-Grained Focus (search for similar people) and a Fine-Grained Focus (find the target). Extensive experiments on the CUHK-SYSU and PRW datasets are conducted to evaluate the effectiveness of the proposed method. The evaluations demonstrate that our method delivers superior results compared with state-of-the-art methods.
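The coarse-to-fine idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which is not available in this preview: the function `dual_focus_search`, the `coarse` and `fine` feature vectors, the shortlist size `k`, and the use of cosine similarity are all assumptions introduced for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dual_focus_search(query, gallery, k=3):
    """Return gallery ids ranked coarse-to-fine (illustrative only).

    query:   dict with hypothetical 'coarse' and 'fine' feature vectors.
    gallery: list of dicts with 'id', 'coarse', 'fine' (detected people).
    k:       assumed size of the shortlist kept after the coarse stage.
    """
    # Coarse stage: keep the k people most similar to the query.
    shortlist = sorted(
        gallery, key=lambda p: cosine(query["coarse"], p["coarse"]), reverse=True
    )[:k]
    # Fine stage: re-rank only the shortlist using the finer features.
    ranked = sorted(
        shortlist, key=lambda p: cosine(query["fine"], p["fine"]), reverse=True
    )
    return [p["id"] for p in ranked]


# Toy data: 'c' matches the fine features but is filtered out by the coarse stage.
query = {"coarse": [1.0, 0.0], "fine": [1.0, 0.0, 0.0]}
gallery = [
    {"id": "a", "coarse": [0.9, 0.1], "fine": [0.2, 0.9, 0.1]},
    {"id": "b", "coarse": [0.8, 0.2], "fine": [0.9, 0.1, 0.0]},
    {"id": "c", "coarse": [0.0, 1.0], "fine": [1.0, 0.0, 0.0]},
]
result = dual_focus_search(query, gallery, k=2)
print(result)  # ['b', 'a']
```

In this toy example the coarse stage narrows the gallery to the two most similar candidates, and the fine stage then reorders only those two. The paper's actual features, losses, and network design may differ substantially from this sketch.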



Acknowledgements

The work was supported by the National Natural Science Foundation of China (U1803262, 62171325), the National Key R&D Project (2021YFC3320301), the Key Project of Scientific Research Plan of Hubei Provincial Department of Education (No. D20211106), and the Opening Foundation of Key Laboratory of Fundamental Science for National Defense on Vision Synthetization, Sichuan University, China (Grant No. 2021SCUVS003). The numerical calculations in this paper were performed on the supercomputing system in the Supercomputing Center of Wuhan University.

Author information

Corresponding author

Correspondence to Ruimin Hu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Hu, W., Wang, X., Wang, Z. et al. Dual-focus: person search from Coarse-Grained Focus to Fine-Grained Focus. Multimedia Systems 29, 3105–3114 (2023). https://doi.org/10.1007/s00530-022-00929-3


  • DOI: https://doi.org/10.1007/s00530-022-00929-3
