Abstract:
The main challenge of Unsupervised Domain Adaptation (UDA) crowd counting is the large domain gap between a synthetic domain with annotations (source) and a real-world do...Show MoreMetadata
Abstract:
The main challenge of Unsupervised Domain Adaptation (UDA) crowd counting is the large domain gap between a synthetic domain with annotations (source) and a real-world domain of interest without annotations (target). Previous mainstream UDA crowd counting methods either employ feature alignment or a semi-supervised learning paradigm via pseudo-labels. We for the first time combine both of their advantages and propose an Adversarial Mean Teacher (AMT) framework. On the one hand, we optimize the student model with domain adversarial learning. On the other hand, we feed perturbed target images to the teacher model to generate pseudo-labels. Furthermore, to improve the quality of the pseudo-labels, we propose an Adaptive Teaching (AT) module, consisting of pseudo-label refinement and credible pseudo-label selection. Concretely, we first generate two candidate pseudo-labels from the prediction of the teacher model and obtain a refined pseudo-label by mixing them at the pixel-level. Moreover, we introduce an auxiliary task of foreground-background classification to assist credible region selection and only activate supervision signals on those regions. Extensive experiments on four real-world crowd counting benchmarks demonstrate the effectiveness of our method namely Cross-Domain Adaptive Teacher (CDAT).
Published in: IEEE Transactions on Multimedia ( Volume: 26)