Pedestrian Attribute Recognition (PAR) provides valuable clues for many surveillance applications. It is also a difficult task, because inferring multiple attributes from a long distance is challenging in complex real-world scenes. Most existing methods improve PAR with visual attention mechanisms or body-part detection modules, which increase network complexity and require manual annotations of the human body. In addition, uneven data distribution, which lowers recall, remains underestimated. To mitigate these issues, this paper presents a novel multi-label optimization algorithm named Multi-label Contrastive Focal Loss (MCFL). Specifically, we first propose a multi-label focal loss that emphasizes error-prone and minority attributes through a separated re-weighting scheme. We then introduce a multi-label contrastive learning strategy, based on multi-label divergences, that helps the deep network distinguish hard fine-grained attributes. We conduct extensive experiments on seven PAR benchmarks, and the results indicate that the proposed MCFL with a plain ResNet-50 backbone surpasses the state-of-the-art comparison methods in mean accuracy and recall.
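To make the idea of a focal-style multi-label loss with separate re-weighting of positives and negatives more concrete, the following is a minimal sketch in plain Python. It is not the MCFL formulation from the paper: the function name `multilabel_focal_loss`, the exponential weights derived from a per-attribute positive ratio `pos_ratio`, and the default `gamma` are all assumptions chosen for illustration, and the contrastive term is not covered.

```python
import math


def multilabel_focal_loss(probs, labels, pos_ratio, gamma=2.0, eps=1e-12):
    """Illustrative focal-style loss for one sample, averaged over its attributes.

    probs     -- predicted probability per attribute, each in (0, 1)
    labels    -- ground-truth 0/1 label per attribute
    pos_ratio -- assumed fraction of positive training samples per attribute
    gamma     -- focusing parameter; 0 disables the focal modulation
    """
    total = 0.0
    for p, y, r in zip(probs, labels, pos_ratio):
        if y == 1:
            # Positives of a rare attribute (small r) receive a larger weight.
            weight = math.exp(1.0 - r)
            # (1 - p)^gamma down-weights positives that are already well predicted.
            total += -weight * (1.0 - p) ** gamma * math.log(p + eps)
        else:
            # Negatives of a frequent attribute (large r) receive a larger weight.
            weight = math.exp(r)
            # p^gamma down-weights negatives that are already well predicted.
            total += -weight * p ** gamma * math.log(1.0 - p + eps)
    return total / len(probs)


if __name__ == "__main__":
    # Three hypothetical attributes; the second is rare and badly predicted.
    probs = [0.9, 0.2, 0.1]
    labels = [1, 1, 0]
    pos_ratio = [0.6, 0.05, 0.3]
    print(multilabel_focal_loss(probs, labels, pos_ratio))
```

In this sketch, the focal factor handles error-prone attributes (hard predictions keep most of their loss), while the ratio-based weights handle minority attributes; the two are kept as separate terms for positives and negatives so each can be tuned independently.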

This research is supported by the National Natural Science Foundation of China (61902370), in part by the Chongqing Research Program of Technology Innovation and Application (cstc2019jscx-zdztzxX0019), and by the key cooperation project of the Chongqing Municipal Education Commission (HZ2021008 and HZ2021017).
Chen, L., Song, J., Zhang, X. et al. MCFL: multi-label contrastive focal loss for deep imbalanced pedestrian attribute recognition. Neural Comput & Applic 34, 16701–16715 (2022). https://doi.org/10.1007/s00521-022-07300-7
