Abstract
Most fine-grained recognition research builds on generic classification models as the backbone. However, this is a sub-optimal choice: the differences between similar categories in this task are so small that a model must capture subtle, discriminative fine-grained variations. In this paper, we design a dedicated backbone network for fine-grained recognition. To this end, we propose a novel Disentangled Feature Network (DFN) that gradually disentangles and then incorporates coarse- and fine-grained features to explicitly capture multi-grained features. Because DFN can easily replace the original, ill-suited backbone, it encourages models to learn more representative features that potentially determine the classification results. Moreover, we present an optional error correction loss that adaptively penalizes misclassification between extremely similar categories and guides the model to capture fine-grained feature diversity. Extensive experiments demonstrate that adopting DFN as the backbone boosts the performance of baseline models by about 2%, almost for free, with negligible extra parameters on the widely used CUB, AirCraft, and Stanford Car datasets.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Miao, S. et al. (2021). Disentangled Feature Network for Fine-Grained Recognition. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13109. Springer, Cham. https://doi.org/10.1007/978-3-030-92270-2_38
DOI: https://doi.org/10.1007/978-3-030-92270-2_38
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92269-6
Online ISBN: 978-3-030-92270-2
eBook Packages: Computer Science, Computer Science (R0)