
Two-attribute e-commerce image classification based on a convolutional neural network

  • Original Article
  • Published in: The Visual Computer

Abstract

A novel two-task learning method based on an improved convolutional neural network (CNN), using the idea of parameter transfer from transfer learning, is proposed to address the problem that a traditional CNN cannot classify two attributes of an e-commerce image simultaneously. The network has two channels, each responsible for learning one attribute of the image. First, the network is pre-trained on the channel corresponding to the most important attribute, optimizing the parameters of the shared front layers. Then the two channels are trained simultaneously; during training, the two learning tasks help each other by sharing parameters, which improves the convergence speed of the network and the generalization ability of the model. To address the scarcity of certain types of e-commerce images in the datasets and the resulting class imbalance, an over-sampling method based on the mixup algorithm is proposed. The relationship between the complexity of the two attributes and the sparsity of the CNN's output feature matrix is studied, and an improved Grad-CAM algorithm is used to visualize the image regions that are key to classifying each attribute, improving the interpretability of the network. Experiments show that the proposed method classifies both two-attribute e-commerce images and traditional images effectively.
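The mixup-based over-sampling described above can be sketched in a few lines of NumPy. This is a minimal illustration of the general idea only, not the paper's exact procedure: the function name, array shapes, and the choice to mix pairs drawn from the same minority class are assumptions for the example. Each synthetic sample is a convex combination of two minority-class images, with the mixing weight drawn from a Beta distribution as in standard mixup.

```python
import numpy as np

def mixup_oversample(x_minority, y_minority, n_new, alpha=0.2, seed=None):
    """Generate n_new synthetic minority-class samples by mixing random
    pairs of existing samples (mixup-style convex combination)."""
    rng = np.random.default_rng(seed)
    n = len(x_minority)
    i = rng.integers(0, n, size=n_new)          # first partner of each pair
    j = rng.integers(0, n, size=n_new)          # second partner of each pair
    lam = rng.beta(alpha, alpha, size=n_new)    # mixing weights in (0, 1)
    lam_x = lam.reshape(-1, 1, 1, 1)            # broadcast over H, W, C
    x_new = lam_x * x_minority[i] + (1 - lam_x) * x_minority[j]
    lam_y = lam.reshape(-1, 1)                  # broadcast over label dims
    y_new = lam_y * y_minority[i] + (1 - lam_y) * y_minority[j]
    return x_new, y_new

# Hypothetical minority class: 10 images of 8x8x3 with the same one-hot label
x = np.random.rand(10, 8, 8, 3)
y = np.tile(np.array([0.0, 1.0]), (10, 1))
x_aug, y_aug = mixup_oversample(x, y, n_new=40, alpha=0.2, seed=0)
print(x_aug.shape, y_aug.shape)  # (40, 8, 8, 3) (40, 2)
```

Because both partners in each pair come from the same class, the mixed label stays the class's one-hot vector, so the synthetic samples can be appended directly to the minority class to balance the dataset.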




Funding

This work is supported by the First Class Discipline Funding of Shandong Agricultural University.

Author information


Corresponding author

Correspondence to Shaomin Mu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Cao, Z., Mu, S. & Dong, M. Two-attribute e-commerce image classification based on a convolutional neural network. Vis Comput 36, 1619–1634 (2020). https://doi.org/10.1007/s00371-019-01763-x

