
Two-attribute e-commerce image classification based on a convolutional neural network

  • Original Article
  • Published in: The Visual Computer

Abstract

A novel two-task learning method based on an improved convolutional neural network (CNN), using the idea of parameter transfer from transfer learning, is proposed to address the problem that a traditional CNN cannot classify two attributes of an e-commerce image simultaneously. The network has two channels, each responsible for learning one attribute of the image. First, the network is pre-trained on the channel corresponding to the most important attribute, optimizing the parameters of the shared front layers. Then the two channels are trained simultaneously; during training, the two learning tasks help each other by sharing parameters, which improves the convergence speed of the network and the generalization ability of the model. To address the scarcity of certain types of e-commerce images in the datasets and the resulting class imbalance, an over-sampling method based on the mixup algorithm is proposed. The relationship between the complexity of the two attributes and the sparsity of the CNN's output feature matrix is studied, and an improved Grad-CAM algorithm is used to visualize the image regions that are key to classifying each attribute, improving the interpretability of the network. Experiments show that the proposed method classifies both two-attribute e-commerce images and traditional images effectively.
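The mixup-based over-sampling described above can be sketched in a few lines of NumPy. This is a minimal illustration of the general idea only, not the paper's exact procedure: the function name, array shapes, and the choice to mix pairs drawn from the same minority class are assumptions for the example. Each synthetic sample is a convex combination of two minority-class images, with the mixing weight drawn from a Beta distribution as in standard mixup.

```python
import numpy as np

def mixup_oversample(x_minority, y_minority, n_new, alpha=0.2, seed=None):
    """Generate n_new synthetic minority-class samples by mixing random
    pairs of existing samples (mixup-style convex combination)."""
    rng = np.random.default_rng(seed)
    n = len(x_minority)
    i = rng.integers(0, n, size=n_new)          # first partner of each pair
    j = rng.integers(0, n, size=n_new)          # second partner of each pair
    lam = rng.beta(alpha, alpha, size=n_new)    # mixing weights in (0, 1)
    lam_x = lam.reshape(-1, 1, 1, 1)            # broadcast over H, W, C
    x_new = lam_x * x_minority[i] + (1 - lam_x) * x_minority[j]
    lam_y = lam.reshape(-1, 1)                  # broadcast over label dims
    y_new = lam_y * y_minority[i] + (1 - lam_y) * y_minority[j]
    return x_new, y_new

# Hypothetical minority class: 10 images of 8x8x3 with the same one-hot label
x = np.random.rand(10, 8, 8, 3)
y = np.tile(np.array([0.0, 1.0]), (10, 1))
x_aug, y_aug = mixup_oversample(x, y, n_new=40, alpha=0.2, seed=0)
print(x_aug.shape, y_aug.shape)  # (40, 8, 8, 3) (40, 2)
```

Because both partners in each pair come from the same class, the mixed label stays the class's one-hot vector, so the synthetic samples can be appended directly to the minority class to balance the dataset.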




Funding

This work is supported by the First Class Discipline Funding of Shandong Agricultural University.

Author information


Corresponding author

Correspondence to Shaomin Mu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Cao, Z., Mu, S. & Dong, M. Two-attribute e-commerce image classification based on a convolutional neural network. Vis Comput 36, 1619–1634 (2020). https://doi.org/10.1007/s00371-019-01763-x

