Multi-depth dilated network for fashion landmark detection with batch-level online hard keypoint mining

https://doi.org/10.1016/j.imavis.2020.103930

Highlights

  • A novel Multi-Depth Dilated (MDD) block that can efficiently extract different levels of large-scale context information, which is beneficial for the inference of hard keypoints, is proposed.

  • The Batch-level Online Hard Keypoint Mining (B-OHKM) method is proposed for training to further improve the effectiveness of hard keypoint detection.

  • It is demonstrated that a network (MDDNet) that uses the MDD block and B-OHKM training method obtains significant improvements over state-of-the-art methods on standard benchmarks for fashion landmark detection.

Abstract

Deep learning has been applied to fashion landmark detection in recent years, and great progress has been made. However, the detection of hard keypoints, such as those that are occluded or invisible, remains challenging and must be addressed. To tackle this problem, at the feature extraction level, a novel Multi-Depth Dilated (MDD) block, which is composed of different numbers of dilated convolutions in parallel, and a Multi-Depth Dilated Network (MDDNet) constructed from MDD blocks are proposed in this paper; at the training level, a network training method called Batch-level Online Hard Keypoint Mining (B-OHKM) is proposed. During network training, each clothing keypoint corresponds one-to-one to the loss value calculated at that keypoint: the greater the loss at a keypoint, the more difficult it is for the network to detect that keypoint. In this way, hard keypoints can be effectively mined, and the network can be trained in a targeted manner to improve its performance on hard keypoints. The results of experiments on two large-scale fashion benchmark datasets demonstrate that the proposed MDDNet, which uses the MDD block and the B-OHKM method, achieves state-of-the-art results.
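As a rough illustration of the mining rule described above, the following PyTorch-style sketch pools the per-keypoint losses of an entire mini-batch and keeps only the hardest ones for back-propagation. The heatmap-regression formulation, the `top_ratio` hyperparameter, and all names are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn.functional as F

def b_ohkm_loss(pred_heatmaps, target_heatmaps, top_ratio=0.5):
    """Batch-level online hard keypoint mining (illustrative sketch).

    pred_heatmaps / target_heatmaps: (B, K, H, W) tensors holding one
    predicted / ground-truth heatmap per clothing keypoint.
    top_ratio: assumed fraction of the hardest keypoints, pooled over
    the whole batch, that is kept for the loss.
    """
    # One scalar loss per keypoint: MSE averaged over its heatmap.
    per_keypoint = F.mse_loss(
        pred_heatmaps, target_heatmaps, reduction='none'
    ).mean(dim=(2, 3))                       # shape (B, K)

    # Pool all B*K keypoint losses together: mining is batch-level,
    # unlike the original OHKM, which selects within each image.
    flat = per_keypoint.reshape(-1)          # shape (B*K,)
    n_hard = max(1, int(flat.numel() * top_ratio))
    hard_losses, _ = torch.topk(flat, n_hard)

    # Only the hardest keypoints in the batch contribute gradient.
    return hard_losses.mean()
```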

Introduction

With the development of electronic commerce, such as Amazon, Taobao, Jingdong, etc., clothing image analysis, including the study of fashion landmark detection [1], [2], clothing attribute prediction [3], [4], clothing retrieval [5], [6], and fashion recommendation [7], [8], has received much attention from researchers. The development and application of these analysis methods can bring huge commercial value to e-commerce companies, and lay a foundation for new application fields in the future, such as online custom clothing design and virtual dressing rooms. Additionally, the rapid development of deep neural networks [9], [10] and the emergence of large-scale clothing image datasets [2], [11] enable these challenging tasks to be tackled more quickly and effectively.

In this paper, a fundamental task and key problem in clothing image analysis, namely fashion landmark detection, is addressed. Extracting features from detected fashion landmarks can significantly improve the performance of other types of clothing image analysis, such as clothing attribute prediction and clothing retrieval. The general aim of fashion landmark detection is to recognize and locate the functional keypoints defined on clothes, such as the corners of necklines, hemlines, and cuffs.

A recent notable work in fashion landmark detection is the attentive fashion grammar network [12]. It is a deep learning model that encodes a set of knowledge about fashion clothing, and it uses two attention mechanisms to make the network pay more attention to the areas surrounding the fashion landmarks. Although great progress has been made, there remains room for further improvement; for example, it is still difficult to locate hard keypoints that are either invisible or occluded by other objects in the image.

To address this problem, dilated convolution [13] was utilized in the present study to design the novel Multi-Depth Dilated (MDD) block, which extracts multi-depth convolutional features in parallel; this allows multi-level context information to be obtained and is beneficial for locating hard keypoints. By stacking MDD blocks, the hard keypoints can be located more accurately. In addition, to further improve the recognition of hard keypoints, the Batch-level Online Hard Keypoint Mining (B-OHKM) method is also proposed for training. Compared with Online Hard Keypoint Mining (OHKM) [14], which operates within a single image, this method mines hard keypoints across batch-level images, which is better for training a network to detect hard keypoints. Considering these two important factors for addressing the problem of hard keypoint detection, i.e., the feature extraction level and the network training level, the Multi-Depth Dilated Network (MDDNet) is ultimately proposed, and is demonstrated to outperform current state-of-the-art methods on two fashion benchmark datasets. Additionally, it is experimentally established that the proposed MDD block yields non-trivial improvements.
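To make the block structure concrete, below is a minimal PyTorch sketch of a multi-depth dilated block: parallel branches that apply different numbers of dilated 3x3 convolutions to the same input and are then fused, so each branch captures context at a different effective depth and scale. The branch depths, dilation rate, channel widths, and the residual connection are assumptions for illustration; the paper's exact configuration is given in its approach section.

```python
import torch
import torch.nn as nn

class MDDBlock(nn.Module):
    """Illustrative Multi-Depth Dilated block: each parallel branch
    applies a different NUMBER of dilated 3x3 convolutions, giving
    multiple levels of large-scale context from the same input."""

    def __init__(self, channels, depths=(1, 2, 3), dilation=2):
        super().__init__()
        self.branches = nn.ModuleList()
        for depth in depths:
            layers = []
            for _ in range(depth):
                layers += [
                    # padding == dilation keeps the spatial size fixed
                    nn.Conv2d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation),
                    nn.BatchNorm2d(channels),
                    nn.ReLU(inplace=True),
                ]
            self.branches.append(nn.Sequential(*layers))
        # 1x1 convolution fusing the concatenated multi-depth features.
        self.fuse = nn.Conv2d(channels * len(depths), channels,
                              kernel_size=1)

    def forward(self, x):
        multi_depth = torch.cat([branch(x) for branch in self.branches],
                                dim=1)
        return self.fuse(multi_depth) + x   # residual add (assumed)
```

Stacking such blocks, e.g. `nn.Sequential(*[MDDBlock(64) for _ in range(4)])`, would yield an MDDNet-style trunk; the actual number of blocks follows the paper's architecture description.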

In summary, the main contributions of this work are three-fold:

1) A novel MDD block that can efficiently extract different levels of large-scale context information, which is beneficial for the inference of hard keypoints, is developed.

2) The B-OHKM method is proposed for training to further improve the effectiveness of hard keypoint detection.

3) It is demonstrated that a network (MDDNet) that uses the MDD block and B-OHKM training method obtains significant improvements over state-of-the-art methods on standard benchmarks for fashion landmark detection.

Section snippets

Related Work

Visual fashion understanding has recently attracted wide attention with the development of e-commerce, and a series of related applications has been studied, including clothing identification [3], [15] and retrieval [5], [16], the prediction of semantic properties [4], [17], and fashion trend discovery [7], [18]. All of this research relies on understanding the information content of clothing images and transforming it into the required information about the garments.

The Proposed Approach

In this section, data preprocessing for network input and training is first introduced. The architecture of the deep neural network, including the structure of the proposed MDD block, is then described in detail. Additionally, an explanation of how to stack MDD blocks to build the MDDNet, which can effectively tackle the hard keypoints problem and accurately locate the fashion landmarks, is provided. Finally, the network learning policy, which includes the training loss function and the training strategy based on B-OHKM, is presented.
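For orientation, the following sketch wires the two illustrative pieces above together into one training iteration: an MDDNet-style model predicting one heatmap per fashion landmark, trained with the batch-level mining loss. It reuses the hypothetical `b_ohkm_loss` and `MDDBlock` sketches defined earlier and is not the authors' code.

```python
import torch

def train_step(model, optimizer, images, target_heatmaps):
    """One hypothetical training iteration: the model emits one heatmap
    per landmark, and the batch-level mining loss concentrates the
    gradient on the hardest keypoints in the mini-batch."""
    optimizer.zero_grad()
    pred_heatmaps = model(images)                  # (B, K, H, W)
    loss = b_ohkm_loss(pred_heatmaps, target_heatmaps)
    loss.backward()
    optimizer.step()
    return loss.item()
```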

Experiment

The performance of the proposed method was evaluated on two large-scale fashion benchmark datasets, namely DeepFashion [2] and the Fashion Landmark Dataset (FLD) [1]. An ablation study was also carried out to verify the effectiveness of the proposed MDD block and B-OHKM method.

Conclusion

This paper proposed a Multi-Depth Dilated Network (MDDNet) that can effectively tackle the hard keypoints location problem, and hence accurately detect fashion landmarks. The network is built by stacking Multi-Depth Dilated (MDD) blocks, which extract the multi-level, large-scale context information that is necessary for the inference of hard keypoints. In addition, the Batch-level Online Hard Keypoint Mining (B-OHKM) method was proposed for training to further increase the accuracy of hard keypoint detection.

Acknowledgement

This work was supported by the National Key R&D Program of China under grant 2017YFB1002504 and the Shaanxi International Science and Technology Cooperation and Exchange Program of China (2017KW-010).

Conflict of interest

The authors declare that they have no conflicts of interest related to this work.

References (28)

  • Z. Liu, S. Yan, P. Luo, X. Wang, and X. Tang, “Fashion Landmark Detection in the Wild,” in Computer Vision - ECCV 2016...
  • Z. Liu, P. Luo, S. Qiu, X. Wang, and X. Tang, “DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich...
  • H. Chen, A. C. Gallagher, and B. Girod, “Describing Clothing by Semantic Attributes,” in Computer Vision - ECCV 2012 -...
  • X. Wang and T. Zhang, “Clothes search in consumer photos via color matching and attribute learning,” in Proceedings of...
  • M. H. Kiapour, X. Han, S. Lazebnik, A. C. Berg, and T. L. Berg, “Where to Buy It: Matching Street Clothing Photos in...
  • J. Huang, R. S. Feris, Q. Chen, and S. Yan, “Cross-Domain Image Retrieval with a Dual Attribute-Aware Ranking Network,”...
  • M. H. Kiapour, K. Yamaguchi, A. C. Berg, and T. L. Berg, “Hipster Wars: Discovering Elements of Fashion Styles,” in...
  • E. Simo-Serra, S. Fidler, F. Moreno-Noguer, and R. Urtasun, “Neuroaesthetics in fashion: Modeling the perception of...
  • K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” CoRR, vol....
  • K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in 2016 IEEE Conference on...
  • S. Yan, Z. Liu, P. Luo, S. Qiu, X. Wang, and X. Tang, “Unconstrained Fashion Landmark Detection via Hierarchical...
  • W. Wang, Y. Xu, J. Shen, and S.-C. Zhu, “Attentive Fashion Grammar Network for Fashion Landmark Detection and Clothing...
  • F. Yu and V. Koltun, “Multi-Scale Context Aggregation by Dilated Convolutions,” CoRR, vol. abs/1511.07122,...
  • Y. Chen, Z. Wang, Y. Peng, Z. Zhang, G. Yu, and J. Sun, “Cascaded Pyramid Network for Multi-Person Pose Estimation,” in...