
1 Introduction

Over the past few years, the performance of semantic image segmentation, a per-pixel classification problem, has been dramatically advanced by fully convolutional network (FCN) based approaches [1]. Generally, an FCN can be converted from a classification model [2,3,4] pre-trained on ImageNet [5] by replacing the fully connected layers with corresponding convolutional ones. However, due to the strided layers in an FCN, the result is usually poor where small objects or object boundaries exist, namely in segmentation detail. To address this problem, atrous convolution [6] and many other novel modules [7, 8] have been introduced, attempting to preserve or recover spatial information when performing strided operations. However, most of this work seems to give the model the capability to handle spatial information rather than making it learn to do so.

The current mainstream loss function for semantic image segmentation is cross entropy, which treats all pixels equally. However, in the context of semantic segmentation, there are some pixels for which the model finds it more difficult to make the right prediction. This also matches intuition: a human can easily draw a rough outline around an object in an image, but segmenting it precisely requires careful consideration. This motivates us to develop a method that pays more attention to these pixels.

In this paper, we introduce two region-based metrics to analyze the performance bottleneck of the model and, based on this analysis, we propose a simple yet effective loss function \(\mathcal {L}_\mathrm{{cehe}}\) by combining cross entropy with hard examples [9], which alleviates the problem revealed by the region-based metrics. The proposed \(\mathcal {L}_\mathrm{{cehe}}\) can be implemented as the cross entropy loss \(\mathcal {L}_\mathrm{{ce}}\) with pixel-wise weights, so it can replace \(\mathcal {L}_\mathrm{{ce}}\) without degrading training speed. Experiments show that the model trained with \(\mathcal {L}_\mathrm{{cehe}}\) outperforms its counterpart trained with \(\mathcal {L}_\mathrm{{ce}}\) by \(1.12\%\) in terms of MIoU on the Cityscapes validation set, and by \(4.15\%\) in terms of the region-based metric MIoUiER proposed in this paper, indicating that our proposed method performs better in segmentation detail.

In summary, our contributions are:

  • We propose two region-based metrics which can quantitatively evaluate the performance of segmentation detail.

  • By analyzing the model with the region-based metrics, we identify the key factor that limits its performance, which can provide insights for future research.

  • We propose a simple yet effective loss function \(\mathcal {L}_\mathrm{{cehe}}\), which outperforms the widely used cross entropy \(\mathcal {L}_\mathrm{{ce}}\).

2 Related Work

Approaches based on FCN have made remarkable progress in the field of semantic image segmentation. However, some properties, such as spatial invariance, which make deep convolutional networks successful in image classification, are precisely the factors that prevent the model from producing fine-grained segmentation. A considerable part of the research focuses on preserving or recovering spatial information [6, 8, 10, 11]. Besides, compared with classification, this dense prediction task has its own properties which we need to take into consideration. Current methods for the problems in semantic image segmentation can be divided into three categories: approaches addressing the intra-class inconsistency problem, the inter-class indistinction problem, or both simultaneously [12]. In this paper, we focus on the inter-class indistinction problem.

Atrous convolution [6, 13] is a solution for preserving spatial information while keeping the receptive field size. Moreover, it introduces no extra computation, since it sparsely samples the input feature map. Currently, nearly all segmentation models replace the conventional convolutions in the deep layers with atrous ones, considering the trade-off between performance and memory usage. However, simply stacking atrous convolutions may cause the gridding issue described in [14], which proposes Hybrid Dilated Convolution (HDC) to alleviate this problem.

As for recovering spatial information, there is no general solution. The most straightforward method is to combine low-level spatial information with high-level semantic information by simply adding or stacking them together. This idea produces the encoder-decoder family exemplified by UNet [15], which shows good performance in the field of medical image segmentation. However, in the general semantic segmentation task, due to the complex content of the input image, the above method tends to cause confusion when fusing features of different levels. Thus, various methods have been proposed to alleviate this problem. For example, SegNet [7], an instance of the encoder-decoder family, memorizes the indices of the maximal responses during max-pooling and then reuses them in the decoder stage. RefineNet [8] presents a multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections.

The concept of hard examples [9], proposed in the field of object detection, can be generalized to the semantic segmentation task by regarding each pixel as an example. Based on this, [16] only back-propagates the gradients of hard examples determined by the predicted probability and a threshold, which is a fairly direct extension of the idea stated in [9]. [17] divides a deep model into several cascaded sub-models, and each sub-model only operates on the hard examples selected by the previous one; here hard examples are also defined by a pre-defined threshold and the predicted probability. These works define the term hard exactly as in [9]. Different from them, we define hard examples based on our analysis of the model, and then integrate this information into the loss function via a multi-task learning scheme.

3 Method

3.1 Region Partition

Given an image, we divide it into two parts: the edge region and the object region. Formally, assuming (x, y) and the set \(I = \{(x,y)\}\) represent the coordinate of a pixel and all pixels in an image respectively, we define the object region \(I_\mathrm{{object}} = I - I_\mathrm{{edge}}\), where \(I_\mathrm{{edge}}\) denotes the edge region. From the definition of these two regions, the key step of the region partition is to obtain \(I_\mathrm{{edge}}\). The following describes how to obtain the edge map of an image and how to compute \(I_\mathrm{{edge}}\) efficiently and quantitatively.

Fig. 1. Using the Canny algorithm to extract the edge map from the ground truth in Cityscapes. Best viewed in color.

In datasets for the semantic image segmentation task, there are usually two types of images: the original image and the ground truth. The value of a pixel in the ground truth represents the target class of the corresponding pixel in the original image. Because of this property of the ground truth, we can utilize the Canny [18] algorithm to extract the edge map of an image by setting both thresholds \(t_1\) and \(t_2\) of Canny to 0. Some examples are illustrated in Fig. 1.
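
To make this step concrete, the following is a minimal sketch of the edge-map extraction using OpenCV's cv2.Canny; the file path and variable names are illustrative, and the ground truth is assumed to be stored as a single-channel label-id image.

```python
import cv2
import numpy as np

# Load a ground-truth label map (single channel, one class id per pixel).
# The path is illustrative only.
gt = cv2.imread("ground_truth_labelIds.png", cv2.IMREAD_GRAYSCALE)

# With both thresholds t1 and t2 set to 0, any change of label value between
# neighboring pixels is kept, so the output marks the class boundaries.
edge_map = cv2.Canny(gt, threshold1=0, threshold2=0)

# cv2.Canny outputs 255 on edges and 0 elsewhere; binarize it to obtain
# the edge map E used below.
E = (edge_map > 0).astype(np.uint8)
```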

In order to obtain \(I_\mathrm{{edge}}\) efficiently and quantitatively, we use the chessboard distance as the distance between pixels in an image. Assuming that two pixels \(q_i, q_j \in I\) have distance \(d_{ij}\), the edge region \(I_\mathrm{{edge}}\) is quantitatively defined based on \(d_{ij}\). Let \(I_\mathrm{{canny}}\) denote the set of edge pixels obtained by the Canny algorithm described above, as shown in Fig. 1(b); then we define \(I_\mathrm{{edge}}^{(r)} = \{q\,|\,q \in I \ \mathrm{{and}} \ \exists \, q_\mathrm{{canny}} \in I_\mathrm{{canny}}: d(q, q_\mathrm{{canny}}) < r\}\), where r is called the radius of the edge region. By using the chessboard distance, the computation of \(I_\mathrm{{edge}}^{(r)}\) can be efficiently implemented with a convolution operation in a modern deep learning framework.

Assume \(E_{\mathrm{{H}} \times \mathrm{{W}}}\) represents the edge map obtained by the Canny algorithm, where a pixel's value is 1 if it is an edge pixel and 0 otherwise, as shown in Fig. 1(b). The method for efficiently computing \(I_\mathrm{{edge}}^{(r)}\) is summarized in Algorithm 1.

Algorithm 1. Efficient computation of \(I_\mathrm{{edge}}^{(r)}\) from the edge map E.
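
Since Algorithm 1 is not reproduced here, the snippet below gives one possible implementation of the idea under the stated assumptions: the edge map E is a binary H x W tensor, and the dilation by chessboard distance r is written as a convolution with an all-ones kernel, as suggested above.

```python
import torch
import torch.nn.functional as F

def edge_region(E: torch.Tensor, r: int) -> torch.Tensor:
    """Return the boolean mask of I_edge^(r) given a binary edge map E (H x W).

    A pixel belongs to the edge region if its chessboard distance to some
    edge pixel is smaller than r, i.e. it lies in a (2r-1) x (2r-1) square
    window centered on an edge pixel. This is a morphological dilation,
    expressed here with a convolution.
    """
    k = 2 * r - 1                                   # side of the square window
    kernel = torch.ones(1, 1, k, k, dtype=torch.float32)
    e = E.float().unsqueeze(0).unsqueeze(0)         # shape 1 x 1 x H x W
    hits = F.conv2d(e, kernel, padding=r - 1)       # edge pixels per window
    return hits.squeeze(0).squeeze(0) > 0           # True inside I_edge^(r)
```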

3.2 Loss Function

Cross Entropy. Cross entropy is widely used as the loss function for semantic segmentation and can be formulated as Eq. 1.

$$\begin{aligned} \mathcal {L}_\mathrm{{ce}}=-\frac{1}{N}\sum _{i=1}^N \sum _{j=1}^K \mathcal {I}\{y_i=j\}\log {p_{ij}} \end{aligned}$$
(1)

where N and K represent the number of pixels and classes, respectively. \(y_i\) is the target class of pixel i, and \(p_{ij}\) is the probability of pixel i being assigned to class j. \(\mathcal {I}\{\cdot \}\) is the indicator function whose value is 1 if the condition is satisfied and 0 otherwise.

Combining Cross Entropy with Hard Examples. \(\mathcal {L}_\mathrm{{ce}}\) implies that all pixels contribute equally to the total loss; however, some pixels are more difficult to predict correctly, as detailed in [16, 19]. Different from these works, we combine cross entropy with manually defined hard examples via a multi-task learning scheme, as shown in Eq. 2.

$$\begin{aligned} \mathcal {L}_\mathrm{{cehe}} = \mathcal {L}_\mathrm{{ce}} + \lambda \mathcal {L}_\mathrm{{he}} \end{aligned}$$
(2)

where \(\mathcal {L}_\mathrm{{he}}\) is the loss function for hard examples and \(\lambda \) is a weight factor balancing the two losses.

We manually define the pixels in the edge region as hard examples for the semantic segmentation task, with the radius r of the edge region as a hyper-parameter. The reason for this definition is discussed in the experiment section. The function m(i) indicates whether pixel i is a hard example and is defined as follows:

$$\begin{aligned} m(i) = \left\{ \begin{aligned} 1,&\quad \text {pixel}\ i \in I_{\text {edge}}^{(r)}\\ 0,&\quad \text {otherwise} \end{aligned}\right. \end{aligned}$$
(3)

Then we can formulate \(\mathcal {L}_\mathrm{{he}}\) as:

$$\begin{aligned} \mathcal {L}_\mathrm{{he}} = -\frac{1}{N}\sum _{i=1}^N m(i)\sum _{j=1}^K \mathcal {I}\{y_i=j\}\log {p_{ij}} \end{aligned}$$
(4)

Different from conventional multi-task learning, here we compute \(\mathcal {L}_\mathrm{{ce}}\) and \(\mathcal {L}_\mathrm{{he}}\) on the same logits output by the model, so they can be merged into a single loss function, shown below:

$$\begin{aligned} \mathcal {L}_\mathrm{{cehe}} =-\frac{1}{N}\sum _{i=1}^N \sum _{j=1}^K \mathcal {I}\{y_i=j\}(1 + \lambda m(i))\log {p_{ij}} \end{aligned}$$
(5)
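
In practice, Eq. 5 amounts to a pixel-wise weighted cross entropy. The following is a minimal PyTorch sketch; the function name, the ignore_index value and the default \(\lambda \) are illustrative.

```python
import torch
import torch.nn.functional as F

def cehe_loss(logits, target, m_map, lam=2.0, ignore_index=255):
    """Pixel-wise weighted cross entropy corresponding to Eq. (5).

    logits:  B x K x H x W raw scores produced by the model
    target:  B x H x W ground-truth class indices
    m_map:   B x H x W values of m(i) (binary for a single-level edge region,
             or the hierarchical weights of Eq. (6))
    lam:     the weight factor lambda
    """
    # Per-pixel cross entropy, -log p_{i, y_i}, kept unreduced.
    ce = F.cross_entropy(logits, target, reduction="none",
                         ignore_index=ignore_index)
    # Weight every pixel by (1 + lambda * m(i)) and average over all pixels.
    weight = 1.0 + lam * m_map.float()
    return (weight * ce).mean()
```
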
Fig. 2. Visualization of a hierarchical edge region with 3 levels. The colored part represents the edge region, and different colors indicate different levels. Best viewed in color.

3.3 Hierarchical Edge Region

In our experiments, the performance of \(\mathcal {L}_\mathrm{{cehe}}\) largely depends on the choice of \(\lambda \) and r. In order to alleviate this problem, we further divide the edge region into different levels according to the shortest distance between a pixel in the edge region and the edge pixels obtained by the Canny algorithm. Formally, \(I_\mathrm{{edge}}^{(r_1, r_2, \cdots , r_n)}\) represents an edge region of n levels, and the ith level equals the set \(I_\mathrm{{edge}}^{(r_i)} - I_\mathrm{{edge}}^{(r_{i - 1})}\) when \(i > 1\) or \(I_\mathrm{{edge}}^{(r_1)}\) when \(i = 1\). Figure 2 shows some examples.

We redefine m(i) according to the number of levels, as shown below:

$$\begin{aligned} m(i) = \left\{ \begin{aligned} n - l(i) + 1,&\quad \text {pixel}\ i \in I_{\text {edge}}^{(r_1, \cdots , r_n)}\\ 0,&\quad \text {otherwise} \end{aligned}\right. \end{aligned}$$
(6)

where \(l(i) \in \{1, 2, \cdots , n\}\) is the level index of pixel i.
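
A possible way to build the hierarchical weights m(i) of Eq. 6 is sketched below; it reuses the edge_region helper sketched in Sect. 3.1, and the default radii follow the setting in Sect. 4.3.

```python
import torch

def hierarchical_weight(E: torch.Tensor, radii=(7, 9, 11)) -> torch.Tensor:
    """Per-pixel weights m(i) of Eq. (6) for n = len(radii) levels.

    E is the binary edge map (H x W); edge_region() is the helper sketched
    in Sect. 3.1. Pixels in the innermost level receive weight n, the next
    level n - 1, and so on; pixels outside the edge region receive 0.
    """
    n = len(radii)
    m = torch.zeros(E.shape, dtype=torch.float32)
    prev = torch.zeros(E.shape, dtype=torch.bool)
    for level, r in enumerate(radii, start=1):      # levels 1 .. n
        region = edge_region(E, r)                  # I_edge^(r_level)
        ring = region & ~prev                       # pixels newly added at this level
        m[ring] = n - level + 1
        prev = region
    return m
```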

4 Experiment

The purpose of this paper is to improve the performance of segmentation detail rather than to push the state of the art. All experiments are conducted on a TITAN X (Pascal) GPU with 12 GB of RAM, and the training parameters are detailed in the following part so that the results are easy to reproduce.

4.1 Region-Based Metric

A conventional metric like MIoU cannot quantitatively evaluate the performance of segmentation detail. In order to solve this problem, we introduce two region-based metrics: MIoUiER and MIoUiOR, which are defined on the edge region and the object region, respectively. They are computed in the same way as MIoU, except that MIoUiER only considers pixels belonging to the set \(I_\mathrm{{edge}}\) and MIoUiOR those belonging to \(I_\mathrm{{object}}\). The former quantitatively evaluates the performance of segmentation detail. Note that both are functions of the radius r.
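
As an illustration of how the region-based metrics restrict evaluation to one region, the per-image sketch below computes mean IoU only over a given mask. In practice, MIoU on Cityscapes is accumulated over the whole validation set; the function and parameter names here are illustrative.

```python
import numpy as np

def miou_in_region(pred, gt, region_mask, num_classes=19, ignore_index=255):
    """Mean IoU computed only over the pixels inside region_mask.

    With region_mask covering I_edge^(r) this corresponds to MIoUiER;
    with its complement, to MIoUiOR. pred and gt are H x W index arrays,
    region_mask is a boolean H x W array.
    """
    valid = region_mask & (gt != ignore_index)
    p, g = pred[valid], gt[valid]
    ious = []
    for c in range(num_classes):
        inter = np.sum((p == c) & (g == c))
        union = np.sum((p == c) | (g == c))
        if union > 0:                       # skip classes absent in this region
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")
```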

4.2 Dataset

We adopt Cityscapes [20] as the evaluation dataset. It involves 19 semantic labels for the segmentation task, which belong to 7 groups: flat, human, vehicle, construction, object, nature and sky. The dataset focuses on semantic understanding of urban street scenes and has 5,000 fine and 20,000 coarse annotations. The fine set contains 2,975 (train), 500 (val) and 1,525 (test) pixel-level labeled images for training, validation and testing, respectively. Previous work shows that a model pre-trained on the coarse annotations achieves superior performance. Since our purpose is to study the effect of the proposed method rather than to push the state of the art, we do not use the coarse annotations, for simplicity of the training process. Performance is measured by MIoU, MIoUiER and MIoUiOR over the 19 classes.

4.3 Implementation Details

Model. We use the PyTorch framework for implementation, and we adopt DeepLab V3 Plus [11] with ResNet-50 [3] as the backbone. The output stride is set to 16. Note that the proposed method can be applied to any model that uses cross entropy as the loss function.

Data Preprocessing. Data augmentation is a powerful way to expand a dataset and makes the learned model robust to input variations. Similar to previous work [11], we first scale the image by a factor randomly chosen from a pre-defined array (0.5, 0.75, 1, 1.25, 1.5, 1.75, 2.0), then randomly horizontally flip it and crop it to the size \(513 \times 513\) for training. In order to make full use of the memory, the batch size is set to 12.

Learning Rate Policy. We adopt the poly learning rate policy with an initial learning rate of 0.007 and a power of 0.9, the same as [11]. We train the model for 30,000 steps, considering the trade-off between accuracy and training time.
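
For completeness, the poly policy scales the learning rate as base_lr \(\cdot\) (1 - step/max_iter)\(^{\mathrm{power}}\); a minimal sketch with the settings above is given below (the optimizer handling is only indicative).

```python
# Poly learning rate schedule used for training (base_lr = 0.007,
# power = 0.9, 30,000 steps, as stated above).
base_lr, power, max_iter = 0.007, 0.9, 30000

def poly_lr(step: int) -> float:
    return base_lr * (1.0 - step / max_iter) ** power

# Indicative usage before each optimization step:
# for group in optimizer.param_groups:
#     group["lr"] = poly_lr(step)
```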

Inference. During the inference stage, we do not use any data augmentation techniques. All metrics are obtained by a single-scale test on the Cityscapes validation set.

Loss Function. As shown in Eq. 5, the proposed loss function \(\mathcal {L}_\mathrm{{cehe}}\) has several hyper-parameters to be specified. In all our experiments, the number of levels n is set to 3; correspondingly, the radii of the levels are \(r_1=7\), \(r_2=9\) and \(r_3=11\), and the factor \(\lambda \) is set to 2. These parameters were not carefully chosen, and performance gains might be obtained by a grid search over them.

4.4 Evaluation

Metric Analysis. The proposed method helps to improve the performance of segmentation detail, thus leading to an improvement in overall performance. To evaluate \(\mathcal {L}_\mathrm{{cehe}}\), we train the DeepLab model with different loss functions while keeping all other parameter settings and the training pipeline unchanged. As listed in Table 1, our proposed \(\mathcal {L}_\mathrm{{cehe}}\) yields \(72.94\%\) MIoU, outperforming \(\mathcal {L}_\mathrm{{ce}}\) by \(1.12\%\), which demonstrates the effectiveness of the proposed method.

Table 1. Performance (MIoU) on Cityscapes validation set.
Fig. 3. Performance of models trained with different loss functions. All metrics are obtained on the Cityscapes validation set. Best viewed in color.

Fig. 4. Performance (region-based metrics) of models trained with different loss functions. All metrics are obtained on the Cityscapes validation set. Best viewed in color.

Region-Based Metric Analysis. The analysis based on MIoU only gives an overall evaluation of the segmentation results and cannot verify the purpose of the proposed \(\mathcal {L}_\mathrm{{cehe}}\): improving segmentation detail. The following part utilizes the two region-based metrics to (1) analyze the performance bottleneck of the model and (2) show that the loss function \(\mathcal {L}_\mathrm{{cehe}}\) enhances segmentation detail.

As shown in Fig. 3, the metric curves of the two loss functions generally have the same trend. Both perform well in the object region (MIoUiOR is larger than \(80\%\)) even when the radius r is only slightly larger than 10. However, both have inferior performance in the edge region compared with their MIoUiOR. For example, when \(r=11\), MIoUiER is \(45.99\%\) and \(50.13\%\) for \(\mathcal {L}_\mathrm{{ce}}\) and \(\mathcal {L}_\mathrm{{cehe}}\) respectively, which is \(36.68\%\) and \(31.84\%\) lower than the corresponding MIoUiOR. This indicates that the factor limiting the performance of the model is the prediction of the pixels in the edge region. Since the architecture of DeepLab, an FCN-based feature extractor with a decoder, is commonly used in semantic segmentation, we argue that this conclusion applies to most models.

Table 2. Performance on the Cityscapes validation set obtained under different r settings.
Fig. 5. Visualization of segmentation results obtained by different loss functions on the Cityscapes validation set. Results in the red boxes indicate that the proposed \(\mathcal {L}_\mathrm{{cehe}}\) has better performance in segmentation detail. Best viewed in color.

Figure 4 illustrates the metric curves of the two loss functions, and detailed statistics can be found in Table 2. In terms of MIoUiER, our proposed \(\mathcal {L}_\mathrm{{cehe}}\) outperforms the widely used \(\mathcal {L}_\mathrm{{ce}}\) by a large margin (between \(3\%\) and \(4\%\)) under almost all r settings, indicating that \(\mathcal {L}_\mathrm{{cehe}}\) performs better in the edge region, namely in segmentation detail. Some visualized examples are shown in Fig. 5. In terms of MIoUiOR, \(\mathcal {L}_\mathrm{{cehe}}\) performs slightly worse than \(\mathcal {L}_\mathrm{{ce}}\); however, given that \(\mathcal {L}_\mathrm{{cehe}}\) is superior to \(\mathcal {L}_\mathrm{{ce}}\) in overall MIoU (Table 1), this further confirms that the gain of \(\mathcal {L}_\mathrm{{cehe}}\) comes from improved segmentation detail. We conjecture that the gradients of hard examples are emphasized by the product of m(i) and \(\lambda \) in Eq. 5, so they play a leading role in the update direction of some model parameters. Further gains may be obtained by carefully choosing m(i) and \(\lambda \), or by fusing models trained with different loss functions. We leave this for future research.

5 Conclusion

In this paper, we introduce two region-based metrics to quantitatively evaluate the performance of segmentation detail, which we use to analyze the performance bottleneck of the model. Moreover, we combine cross entropy with manually defined hard examples into a loss function named \(\mathcal {L}_\mathrm{{cehe}}\), which outperforms the widely used cross entropy \(\mathcal {L}_\mathrm{{ce}}\) by \(1.12\%\) in terms of MIoU, and by \(4.15\%\) in terms of MIoUiER when the radius \(r=13\), indicating that the proposed \(\mathcal {L}_\mathrm{{cehe}}\) performs better in segmentation detail.