Abstract
Weakly supervised semantic segmentation (WSSS) methods are more flexible and less costly than fully supervised ones because they require no pixel-level annotation. Existing WSSS methods that rely on image-level annotations commonly use class activation maps (CAMs) to identify seed localization cues. However, because CAMs are obtained from a classification network that focuses mainly on the most discriminative parts of an object, less discriminative parts may be missed. This study aims to improve the classification network's local visual understanding of objects by adding a metric learning task on patches sampled from each CAM-based object proposal. Because the patches contain different object parts and surrounding background, leveraging patch similarity lets the network learn entire objects rather than only their most discriminative parts. After joint training on the proposed patch-based metric learning task and the classification task, we expect the backbone network to learn more discriminative local features, so that more complete class-specific regions of an object can be identified. Extensive experiments on the PASCAL VOC 2012 dataset validate our method, which achieves improvements over state-of-the-art methods.
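The two ingredients named in the abstract can be illustrated with a minimal, dependency-free sketch. The CAM is computed in the standard way, as a classifier-weighted sum of the last convolutional feature maps. The abstract does not state which metric learning loss is used, so the hinge triplet loss below is an assumption chosen for illustration, as are the function names, the `margin` and `metric_weight` parameters, and the simple additive form of the joint objective.

```python
import math


def class_activation_map(features, class_weights):
    """CAM for one class: weighted sum over channels of the feature maps.

    features: list of C maps, each an H x W nested list of floats.
    class_weights: list of C classifier weights for the target class.
    """
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(features, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


def l2_distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss on patch embeddings (an assumed choice of loss).

    Pulls the anchor patch toward a patch of the same class and pushes it
    away from a patch of a different class or of the background.
    """
    gap = l2_distance(anchor, positive) - l2_distance(anchor, negative)
    return max(0.0, gap + margin)


def joint_loss(classification_loss, metric_loss, metric_weight=1.0):
    """Hypothetical joint objective: classification plus weighted metric term."""
    return classification_loss + metric_weight * metric_loss


# Toy usage: two 2x2 feature maps and three 2-D patch embeddings.
toy_features = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 2.0], [0.0, 0.0]]]
toy_cam = class_activation_map(toy_features, [0.5, 1.0])
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])  # negative is far away
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 0.5])  # negative is too close
```

In this toy example the far-away negative already satisfies the margin and contributes no loss, while the close negative is penalized. This is the sense in which patch similarity can provide a training signal beyond the image-level label.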
Acknowledgments
This work was supported by the Natural Science Foundation of Guangdong Province, China (No. 2018A030313203).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Chan, P.P.K., Chen, K., Xu, L., Hu, X., Yeung, D.S. (2021). Weakly Supervised Semantic Segmentation with Patch-Based Metric Learning Enhancement. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science, vol 12893. Springer, Cham. https://doi.org/10.1007/978-3-030-86365-4_38
DOI: https://doi.org/10.1007/978-3-030-86365-4_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86364-7
Online ISBN: 978-3-030-86365-4
eBook Packages: Computer Science, Computer Science (R0)