
Semantic Image Matting: General and Specific Semantics

International Journal of Computer Vision

Abstract

Although the conventional matting formulation can separate foreground from background in regions of fractional occupancy, which can be caused by highly transparent objects, complex foregrounds (e.g., a net or a tree), and objects containing very fine details (e.g., hair), no previous work has attempted to reason about the underlying causes of matting arising from the various foreground semantics in general. We show how to obtain better alpha mattes by incorporating semantic classification of matting regions into our framework. Specifically, we consider and learn 20 classes of general matting patterns, and propose to extend the conventional trimap to a semantic trimap, which can be obtained automatically through patch structure analysis within trimap regions. Meanwhile, we learn a multi-class discriminator to regularize the alpha prediction at the semantic level, together with content-sensitive weights to balance the different regularization losses. Experiments on multiple benchmarks show that our method, which benefits from such general alpha semantics, outperforms other methods and achieves state-of-the-art performance. We further explore the effectiveness of our method on specific semantics by specializing it to human matting and transparent object matting; the results demonstrate that alpha matte semantic information can boost performance not only for general matting but also for class-specific matting. Finally, we contribute a large-scale Semantic Image Matting Dataset constructed with careful consideration of data balancing across the different semantic classes. Code and dataset are available at https://github.com/nowsyn/SIM.
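The core idea in the abstract, extending the conventional foreground/background/unknown trimap with per-region semantic classes, can be made concrete with a small sketch. The snippet below is not the authors' implementation (see the linked repository for that); it only illustrates, under assumed conventions, how a semantic trimap might be encoded as extra one-hot channels over the unknown band. All names (`to_semantic_trimap`, `patch_classes`) and the {0, 128, 255} trimap encoding are assumptions made for illustration.

```python
import numpy as np

def to_semantic_trimap(trimap: np.ndarray,
                       patch_classes: np.ndarray,
                       num_classes: int = 20) -> np.ndarray:
    """Hypothetical semantic-trimap encoding.

    trimap        -- H x W array valued {0: background, 128: unknown, 255: foreground}
    patch_classes -- H x W array of matting-pattern class ids in [0, num_classes),
                     e.g. produced by a patch-level classifier on the unknown band
    Returns an H x W x (2 + num_classes) map: one foreground channel, one
    background channel, and one one-hot channel per matting-pattern class,
    populated only where the trimap is unknown.
    """
    h, w = trimap.shape
    out = np.zeros((h, w, 2 + num_classes), dtype=np.float32)
    out[..., 0] = (trimap == 255).astype(np.float32)  # definite foreground
    out[..., 1] = (trimap == 0).astype(np.float32)    # definite background
    ys, xs = np.nonzero(trimap == 128)                # unknown band
    out[ys, xs, 2 + patch_classes[ys, xs]] = 1.0      # semantic one-hot label
    return out
```

A matting network could then consume this (2 + num_classes)-channel map concatenated with the RGB image, in place of the usual single-channel trimap, so that the alpha decoder is conditioned on which matting pattern (hair, net, glass, etc.) produced the fractional opacity.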



Availability of data and materials

The dataset proposed in this work is available at https://github.com/nowsyn/SIM.

Code availability

The code is available at https://github.com/nowsyn/SIM.


Funding

This work was supported by Kuaishou Technology and the Research Grants Council of the Hong Kong SAR under Grant No. 16201420.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Yanan Sun and supervised by Chi-Keung Tang and Yu-Wing Tai.

Corresponding author

Correspondence to Yanan Sun.

Ethics declarations

Conflict of interest

The authors declare that they have no financial or non-financial conflicts of interest.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Communicated by Ming-Hsuan Yang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sun, Y., Tang, C.-K., & Tai, Y.-W. Semantic Image Matting: General and Specific Semantics. Int J Comput Vis 132, 710–730 (2024). https://doi.org/10.1007/s11263-023-01907-6

