CANet: cross attention network for food image segmentation

Dong, Xiaoxiao; Li, Haisheng; Wang, Xiaochuan; Wang, Wei; Du, Junping

doi:10.1007/s11042-023-17916-z

Published: 29 December 2023

Volume 83, pages 60987–61006, (2024)

Multimedia Tools and Applications

Xiaoxiao Dong^1,2,3,
Haisheng Li ORCID: orcid.org/0000-0003-4861-0513^1,2,3,
Xiaochuan Wang^1,2,3,
Wei Wang^1,2,3 &
…
Junping Du⁴


Abstract

Food image segmentation, which aims to distinguish the various ingredients in an image, is crucial for food safety, since estimating calories and other nutrients is important for human health and sustainable development. However, current image segmentation methods perform poorly on food image datasets because of the significant diversity of appearances and the distinctive conditions between ingredients and daily props, and because these methods have insufficient capability for extracting features from food images. In addition, using attention mechanisms to obtain contextual detail and long-range dependencies leads to quadratic computational complexity. In this paper, we propose a Cross Spatial Attention (CSA) module that extracts richer spatial features from food images with lower time and space complexity. Specifically, the CSA module aggregates contextual information by cross-calculation along the horizontal and vertical dimensions. Experiments demonstrate that, with a two-step cross-calculation, each pixel can eventually capture global long-range dependencies. Furthermore, our method integrates a Channel Attention (CA) module that selectively highlights interdependent channel information by integrating relevant features across all feature maps. The outputs of the two attention modules are then aggregated to enhance the representation of image features. Convincing performance improvements are achieved on FoodSeg103, UECFoodPix, and ADE20K. Moreover, the proposed network achieves a better trade-off between accuracy and efficiency.
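
The claim that two cross-calculation steps give every pixel a global view can be checked with a small counting sketch. The abstract does not give the exact CSA formulation, so the code below assumes that in one pass each pixel attends only to the pixels in its own row and its own column; the function names and the 6×8 grid are illustrative and not taken from the paper. It also compares the number of pairwise attention scores in one such pass with that of full self-attention over all pixel pairs.

```python
from itertools import product


def cross_neighbors(pos, height, width):
    """Positions sharing a row or a column with `pos` (including `pos` itself).

    Assumption: one cross pass lets a pixel attend to exactly these positions.
    """
    r, c = pos
    same_row = {(r, j) for j in range(width)}
    same_col = {(i, c) for i in range(height)}
    return same_row | same_col


def reachable_after(pos, height, width, steps):
    """Positions whose information can reach `pos` after `steps` cross passes."""
    frontier = {pos}
    for _ in range(steps):
        # Every position gathered so far has itself gathered from its row and column.
        frontier = set().union(
            *(cross_neighbors(p, height, width) for p in frontier)
        )
    return frontier


def pairwise_scores(height, width):
    """Attention scores computed by full self-attention vs. one cross pass."""
    n = height * width
    full = n * n                      # every pixel attends to every pixel
    cross = n * (height + width - 1)  # every pixel attends to its row and column
    return full, cross


H, W = 6, 8  # hypothetical feature-map size
all_positions = set(product(range(H), range(W)))
one_step = reachable_after((2, 3), H, W, steps=1)
two_steps = reachable_after((2, 3), H, W, steps=2)

print(len(one_step), two_steps == all_positions)  # 13 True
print(pairwise_scores(H, W))                      # (2304, 624)
```

Under this assumption, a single pass covers H + W − 1 positions per pixel, and a second pass covers the whole map, because any pixel (i, j) shares a row with a pixel in the query's column. The score count per pass grows as HW(H + W − 1) rather than (HW)², which is consistent with the lower time and space complexity stated in the abstract.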

Acknowledgements

This work was supported by the Scientific Research Program of Beijing Municipal Education Commission (KZ202110011017), the National Natural Science Foundation of China (No. 62277001), and the Beijing Natural Science Foundation (No. L233026).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, 100048, China
Xiaoxiao Dong, Haisheng Li, Xiaochuan Wang & Wei Wang
Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing, 100048, China
Xiaoxiao Dong, Haisheng Li, Xiaochuan Wang & Wei Wang
National Engineering Laboratory For Agri-product Quality Traceability, Beijing, 100048, China
Xiaoxiao Dong, Haisheng Li, Xiaochuan Wang & Wei Wang
School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Junping Du

Corresponding author

Correspondence to Haisheng Li.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dong, X., Li, H., Wang, X. et al. CANet: cross attention network for food image segmentation. Multimed Tools Appl 83, 60987–61006 (2024). https://doi.org/10.1007/s11042-023-17916-z

Received: 01 July 2022
Revised: 11 November 2023
Accepted: 17 December 2023
Published: 29 December 2023
Issue Date: June 2024
DOI: https://doi.org/10.1007/s11042-023-17916-z

Part of a collection:

Track 6: Computer Vision for Multimedia Applications
