CANet: cross attention network for food image segmentation

Multimedia Tools and Applications

Abstract

Food image segmentation, which aims to distinguish the various ingredients in a dish, is crucial for food safety, because estimating calories and other nutrients is important for human health and sustainable development. However, current image segmentation methods perform poorly on food image datasets: ingredients vary widely in appearance and are hard to tell apart from everyday props, and these methods extract features from food images inadequately. In addition, using attention mechanisms to obtain detailed contextual information and long-range dependencies leads to quadratic computational complexity. In this paper, we propose a Cross Spatial Attention (CSA) module that extracts richer spatial features from food images with lower time and space complexity. Specifically, the CSA module aggregates contextual information by cross-calculation along the horizontal and vertical dimensions. Experiments demonstrate that, after a two-step cross-calculation, each pixel can capture global long-range dependencies. Furthermore, our method integrates a Channel Attention (CA) module that selectively highlights interdependent channel information by integrating relevant features across all feature maps. The outputs of the two attention modules are then aggregated to enhance the feature representation of the image. The method achieves convincing performance improvements on FoodSeg103, UECFoodPix and ADE20K. Moreover, the proposed network achieves a better trade-off between accuracy and efficiency.
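The two-step claim in the abstract can be illustrated with a small counting sketch. This is not the authors' implementation: the abstract does not give the exact form of the CSA module, so the sketch assumes a criss-cross-style pattern in which each pixel attends only to the positions in its own row and column. The function names (`cross_neighbors`, `reachable_after`, `affinity_counts`) are hypothetical and exist only for this illustration.

```python
def cross_neighbors(h, w, i, j):
    """Positions sharing a row or a column with (i, j), including (i, j) itself.

    Assumption: this row-plus-column pattern stands in for the
    "cross-calculation of horizontal and vertical dimensions".
    """
    row = {(i, c) for c in range(w)}
    col = {(r, j) for r in range(h)}
    return row | col


def reachable_after(h, w, i, j, steps):
    """Positions whose features can influence pixel (i, j) after `steps` cross passes."""
    frontier = {(i, j)}
    for _ in range(steps):
        # Each pass lets every already-reached position pull in its own row and column.
        frontier = set().union(*(cross_neighbors(h, w, r, c) for r, c in frontier))
    return frontier


def affinity_counts(h, w):
    """Number of pairwise affinities for full self-attention vs. one cross pass."""
    n = h * w
    full = n * n              # every pixel attends to every pixel
    cross = n * (h + w - 1)   # every pixel attends to its row and column only
    return full, cross


if __name__ == "__main__":
    h, w = 4, 6
    print(len(reachable_after(h, w, 1, 2, steps=1)))  # 9 = h + w - 1
    print(len(reachable_after(h, w, 1, 2, steps=2)))  # 24 = h * w, i.e. the whole map
    print(affinity_counts(64, 64))                    # (16777216, 520192)
```

Under this assumption, one pass reaches only `h + w - 1` positions per pixel, while a second pass covers the whole feature map, because every column is reachable through some position in the pixel's row. The affinity counts show why such a pattern costs less than full pairwise attention over all pixels.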

Acknowledgements

This work was supported by the scientific research program of the Beijing Municipal Education Commission (KZ202110011017), the National Natural Science Foundation of China (No. 62277001), and the Beijing Natural Science Foundation (No. L233026).

Author information

Corresponding author

Correspondence to Haisheng Li.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dong, X., Li, H., Wang, X. et al. CANet: cross attention network for food image segmentation. Multimed Tools Appl 83, 60987–61006 (2024). https://doi.org/10.1007/s11042-023-17916-z

  • DOI: https://doi.org/10.1007/s11042-023-17916-z
