Benchmarking algorithms for food localization and semantic segmentation

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Food segmentation is challenging because food exhibits high intrinsic intra-class variability. In addition, food images taken in the wild may contain acquisition artifacts that can degrade the output of segmentation algorithms. A proper evaluation of segmentation algorithms is therefore of paramount importance for the design and improvement of food analysis systems that can work in less-than-ideal real-world scenarios. In this paper, we evaluate the performance of different deep learning-based segmentation algorithms in the context of food. Due to the lack of large-scale food segmentation datasets, we first create a new dataset composed of 5000 images of 50 diverse food categories. The images are accurately annotated at the pixel level. To test the algorithms under different conditions, the dataset is augmented with versions of the same images rendered under different acquisition distortions: illuminant change, JPEG compression, Gaussian noise, and Gaussian blur. The final dataset is composed of 120,000 images. Using standard benchmark measures, we conduct extensive experiments to evaluate ten state-of-the-art segmentation algorithms on two tasks: food localization and semantic food segmentation.
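As an illustration of the kind of acquisition distortions listed in the abstract, the following is a minimal sketch of three of them (illuminant change, Gaussian noise, Gaussian blur) using only the Python standard library. It is not the authors' pipeline: the function names, the diagonal per-channel gain model for the illuminant, the noise and blur parameters, and the nested-list image representation are all assumptions made for the example. JPEG compression is left out because it requires an image codec.

```python
import math
import random

# Assumed representation: an image is a list of rows, each pixel an (R, G, B)
# tuple of floats in [0, 255]. All names below are illustrative only.

def clip(value):
    """Clamp a channel value to the valid [0, 255] range."""
    return max(0.0, min(255.0, value))

def change_illuminant(img, gains):
    """Diagonal (von Kries-style) model: scale each channel by its own gain."""
    return [[tuple(clip(c * g) for c, g in zip(px, gains)) for px in row]
            for row in img]

def add_gaussian_noise(img, sigma, seed=0):
    """Add zero-mean Gaussian noise with standard deviation sigma per channel."""
    rng = random.Random(seed)
    return [[tuple(clip(c + rng.gauss(0.0, sigma)) for c in px) for px in row]
            for row in img]

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def _blur_rows(img, kernel):
    """Convolve every row with the kernel, replicating edge pixels."""
    radius = len(kernel) // 2
    width = len(img[0])
    out = []
    for row in img:
        new_row = []
        for x in range(width):
            acc = [0.0, 0.0, 0.0]
            for k, w in enumerate(kernel):
                xs = min(max(x + k - radius, 0), width - 1)
                for ch in range(3):
                    acc[ch] += w * row[xs][ch]
            new_row.append(tuple(acc))
        out.append(new_row)
    return out

def _transpose(img):
    """Swap rows and columns so the row filter can be reused vertically."""
    return [list(col) for col in zip(*img)]

def gaussian_blur(img, sigma):
    """Separable Gaussian blur: horizontal pass, then vertical pass."""
    kernel = gaussian_kernel(sigma)
    horizontal = _blur_rows(img, kernel)
    return _transpose(_blur_rows(_transpose(horizontal), kernel))

# Usage on a tiny flat gray image (3 rows, 4 columns).
image = [[(100.0, 100.0, 100.0)] * 4 for _ in range(3)]
warm = change_illuminant(image, gains=(1.2, 1.0, 0.8))
noisy = add_gaussian_noise(image, sigma=10.0, seed=1)
blurred = gaussian_blur(image, sigma=1.0)
```

Applying each distortion at several strengths to every original image is one way a 5000-image set can be expanded into a much larger benchmark; the specific strengths used for the published dataset are not given in the abstract and are not assumed here.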

Notes

  1. http://www.ivl.disco.unimib.it/activities/benchmarking-food-segmentation/.

Acknowledgements

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the K40, Titan Xp, and Titan X GPU cards used for this research. This work is published in the context of the project FooDesArt: Food Design Arte - L’Arte del Benessere, CUP (Codice Unico Progetto - Unique Project Code): E48I16000350009 - Call “Smart Fashion and Design”, cofunded by POR FESR 2014-2020 (Programma Operativo Regionale, Fondo Europeo di Sviluppo Regionale - Regional Operational Programme, European Regional Development Fund).

Author information

Corresponding author

Correspondence to Gianluigi Ciocca.

Cite this article

Aslan, S., Ciocca, G., Mazzini, D. et al. Benchmarking algorithms for food localization and semantic segmentation. Int. J. Mach. Learn. & Cyber. 11, 2827–2847 (2020). https://doi.org/10.1007/s13042-020-01153-z

  • DOI: https://doi.org/10.1007/s13042-020-01153-z
