Abstract
Carbohydrate counting for consumed foods is recommended by scientific societies as a way to improve the quality of life of diabetes patients. Monitoring food intake can be facilitated by a mobile application that automatically recognizes the foods in a meal. Automatic food-image recognition is a challenging computer vision task due to the visual similarity between foods. The challenge grows when the goal is to classify foods from a specific region using a dataset that contains only foods from that region and is therefore small compared to public datasets from other countries. For this task, this work presents a model that uses a set of Fully Convolutional Networks (FCNs) to generate segmentations of the foods in a meal; these segmentations are then processed by an algorithm based on digital image processing techniques to identify the foods. The model has a low training cost because it is scalable: it can be trained to recognize a new food without retraining the entire model. In tests with foods consumed in Brazil, the model achieved an accuracy of 98% and a recall of 88%.
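The scalability claim above rests on the ensemble design: each food gets its own segmentation model, so adding a food means training only one new network. The following is a minimal sketch of that idea, not the authors' implementation; the threshold "segmenters" are hypothetical stand-ins for the trained binary FCNs, and `min_area` is an assumed heuristic for deciding whether a food is present in the mask.

```python
import numpy as np

def make_threshold_segmenter(lo, hi):
    """Stand-in for a trained binary FCN: marks pixels in an intensity band."""
    def segment(image):
        return ((image >= lo) & (image < hi)).astype(np.uint8)
    return segment

class ScalableFoodRecognizer:
    """One segmentation model per food; adding a food never retrains the rest."""
    def __init__(self):
        self.segmenters = {}

    def add_food(self, name, segmenter):
        # Only this new per-food model is trained/plugged in.
        self.segmenters[name] = segmenter

    def identify(self, image, min_area=10):
        """Run every per-food segmenter; keep foods whose mask is large enough."""
        found = {}
        for name, seg in self.segmenters.items():
            mask = seg(image)
            if mask.sum() >= min_area:
                found[name] = mask
        return found

# Usage: a toy grayscale "meal" with two intensity regions.
meal = np.zeros((32, 32), dtype=np.uint8)
meal[:16, :] = 50    # region the "rice" segmenter responds to
meal[16:, :] = 200   # region the "beans" segmenter responds to

model = ScalableFoodRecognizer()
model.add_food("rice", make_threshold_segmenter(40, 60))
model.add_food("beans", make_threshold_segmenter(180, 220))
print(sorted(model.identify(meal)))  # ['beans', 'rice']
```

In the paper's setting each `segment` call would be an FCN forward pass, and the presence test would be replaced by the digital image processing step that refines and labels the masks.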
Data availability
The datasets generated by the research are available in a data repository and can be accessed at https://doi.org/10.17632/7n36jtcpv3.1.
Acknowledgements
This work was supported by CAPES, CNPq, and FAPEMIG.
Cite this article
Carvalho, M.A., Pimenta, T.C., Silvério, A.C.P. et al. Computer vision model for food identification in meals from the segmentation obtained by a set of fully convolutional networks. J Ambient Intell Human Comput 14, 16879–16890 (2023). https://doi.org/10.1007/s12652-023-04703-9