
MFNet: A Channel Segmentation-Based Hierarchical Network for Multi-food Recognition

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2023)

Abstract

Because foods differ significantly from region to region, publicly available multi-food image datasets remain relatively scarce, and the gap is especially evident for Chinese cuisine. To alleviate this issue, we create JNU FoodNet, a large-scale Chinese food image dataset containing 17,128 original images. Although considerable prior research has focused on single-food image recognition, such methods are not suited to recognizing multiple food items in one image. Moreover, a food image usually contains multiple food regions, and the key features of each region tend to disperse gradually from the center toward the edges, which leads to cumulative errors. To overcome this difficulty, we design SGC, a selective discriminative feature constrained module that restricts the model's attention to selected regions from a global-information perspective. Furthermore, we propose MFNet, a progressive hierarchical network based on channel segmentation that works from both the whole-image and the local-region perspective and is combined with the SGC branch. Experimental results show that MFNet achieves state-of-the-art mAP on JNU FoodNet, UEC Food-100, and UEC Food-256.
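The abstract does not spell out MFNet's layers, so the following only illustrates the general idea of channel segmentation; it is not the authors' design. It is a minimal sketch assuming a ShuffleNet V2-style scheme (a swapped-in technique): the channels of a feature map are split into equal groups, each group is processed by its own branch, and the results are concatenated and shuffled so that later stages see information from every group. Feature maps are modeled as plain Python lists (one flattened plane per channel) to keep the example dependency-free. The names `channel_split`, `channel_shuffle`, `split_transform_merge`, and the toy branches are hypothetical.

```python
# Hypothetical sketch: ShuffleNet V2-style channel split/shuffle, NOT MFNet's published layers.
from typing import Callable, List

Channel = List[float]        # one feature plane, flattened to H*W values
FeatureMap = List[Channel]   # C channels
Branch = Callable[[FeatureMap], FeatureMap]


def channel_split(x: FeatureMap, groups: int) -> List[FeatureMap]:
    """Split the C channels of x into `groups` contiguous, equally sized groups."""
    if len(x) % groups != 0:
        raise ValueError(f"{len(x)} channels cannot be split into {groups} groups")
    size = len(x) // groups
    return [x[g * size:(g + 1) * size] for g in range(groups)]


def channel_shuffle(x: FeatureMap, groups: int) -> FeatureMap:
    """Interleave channels across groups: view as (groups, C/groups), transpose, flatten."""
    if len(x) % groups != 0:
        raise ValueError(f"{len(x)} channels cannot be shuffled into {groups} groups")
    per_group = len(x) // groups
    return [x[g * per_group + i] for i in range(per_group) for g in range(groups)]


def split_transform_merge(x: FeatureMap, branches: List[Branch]) -> FeatureMap:
    """Run one branch per channel group, concatenate the outputs, then shuffle
    so the next stage receives channels mixed from every group."""
    groups = channel_split(x, len(branches))
    merged: FeatureMap = []
    for branch, group in zip(branches, groups):
        out = branch(group)
        if len(out) != len(group):
            raise ValueError("each branch must preserve its group's channel count")
        merged.extend(out)
    return channel_shuffle(merged, len(branches))


if __name__ == "__main__":
    # Toy input: 4 channels, each a 2x2 plane filled with its channel index.
    x = [[float(c)] * 4 for c in range(4)]
    identity: Branch = lambda g: g                                  # stand-in branch
    double: Branch = lambda g: [[2.0 * v for v in ch] for ch in g]  # stand-in branch
    y = split_transform_merge(x, [identity, double])
    print([ch[0] for ch in y])  # [0.0, 4.0, 1.0, 6.0]
```

In a real network each branch would be a convolutional sub-block operating on tensors (for example, built with `torch.chunk` and `torch.cat` in PyTorch), and a hierarchical model would stack several such stages. Whether MFNet shuffles channels after merging is not stated in the abstract.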



Author information

Correspondence to Jing Chen.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Jin, K., Chen, J., Song, T. (2024). MFNet: A Channel Segmentation-Based Hierarchical Network for Multi-food Recognition. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14433. Springer, Singapore. https://doi.org/10.1007/978-981-99-8546-3_2


  • DOI: https://doi.org/10.1007/978-981-99-8546-3_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8545-6

  • Online ISBN: 978-981-99-8546-3

  • eBook Packages: Computer Science, Computer Science (R0)
