Abstract
Because foods differ significantly across regions, publicly available multi-food image datasets are relatively scarce, and the shortage is particularly evident for Chinese cuisine. To alleviate this issue, we create a large-scale Chinese food image dataset, JNU FoodNet, which contains 17,128 original images. Although a considerable amount of prior research has focused on single-food image recognition, such methods are not suitable for recognizing multiple food items in one image. Moreover, a food image usually contains multiple food regions, and the key features of each region tend to disperse gradually from the center to the edges, leading to cumulative errors. To overcome this difficulty, we design a selective discriminative feature constrained module, SGC, which restricts the model's attention to these regions from a global information perspective. Furthermore, we propose a progressive hierarchical network, MFNet, based on channel segmentation from both whole-image and local-region perspectives, combined with the SGC branch. Experimental results show that MFNet achieves state-of-the-art mAP values on JNU FoodNet, UEC Food-100, and UEC Food-256.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jin, K., Chen, J., Song, T. (2024). MFNet: A Channel Segmentation-Based Hierarchical Network for Multi-food Recognition. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14433. Springer, Singapore. https://doi.org/10.1007/978-981-99-8546-3_2
DOI: https://doi.org/10.1007/978-981-99-8546-3_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8545-6
Online ISBN: 978-981-99-8546-3
eBook Packages: Computer Science (R0)