MFNet: A Channel Segmentation-Based Hierarchical Network for Multi-food Recognition

Jin, Kelei; Chen, Jing; Song, Tingting

doi:10.1007/978-981-99-8546-3_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14433)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Due to significant differences between foods from different regions, there is a relative lack of publicly available multi-food image datasets, which is particularly evident in the field of Chinese cuisine. To alleviate this issue, we create a large-scale Chinese food image dataset, JNU FoodNet, which contains 17,128 original images. Although a considerable amount of prior research has focused on single food image recognition, such methods are not suitable for recognizing multiple food items in one image. Moreover, in a food image, there are usually multiple food regions, and the key features of each region tend to gradually disperse from the center to the edges, leading to cumulative errors. To overcome this difficulty, we design a selective discriminative feature constrained module, SGC, which restricts model attention to regions from a global information perspective. Furthermore, we propose a progressive hierarchical network, MFNet, based on channel segmentation from both the whole image and local region perspectives, combined with the SGC branch. Experimental results show that MFNet achieves state-of-the-art mAP values on JNU FoodNet, UEC Food-100, and UEC Food-256.

Author information

Authors and Affiliations

Jiangnan University, Wuxi, 214122, China
Kelei Jin, Jing Chen & Tingting Song

Corresponding author

Correspondence to Jing Chen.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Jin, K., Chen, J., Song, T. (2024). MFNet: A Channel Segmentation-Based Hierarchical Network for Multi-food Recognition. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14433. Springer, Singapore. https://doi.org/10.1007/978-981-99-8546-3_2

DOI: https://doi.org/10.1007/978-981-99-8546-3_2
Published: 26 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8545-6
Online ISBN: 978-981-99-8546-3
eBook Packages: Computer Science, Computer Science (R0)
