Multi-Task Learning for Food Identification and Analysis with Deep Convolutional Neural Networks

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

In this paper, we propose a multi-task system that identifies dish types, food ingredients, and cooking methods from food images using deep convolutional neural networks. We built a dataset covering 360 food classes, with at least 500 images per class. Because the data were collected from the Internet, we reduce noise by detecting and removing outlier images with a one-class SVM trained on deep convolutional features. We simultaneously train a dish identifier, a cooking method recognizer, and a multi-label ingredient detector, which share a few low-level layers of the deep network architecture. The proposed framework achieves higher accuracy than traditional methods based on handcrafted features, and the cooking method recognizer and ingredient detector can be applied to dishes that are not included in the training dataset, providing reference information for users.
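As a rough illustration of how three such tasks can be trained jointly, the sketch below combines one loss term per task into a single objective. The abstract does not state the loss functions or their weighting, so everything here is an assumption: softmax cross-entropy for the two single-label tasks, sigmoid binary cross-entropy for the multi-label ingredient task, equal task weights, and the hypothetical function names. Only the figure of 360 dish classes comes from the text; the cooking-method and ingredient counts in the usage lines are made up.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy for a single-label task, computed stably from raw logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def sigmoid_binary_cross_entropy(logits, targets):
    """Mean binary cross-entropy over independent labels (multi-label task)."""
    total = 0.0
    for z, t in zip(logits, targets):
        # softplus(z) - t*z equals -[t*log(s) + (1-t)*log(1-s)], s = sigmoid(z)
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += softplus - t * z
    return total / len(logits)


def multi_task_loss(dish_logits, dish_label,
                    method_logits, method_label,
                    ingredient_logits, ingredient_labels,
                    weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three task losses (equal weights are an assumption)."""
    w_dish, w_method, w_ingredient = weights
    return (w_dish * softmax_cross_entropy(dish_logits, dish_label)
            + w_method * softmax_cross_entropy(method_logits, method_label)
            + w_ingredient * sigmoid_binary_cross_entropy(ingredient_logits,
                                                          ingredient_labels))


# Usage with untrained (all-zero) outputs for one image.
NUM_DISHES = 360       # from the abstract
NUM_METHODS = 8        # hypothetical
NUM_INGREDIENTS = 50   # hypothetical
loss = multi_task_loss([0.0] * NUM_DISHES, 3,
                       [0.0] * NUM_METHODS, 1,
                       [0.0] * NUM_INGREDIENTS, [1, 0] * (NUM_INGREDIENTS // 2))
print(round(loss, 4))
```

In a network with shared low-level layers, the gradient of this summed loss flows from all three heads into the shared layers, which is what lets the tasks inform one another; the ingredient term uses independent sigmoids because a dish can contain several ingredients at once.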

Author information

Correspondence to Song-Hai Zhang.

Additional information

Special Section of CVM 2016

This work was supported by the National High Technology Research and Development 863 Program of China under Grant No. 2013AA013903, the National Natural Science Foundation of China under Grant No. 61373069, the Research Grant of Beijing Higher Institution Engineering Research Center, and the Tsinghua University Initiative Scientific Research Program.

About this article

Cite this article

Zhang, XJ., Lu, YF. & Zhang, SH. Multi-Task Learning for Food Identification and Analysis with Deep Convolutional Neural Networks. J. Comput. Sci. Technol. 31, 489–500 (2016). https://doi.org/10.1007/s11390-016-1642-6

  • DOI: https://doi.org/10.1007/s11390-016-1642-6