An Enhanced Intelligent Agent with Image Description Generation

  • Conference paper
Intelligent Virtual Agents (IVA 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10011)

Abstract

In this paper, we present an Embodied Conversational Agent (ECA) enriched with automatic image understanding, which uses vision data derived from state-of-the-art machine learning techniques to advance autonomous interaction with elderly or infirm users. The agent is developed to monitor the health and emotional well-being of the elderly. It can not only answer questions through speech-based interaction, but also analyse the user’s surroundings, company, emotional states, hazards and fall actions from visual data using deep learning techniques. The agent is accessible from a web browser and is operated by voice; a webcam is required for the visual analysis functionality. The system has been evaluated with diverse real-life images to demonstrate its efficiency.

Acknowledgments

We appreciate the funding support received from the Higher Education Innovation Fund and RPPTV Ltd.

Author information

Corresponding author

Correspondence to Li Zhang.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Fielding, B., Kinghorn, P., Mistry, K., Zhang, L. (2016). An Enhanced Intelligent Agent with Image Description Generation. In: Traum, D., Swartout, W., Khooshabeh, P., Kopp, S., Scherer, S., Leuski, A. (eds) Intelligent Virtual Agents. IVA 2016. Lecture Notes in Computer Science, vol 10011. Springer, Cham. https://doi.org/10.1007/978-3-319-47665-0_10

  • DOI: https://doi.org/10.1007/978-3-319-47665-0_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47664-3

  • Online ISBN: 978-3-319-47665-0

  • eBook Packages: Computer Science, Computer Science (R0)
