Artificial intelligence snapchat: Visual conversation agent

Applied Intelligence

Abstract

Visual conversation is a dialog in which the parties exchange visual information. The key novelty of this paper is an artificial intelligence-driven method for automating visual conversation. We present a state-of-the-art Artificial Intelligence Snapchat Visual Conversation Agent (AISVCA), which uses the proposed method to caption a received image and generate an appropriate visual response. These functions are achieved by combining a Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) neural network, and Latent Semantic Indexing (LSI). The CNN and LSTM generate image captions, and LSI assesses the semantic similarity between the captions generated from a personalized image dataset and the caption extracted from the received image. We show that AISVCA, using the proposed method, can generate a visual response that is practically indistinguishable from a human visual response. To evaluate the approach, we measured the accuracy of the system and conducted a user study of communication quality, in which we analyzed the source credibility and interpersonal attraction of AISVCA. The user study found no significant differences in communication quality between a visual conversation with AISVCA and a visual conversation with a human agent.
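The LSI matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the captions are hard-coded strings standing in for CNN and LSTM output, the image file names, the stop-word list, the rank of 2, and the function names (`build_lsi`, `fold_in`, `rank_responses`) are all assumptions made for illustration. The sketch builds a term-by-caption matrix, truncates its singular value decomposition, folds the received caption into the reduced space, and ranks the personalized images by cosine similarity.

```python
import numpy as np

# Small illustrative stop-word list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "in", "on", "of"}


def tokenize(caption):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in caption.lower().split() if w not in STOPWORDS]


def build_lsi(captions, rank=2):
    """Build a rank-limited LSI space from a list of captions."""
    vocab = sorted({w for c in captions for w in tokenize(c)})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-by-caption count matrix: rows are words, columns are captions.
    counts = np.zeros((len(vocab), len(captions)))
    for j, caption in enumerate(captions):
        for w in tokenize(caption):
            counts[index[w], j] += 1.0
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    # Keep at most `rank` dimensions, skipping numerically zero ones.
    k = min(rank, int(np.sum(s > 1e-10)))
    # Rows of vt[:k].T are the captions expressed in the latent space.
    return index, u[:, :k], s[:k], vt[:k].T


def fold_in(caption, index, u_k, s_k):
    """Project a new caption into the latent space (q^T U_k S_k^-1)."""
    q = np.zeros(u_k.shape[0])
    for w in tokenize(caption):
        if w in index:  # out-of-vocabulary words are ignored
            q[index[w]] += 1.0
    return (q @ u_k) / s_k


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def rank_responses(received_caption, captions, images, rank=2):
    """Return (image, similarity) pairs, most similar first."""
    index, u_k, s_k, docs = build_lsi(captions, rank)
    query = fold_in(received_caption, index, u_k, s_k)
    scored = [(img, cosine(query, doc)) for img, doc in zip(images, docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical personalized dataset: captions paired with image files.
captions = [
    "a dog playing in the park",
    "a dog running in the park",
    "a plate of pasta on the table",
    "a pizza on a plate",
]
images = ["dog_park_1.jpg", "dog_park_2.jpg", "pasta.jpg", "pizza.jpg"]

for image, score in rank_responses("a small dog in a park", captions, images):
    print(f"{image}: {score:.2f}")
```

With this toy data the two park images should rank above the food images. A rank-2 space collapses near-duplicate captions onto almost the same point, so the two park images are expected to receive roughly equal scores; a real system would need a larger caption set and a tuned rank to separate them.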

Author information

Corresponding author

Correspondence to Sasa Arsovski.

About this article

Cite this article

Arsovski, S., Cheok, A.D., Govindarajoo, K. et al. Artificial intelligence snapchat: Visual conversation agent. Appl Intell 50, 2040–2049 (2020). https://doi.org/10.1007/s10489-019-01621-2

  • DOI: https://doi.org/10.1007/s10489-019-01621-2
