Multiple instance learning based deep CNN for image memorability prediction

Multimedia Tools and Applications

Abstract

Image memorability is a recent topic in computer vision that measures the degree to which images are memorable to the human cognitive system. Initial research on image memorability has shown that memorability is an inherent characteristic of an image and that humans are consistent in remembering images. It has also been demonstrated that the memorability of an image can be determined using machine learning and computer vision techniques. In this paper, a novel deep learning based image memorability prediction model is proposed. The proposed model automatically learns and utilises multiple visual factors, such as object semantics, visual emotions, and saliency, to predict image memorability scores. In particular, the proposed model employs a multiple instance learning framework to utilise emotion cues evoked by a single global context and multiple local contexts of an image. An extensive set of experiments is carried out on the large-scale image memorability dataset LaMem. The experimental results show that the proposed model performs better than current state-of-the-art models, reaching a rank correlation of 0.67, which is close to human consistency (ρ = 0.68).
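The rank correlation reported in the abstract can be illustrated with a short sketch. It assumes the metric is Spearman's ρ between predicted and ground-truth memorability scores, which the abstract's notation suggests but does not state outright. The function names and the toy scores below are hypothetical and are not taken from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(predicted, ground_truth):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(ground_truth) or len(predicted) < 2:
        raise ValueError("need two equally sized lists with at least 2 items")
    rx = average_ranks(predicted)
    ry = average_ranks(ground_truth)
    n = len(rx)
    mean_x = sum(rx) / n
    mean_y = sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


# Toy example (made-up scores): one pair of images is ranked in the wrong order.
toy_predicted = [0.1, 0.4, 0.3, 0.9]
toy_ground_truth = [0.2, 0.5, 0.6, 0.8]
toy_rho = spearman_rho(toy_predicted, toy_ground_truth)
```

Because ρ depends only on the ordering of the scores, a model can reach a high value without matching the absolute memorability values, which is why it can be compared directly with the consistency between groups of human observers.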

Author information

Corresponding author

Correspondence to Sathisha Basavaraju.

About this article

Cite this article

Basavaraju, S., Sur, A. Multiple instance learning based deep CNN for image memorability prediction. Multimed Tools Appl 78, 35511–35535 (2019). https://doi.org/10.1007/s11042-019-08202-y

  • DOI: https://doi.org/10.1007/s11042-019-08202-y