Multiple instance learning based deep CNN for image memorability prediction

Basavaraju, Sathisha; Sur, Arijit

doi:10.1007/s11042-019-08202-y

Published: 12 October 2019

Volume 78, pages 35511–35535, (2019)

Multimedia Tools and Applications

Abstract

Image memorability is a recent topic in computer vision that measures the degree to which images are memorable to the human cognitive system. Initial research on image memorability has shown that memorability is an inherent characteristic of an image and that humans are consistent in the images they remember. Further, it has been demonstrated that the memorability of an image can be determined using machine learning and computer vision techniques. In this paper, a novel deep learning based image memorability prediction model is proposed. The proposed model automatically learns and utilises multiple visual factors, such as object semantics, visual emotions, and saliency, to predict image memorability scores. In particular, the proposed model employs a multiple instance learning framework to utilise emotion cues evoked by a single global context and multiple local contexts of an image. An extensive set of experiments is carried out on the large-scale image memorability dataset LaMem. The experimental results show that the proposed model performs better than current state-of-the-art models, reaching a rank correlation of 0.67, which is close to human consistency (ρ = 0.68).
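
The abstract reports its results as a rank correlation. As a minimal sketch, assuming this means Spearman's rank correlation between predicted and ground-truth memorability scores (the abstract does not name the coefficient), the metric could be computed as below. The function names and the sample scores are illustrative and are not taken from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(predicted, observed):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx = average_ranks(predicted)
    ry = average_ranks(observed)
    n = len(rx)
    mean_x = sum(rx) / n
    mean_y = sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / (sd_x * sd_y)


# Hypothetical scores for four images (not data from the paper).
predicted_scores = [0.1, 0.4, 0.3, 0.9]
observed_scores = [0.2, 0.5, 0.6, 0.8]
rho = spearman_rho(predicted_scores, observed_scores)
```

Because the coefficient depends only on the ordering of the scores, a model can reach a high value without matching the absolute memorability values, as long as it ranks images in the same order as human observers.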

Author information

Authors and Affiliations

Department of CSE, Indian Institute of Technology Guwahati, Guwahati, India
Sathisha Basavaraju & Arijit Sur

Corresponding author

Correspondence to Sathisha Basavaraju.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Basavaraju, S., Sur, A. Multiple instance learning based deep CNN for image memorability prediction. Multimed Tools Appl 78, 35511–35535 (2019). https://doi.org/10.1007/s11042-019-08202-y

Received: 12 December 2018
Revised: 08 July 2019
Accepted: 08 September 2019
Published: 12 October 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s11042-019-08202-y
