Abstract
Image memorability is a recent topic in computer vision that measures the degree to which images are memorable to the human cognitive system. Initial research on image memorability showed that memorability is an inherent characteristic of an image and that humans are consistent in which images they remember. It has also been demonstrated that the memorability of an image can be determined using machine learning and computer vision techniques. In this paper, a novel deep learning based image memorability prediction model is proposed. The proposed model automatically learns and utilises multiple visual factors, such as object semantics, visual emotions, and saliency, to predict image memorability scores. In particular, the proposed model employs a multiple instance learning framework to utilise emotion cues evoked by a single global context and multiple local contexts of an image. An extensive set of experiments is carried out on the large-scale image memorability dataset LaMem. The experimental results show that the proposed model performs better than current state-of-the-art models, reaching a rank correlation of 0.67, which is close to human consistency (ρ = 0.68).
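The rank correlation quoted above compares the ordering of predicted memorability scores with the ordering of human-derived scores. The following is a minimal sketch of that computation, assuming the reported ρ is Spearman's rank correlation; the function names `average_ranks` and `spearman_rho` and the sample values are illustrative and are not taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(predicted, observed):
    """Spearman's rho: Pearson correlation computed on the two rank vectors."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(predicted), average_ranks(observed)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for five images (illustrative values only).
model_scores = [0.91, 0.42, 0.77, 0.58, 0.66]
human_scores = [0.88, 0.51, 0.70, 0.73, 0.60]
rho = spearman_rho(model_scores, human_scores)
```

Because only the ordering of scores matters, a model can reach a high ρ even if its raw scores are offset or rescaled relative to the human scores.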
Cite this article
Basavaraju, S., Sur, A. Multiple instance learning based deep CNN for image memorability prediction. Multimed Tools Appl 78, 35511–35535 (2019). https://doi.org/10.1007/s11042-019-08202-y
DOI: https://doi.org/10.1007/s11042-019-08202-y