Abstract
Driven by Internet technology, the volume of video content is growing exponentially, which increases the need for video analysis and interpretation grounded in human cognition. Memorability is a cognitive measure of how well watched visual content is recalled. Prior research has concentrated on image memorability, while video memorability modeling has received far less attention. Moreover, building a strong video memorability predictor requires robust computational features, an aspect that has also received little attention. This paper proposes a robust, scalable and novel Stacked Bin-Convolutional Neural Network (SB-CNN) based Sparse Low-Rank Regressor (SLRR) model that classifies and predicts the interesting events of videos by performing a robust feature scaling process and intelligent memorability tasks. Inspired by the ability of Low-Rank Representation (LRR) to handle noisy data, the proposed feature extraction process addresses the low-rank issue for attributes such as objects, scenes, and annotations in the original feature space. The estimated sparse coefficient vector is fed into the SB-CNN model, which establishes connections between video frames and their memorability scores by selecting the best frames according to the objectives of the Multi-Attribute Decision Making (MADM) technique. The selected frames are then fed into the CNN classifier. Using the estimated decision score, the proposed classifier predicts the events from the relevant video with an improved recall time of 49.9247. Experiments conducted on two public datasets, SumMe and the SUN database, demonstrate the efficacy of the proposed technique.
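The frame selection step can be illustrated with a minimal sketch. The abstract does not state which MADM method is used, so the example below assumes simple additive weighting: each attribute is min-max normalised, combined into a weighted decision score per frame, and the top-k frames are kept. The function name `select_frames`, the three attribute columns, and the weights are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def select_frames(scores: Sequence[Sequence[float]],
                  weights: Sequence[float],
                  k: int) -> List[int]:
    """Rank frames by simple additive weighting and return the top-k indices.

    scores[i][j] is the (hypothetical) value of benefit attribute j for
    frame i; larger values are assumed to be better for every attribute.
    """
    if not scores:
        return []
    n_attrs = len(weights)
    # Collect each attribute column so it can be min-max normalised to [0, 1].
    cols = [[row[j] for row in scores] for j in range(n_attrs)]
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    total_w = sum(weights)

    def decision_score(row: Sequence[float]) -> float:
        s = 0.0
        for j in range(n_attrs):
            # A constant attribute carries no ranking information.
            norm = (row[j] - lows[j]) / spans[j] if spans[j] > 0 else 0.0
            s += weights[j] / total_w * norm
        return s

    ranked = sorted(range(len(scores)),
                    key=lambda i: decision_score(scores[i]),
                    reverse=True)
    # Return the selected indices in temporal (ascending) order.
    return sorted(ranked[:k])


# Illustrative attribute values per frame: object, scene, memorability estimate.
frames = [
    [0.2, 0.9, 0.1],
    [0.8, 0.7, 0.9],
    [0.5, 0.1, 0.4],
    [0.9, 0.8, 0.6],
]
selected = select_frames(frames, weights=[0.5, 0.2, 0.3], k=2)
print(selected)
```

In this sketch the selected indices would then identify the frames passed on to a classifier. Other MADM methods, such as TOPSIS, could replace the weighted sum without changing the surrounding interface.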
Data availability
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Ali, H., Gilani, S.O., Khan, M.J. et al. Stacked Bin Convolutional Neural Networks based Sparse Low-Rank Regressor: Robust, Scalable and Novel Model for Memorability Prediction of Videos. Multimed Tools Appl 82, 40799–40817 (2023). https://doi.org/10.1007/s11042-023-15128-z
DOI: https://doi.org/10.1007/s11042-023-15128-z