
Stacked Bin Convolutional Neural Networks based Sparse Low-Rank Regressor: Robust, Scalable and Novel Model for Memorability Prediction of Videos

  • Published:
Multimedia Tools and Applications

Abstract

Driven by Internet technology, the volume of video content is growing exponentially, which makes video analysis and interpretation informed by human cognition increasingly important. Memorability is a cognitive measure of how well watched visual content can be recalled. Prior research has concentrated on image memorability, while video memorability modeling has received far less attention; likewise, the need for robust computational features to build a strong video memorability predictor has not been addressed. This paper proposes a robust, scalable and novel Stacked Bin Convolutional Neural Network (SB-CNN) based Sparse Low-Rank Regressor (SLRR) model that classifies and predicts the interesting events of videos by performing a robust feature scaling process and intelligent memorability tasks. Inspired by the noise-handling property of Low-Rank Representation (LRR), the proposed feature extraction process addresses the low-rank structure of attributes such as objects, scenes, and annotations in the original feature space. The estimated sparse coefficient vector is fed into the SB-CNN model, which establishes the connections between video frames and their memorability scores by electing the best frames according to the objectives of a Multi-Attribute Decision Making (MADM) technique. The selected frames are then passed to the CNN classifier. Using the estimated decision score, the proposed classifier effectively predicts the events in the relevant video with a better recall time of 49.9247. Experiments conducted on the public SumMe and SUN datasets demonstrate the efficacy of the proposed technique.
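To make the pipeline sketched above concrete, the snippet below illustrates two of its generic ingredients: a low-rank-plus-sparse split of a frame-feature matrix (in the spirit of LRR-based noise handling) and a simple weighted-sum multi-attribute score for electing the best frames. This is a minimal sketch, not the authors' SB-CNN/SLRR implementation; the function names, rank, thresholds and criterion weights are illustrative assumptions, and the features are synthetic.

```python
# Illustrative sketch (not the paper's implementation): split a frame-feature
# matrix into a low-rank part plus a sparse residual, then rank frames with a
# simple weighted-sum multi-attribute score. All names and weights are assumed.
import numpy as np

def low_rank_sparse_split(X, rank=5, n_iter=20, sparse_thresh=0.1):
    """Alternate a truncated-SVD low-rank fit with soft-thresholding of the residual.

    X: (n_frames, n_features) feature matrix (e.g., object/scene descriptors).
    Returns (L, S) with X ~ L + S, L low rank and S sparse.
    """
    S = np.zeros_like(X)
    L = np.zeros_like(X)
    for _ in range(n_iter):
        # Low-rank update: best rank-`rank` approximation of X - S.
        U, sig, Vt = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :rank] * sig[:rank]) @ Vt[:rank]
        # Sparse update: soft-threshold the remaining residual.
        R = X - L
        S = np.sign(R) * np.maximum(np.abs(R) - sparse_thresh, 0.0)
    return L, S

def madm_frame_scores(attributes, weights):
    """Weighted-sum multi-attribute score per frame.

    attributes: (n_frames, n_criteria) matrix; higher values are better.
    """
    # Min-max normalise each criterion so the weights are comparable.
    a_min, a_max = attributes.min(axis=0), attributes.max(axis=0)
    norm = (attributes - a_min) / np.maximum(a_max - a_min, 1e-9)
    return norm @ weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 64))                 # 120 frames x 64-dim features
    L, S = low_rank_sparse_split(X, rank=8)
    # Per-frame criteria: energy of the low-rank part and magnitude of the sparse part.
    criteria = np.stack([np.linalg.norm(L, axis=1),
                         np.abs(S).sum(axis=1)], axis=1)
    scores = madm_frame_scores(criteria, weights=np.array([0.7, 0.3]))
    best = np.argsort(scores)[::-1][:16]           # keep the 16 highest-scoring frames
    print("selected frames:", best)
```

In the paper's setting the low-rank step would operate on learned object, scene and annotation descriptors and the selected frames would be passed on to the CNN classifier; the synthetic features here only demonstrate the data flow.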


Data availability

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

References

  1. Borkin MA, Vo AA, Bylinskii Z, Isola P, Sunkavalli S, Oliva A, Pfister H (2013) What makes a visualization memorable? IEEE Trans Visual Comput Grap 19(12):2306–2315


  2. Borkin MA, Vo AA et al (2013) What makes a visualization memorable? IEEE Trans Visual Comput Grap 19(12):2306–2315


  3. Borkin MA, Bylinskii Z et al (2016) Beyond memorability: visualization recognition and recall. IEEE Trans Visual Comput Grap 22(1):519–528


  4. Bylinskii Z, Borkin M et al (2015) Eye fixation metrics for large scale evaluation and comparison of information visualizations. In: Eye Tracking and Visualization: Foundations, Techniques, and Applications. Springer International Publishing, pp. 235–255

  5. Bylinskii Z, Isola P, Bainbridge C, Torralba A, Oliva A (2015) Intrinsic and extrinsic effects on image memorability. Vision Research 116:165

  6. Cao D, He X, Miao L, An Y, Yang C, Hong R (2018) Attentive group recommendation. In: Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval

  7. Chen Y, Jalali A, Sanghavi S, Caramanis C (2011) Low-rank matrix recovery from errors and erasures. IEEE Trans Inform Theory 59(7):4324–4337


  8. Cohendet R, Yadati K, Duong NQK, Demarty C-H (2018) Annotating, understanding, and predicting long-term video memorability. In: Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR), pp. 11–14

  9. Cohendet R et al (2019) VideoMem: constructing, analyzing, predicting short-term and long-term video memorability. In: IEEE/CVF International Conference on Computer Vision (ICCV)

  10. Fajtl J, Argyriou V, Monekosso D, Remagnino P (2018) AMNet: memorability estimation with attention. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6363–6372

  11. Gygli M, Grabner H, Riemenschneider H, Van Gool L (2014) Creating summaries from user videos. In: European Conference on Computer Vision (ECCV), Springer, pp. 505–520

  12. Gygli M, Grabner H, Van Gool L. (2015) Video summarization by learning submodular mixtures of objectives. In: IEEE Conference on Computer Vision and Pattern Recognition

  13. Han J, Chen C, Shao L, Hu X, Han J, Liu T (2015) Learning computational models of video memorability from fMRI brain imaging. IEEE Trans Cybern 45(8):1692–1703


  14. Isola P, Xiao J, Parikh D, Torralba A (2013) What makes a photograph memorable? IEEE Trans Pattern Anal Mach Intell 36(7):1469–1482


  15. Jing F, Lin L, Zhou S, Ma R (2021) Assessing the impact of street-view greenery on fear of neighborhood crime in Guangzhou, China. Int J Environ Res Public Health 18(1):311


  16. Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725–1732

  17. Khosla A, Bainbridge WA, Torralba A, Oliva A (2013) Modifying the memorability of face photographs. In: Proceedings of IEEE International Conference on Computer Vision. pp. 3200–3207

  18. Kim J, Yoon S, Pavlovic V (2013) Relative spatial features for image memorability. In: Proceedings of ACM International Conference on Multimedia. pp. 761–764

  19. Kurzhals K, Raschke M et al (2014) State-of-the-art of visualization for eye tracking data. In: Proceedings of EuroVis

  20. Lee YJ, Grauman K (2015) Predicting important objects for egocentric video summarization. International Journal of Computer Vision 40:993–1005

  21. Lu C, Feng J, Lin Z, Yan S (2013) Correlation adaptive subspace segmentation by trace lasso. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1345–1352

  22. Mancas M, Meur OL (2013) Memorability of natural scenes: the role of attention. In: 20th IEEE International Conference on Image Processing (ICIP), pp. 196–200

  23. Muhammad K, Hussain T, Baik SW (2018) Efficient CNN based summarization of surveillance videos for resource-constrained devices. Pattern Recognition Letters 130:370–375

  24. Shekhar S, Singal D, Singh H, Shetty A, Kedia M (2017) Show and recall: learning what makes videos memorable. In: ICCV 2017 Workshop on Mutual Benefits of Cognitive and Computer Vision (MBCC)

  25. SumMe dataset (n.d.): https://gyglim.github.io/me/vsum/index.html

  26. SUN dataset (n.d.): https://groups.csail.mit.edu/vision/SUN/hierarchy.html

  27. Zhang N, Yang J (2013) Low-rank representation based discriminative projection for robust feature extraction. Neurocomputing 111(6):13–20


  28. Zhang K, Chao WL, Sha F, Grauman K (2016) Summary transfer: exemplar-based subset selection for video summarization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1059–1067

  29. Zhou B, Lapedriza A, Xiao J, Torralba A, Oliva A (2014) Learning deep features for scene recognition using places database. In: Advances in Neural Information Processing Systems (NIPS). pp. 487–495


Author information


Corresponding author

Correspondence to Hasnain Ali.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ali, H., Gilani, S.O., Khan, M.J. et al. Stacked Bin Convolutional Neural Networks based Sparse Low-Rank Regressor: Robust, Scalable and Novel Model for Memorability Prediction of Videos. Multimed Tools Appl 82, 40799–40817 (2023). https://doi.org/10.1007/s11042-023-15128-z



  • DOI: https://doi.org/10.1007/s11042-023-15128-z
