Stacked Bin Convolutional Neural Networks based Sparse Low-Rank Regressor: Robust, Scalable and Novel Model for Memorability Prediction of Videos

Ali, Hasnain; Gilani, Syed Omer; Khan, Muhammad Jawad; Jamil, Mohsin; Khattak, Muazzam Khan

doi:10.1007/s11042-023-15128-z

Published: 01 April 2023

Volume 82, pages 40799–40817, (2023)
Multimedia Tools and Applications

Hasnain Ali¹ (ORCID: 0000-0002-4530-5356), Syed Omer Gilani², Muhammad Jawad Khan¹, Mohsin Jamil³ & Muazzam Khan Khattak⁴

Abstract

Driven by Internet technology, the volume of video content is growing exponentially, which creates a need for video analysis and interpretation that takes human cognition into account. Memorability is a cognitive measure of how well watched visual content can be recalled. Prior research has concentrated on image memorability, while video memorability modeling has received little attention. The robustness of the computational features needed to build a strong video memorability predictor has also been largely overlooked. This paper proposes a robust, scalable and novel Stacked Bin-Convolutional Neural Network (SB-CNN) based Sparse Low-Rank Regressor (SLRR) model that classifies and predicts the interesting events of videos by performing a robust feature scaling process and intelligent memorability tasks. Inspired by the way Low-Rank Representation (LRR) handles noisy data, the proposed feature extraction process addresses the low-rank issue for attributes such as objects, scenes, and annotations in the original feature space. The estimated sparse coefficient vector is fed into the SB-CNN model, which establishes the connection between video frames and their memorability scores by selecting the best frames according to the objectives of the Multi-Attribute Decision Making (MADM) technique. The selected frames are then fed into the CNN classifier. Using the estimated decision score, the proposed classifier effectively predicts events from the relevant video with a better recall time of 49.9247. Experiments conducted on the public SumMe and SUN datasets demonstrate the efficacy of the proposed technique.
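
The frame-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which MADM method the authors use, so the code below assumes Simple Additive Weighting (SAW), one of the most basic MADM techniques. The function name `select_frames_saw`, the three per-frame attributes, and the weights are all hypothetical and are not taken from the paper.

```python
import numpy as np

def select_frames_saw(scores, weights, k):
    """Rank frames with Simple Additive Weighting (an assumed MADM method).

    scores  : (n_frames, n_attributes) array; higher is assumed to be better.
    weights : (n_attributes,) non-negative attribute weights.
    k       : number of frames to keep.
    Returns (indices of the k best frames, best first; decision score per frame).
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # make the weights sum to 1

    # Min-max normalise each attribute column to [0, 1].
    col_min = scores.min(axis=0)
    col_range = scores.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns normalise to 0
    normalised = (scores - col_min) / col_range

    # Decision score of a frame = weighted sum of its normalised attributes.
    decision = normalised @ weights

    # Stable sort on the negated score keeps earlier frames first on ties.
    order = np.argsort(-decision, kind="stable")
    return order[:k], decision

# Hypothetical attribute scores for four frames (columns: object, scene, annotation).
frame_scores = [
    [0.9, 0.2, 0.5],
    [0.4, 0.8, 0.7],
    [0.1, 0.1, 0.2],
    [0.8, 0.9, 0.6],
]
best, decision = select_frames_saw(frame_scores, weights=[0.5, 0.3, 0.2], k=2)
print(best.tolist())  # indices of the two highest-scoring frames
```

In this sketch the selected indices would be the frames passed on to a downstream classifier. Other MADM methods, such as TOPSIS, could be substituted by changing only how `decision` is computed.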

Data availability

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Author information

Authors and Affiliations

School of Mechanical & Manufacturing Engineering, National University of Sciences and Technology, Robotics & AI, Islamabad, Pakistan
Hasnain Ali & Muhammad Jawad Khan
School of Mechanical & Manufacturing Engineering, National University of Sciences and Technology, Biomedical Engineering and Sciences, Islamabad, Pakistan
Syed Omer Gilani
Memorial University of Newfoundland, Electrical & Computer Engineering, St. John’s, NL, Canada
Mohsin Jamil
Quaid e Azam University, Computing, Islamabad, Pakistan
Muazzam Khan Khattak

Corresponding author

Correspondence to Hasnain Ali.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ali, H., Gilani, S.O., Khan, M.J. et al. Stacked Bin Convolutional Neural Networks based Sparse Low-Rank Regressor: Robust, Scalable and Novel Model for Memorability Prediction of Videos. Multimed Tools Appl 82, 40799–40817 (2023). https://doi.org/10.1007/s11042-023-15128-z

Received: 20 May 2022
Revised: 22 September 2022
Accepted: 13 March 2023
Published: 01 April 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s11042-023-15128-z
