Creating personalized video summaries via semantic event detection

Fei, Mengjuan; Jiang, Wei; Mao, Weijie

doi:10.1007/s12652-018-0797-0

Original Research
Published: 20 April 2018

Volume 14, pages 14931–14942, (2023)

Journal of Ambient Intelligence and Humanized Computing

Abstract

Video summarization has great potential in many application areas, as it enables fast browsing and efficient video indexing. Since watching an entire video can be time-consuming, viewers prefer to browse a summary containing the content they enjoy. We therefore believe it is necessary to create an automated tool capable of generating personalized video summaries. In this paper, we propose a new event detection-based personalized video summarization framework and deploy it to create film and soccer video summaries. To obtain effective event detection performance, we introduce two transfer learning methods. The first combines convolutional neural networks with a support vector machine (CNNs–SVM). The second uses a fine-tuned summarization network (SumNet) that fuses fine-tuned object and scene networks. The training data consist of two datasets: (1) a 21K set of web images of back hugging, hand shaking, and standing talking, used to detect film events, and (2) a 30K set of web soccer match images of goals, fouls, and yellow cards, used to detect soccer events. Given an original video, we first segment it into shots and then use the trained model for event detection. Finally, based on the user's specified preferences, we generate a personalized event-based summary. We test our framework on several film and soccer videos. Experimental results demonstrate that the proposed fine-tuned SumNet achieves the best performance of 96.88% and 98.50%, which is effective for generating personalized video summaries.
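The final step described in the abstract, choosing shots according to user preferences, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Shot` record, the label strings, the `min_score` confidence threshold, and the example shot list are all hypothetical, and the event labels are assumed to come from some already-trained detector such as the CNNs–SVM or SumNet models mentioned above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Shot:
    """One video shot with a detector's prediction (all fields hypothetical)."""
    start: float   # shot start time in seconds
    end: float     # shot end time in seconds
    event: str     # event label predicted for this shot
    score: float   # detector confidence in [0, 1]


def personalized_summary(shots: Iterable[Shot],
                         preferred_events: Set[str],
                         min_score: float = 0.5) -> List[Shot]:
    """Keep confidently detected shots whose event the user asked for."""
    selected = [s for s in shots
                if s.event in preferred_events and s.score >= min_score]
    # Return the shots in their original temporal order.
    return sorted(selected, key=lambda s: s.start)


def total_duration(shots: Iterable[Shot]) -> float:
    """Total length of the given shots in seconds."""
    return sum(s.end - s.start for s in shots)


# Made-up detections for a short soccer clip (illustrative only).
shots = [
    Shot(0.0, 12.0, "other", 0.90),
    Shot(12.0, 20.0, "goal", 0.95),
    Shot(20.0, 31.0, "foul", 0.40),
    Shot(31.0, 45.0, "yellow_card", 0.80),
    Shot(45.0, 50.0, "goal", 0.70),
]

# A viewer who only wants goals and yellow cards.
summary = personalized_summary(shots, {"goal", "yellow_card"})
for s in summary:
    print(f"{s.start:6.1f}-{s.end:6.1f}  {s.event}")
print("summary length (s):", total_duration(summary))
```

Changing the set passed as `preferred_events` yields a different summary from the same detections, which is the sense in which the summary is personalized in this sketch.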

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61633019) and the Public Projects of Zhejiang Province, China (No. LGF18F030002).

Author information

Authors and Affiliations

Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, 310027, China
Mengjuan Fei, Wei Jiang & Weijie Mao

Corresponding author

Correspondence to Wei Jiang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Fei, M., Jiang, W. & Mao, W. Creating personalized video summaries via semantic event detection. J Ambient Intell Human Comput 14, 14931–14942 (2023). https://doi.org/10.1007/s12652-018-0797-0

Received: 16 November 2017
Accepted: 14 April 2018
Published: 20 April 2018
Issue Date: November 2023
DOI: https://doi.org/10.1007/s12652-018-0797-0
