
Real-time automated video highlight generation with dual-stream hierarchical growing self-organizing maps

  • Special Issue Paper
  • Journal of Real-Time Image Processing

Abstract

Video has rapidly become one of the most common media for transferring visual information. The video uploaded to YouTube in a single day is estimated to require over 82 years to watch. Automated tools and techniques for analyzing and understanding video content have therefore become essential. This paper addresses the problem of video highlight generation for large video files. We propose a novel skimming-based unsupervised video highlight generation method utilizing statistical image processing and data clustering, which processes frame-level static and dynamic features of the input video in two streams. The dynamic feature stream is obtained by computing dense optical flow between consecutive frames, providing instantaneous velocity information for every pixel; each frame's flow field is then characterized by an orientation histogram with quantized orientations, in which each vote is weighted by the flow magnitude. To process multi-scene videos, we exploit the divisive hierarchical clustering capability of the growing self-organizing map (GSOM) in a two-level top-down approach: the first level clusters the spatial and temporal features of the video, and in the second level each parent cluster is subdivided into child clusters using GSOM. Highlight generation is performed in real time by evaluating video snippets over a pre-defined time interval. We demonstrate the accuracy, robustness and quality of the generated highlights through a qualitative analysis conducted with 1625 human experts on highlights produced from two datasets. Further, we conduct a runtime analysis to demonstrate that the proposed method is efficient enough for real-time use.
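
The dynamic feature stream described above can be illustrated with a minimal sketch. The example below assumes OpenCV's Farneback dense optical flow as a stand-in for the flow method used in the paper, together with an illustrative 8-bin quantization and L1 normalization; none of these specific choices are prescribed by the abstract.

    import cv2
    import numpy as np

    def motion_histogram(prev_gray, curr_gray, bins=8):
        """Per-frame motion descriptor: orientation histogram of dense optical
        flow, with each orientation vote weighted by the flow magnitude (norm).
        Illustrative sketch only; not the authors' exact configuration."""
        # prev_gray / curr_gray: consecutive single-channel frames, e.g. from
        # cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).
        # Dense optical flow (Farneback), giving a per-pixel (dx, dy) velocity field.
        flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        # Convert to polar form: magnitude and angle (radians in [0, 2*pi)).
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        # Quantize orientations into `bins` sectors, weight each vote by the
        # flow magnitude, and normalize so frames of different sizes compare.
        hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
        return hist / (hist.sum() + 1e-8)

Per-frame descriptors of this kind, alongside the static appearance features of the other stream, would then serve as input to the two-level GSOM clustering stage described in the abstract.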

Author information

Corresponding author: Ashish Kr. Luhach.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gunawardena, P., Amila, O., Sudarshana, H. et al. Real-time automated video highlight generation with dual-stream hierarchical growing self-organizing maps. J Real-Time Image Proc 18, 1457–1475 (2021). https://doi.org/10.1007/s11554-020-00957-0
