Abstract
The rapid growth of audio, video and other digital data on the internet underlines the importance of video annotation techniques. This paper presents a hybrid algorithm for automatic video annotation (VA). The algorithm aims to improve annotation performance and precision while reducing the time required to obtain the annotations. The overall process comprises efficient shot detection, followed by two-level keyframe extraction and a saliency-based residual approach for feature extraction. For each stage of VA (shot detection, keyframe extraction and feature extraction), factors that improve performance are addressed. Shot detection combines the color histogram difference (CHD) and the edge change ratio (ECR), as these are two of the most promising techniques for this task. A new idea is proposed to fine-tune keyframe extraction by extracting keyframes in two levels. At the first level, the first frame of each shot is taken as a keyframe. To remove redundancy, the second level then finds the optimal set of keyframes using the fuzzy c-means clustering technique. Color and texture features are used for feature extraction. The video annotation process is divided into two phases, training and testing. A weight vector is found in the training phase. Based on this vector, a similarity array is calculated in the testing phase, from which the corrected annotations are obtained. The proposed method is compared with OMG-SSL and MMT-MGO on the TRECVID dataset and gives better results. The significance of the weight vector is also shown experimentally.
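As a rough illustration of the first two steps described above, the following sketch detects shot boundaries from the color histogram difference alone and then takes the first frame of each shot as a first-level keyframe. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 8-bin quantization, the L1 distance and the threshold of 0.5 are all hypothetical choices, frames are assumed to be flattened sequences of 8-bit RGB pixels, and the edge change ratio and the fuzzy c-means second level are omitted.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Frame = Sequence[Pixel]  # assumed layout: one frame flattened to 8-bit RGB pixels


def color_histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Concatenated per-channel histogram; each channel sums to 1."""
    hist = [0.0] * (3 * bins)
    for pixel in frame:
        for channel, value in enumerate(pixel):
            # Map an 8-bit value (0..255) to a bin index (0..bins-1).
            hist[channel * bins + value * bins // 256] += 1.0
    n = float(len(frame))
    return [count / n for count in hist]


def histogram_difference(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two 3-channel histograms, scaled to [0, 1]."""
    # Each channel contributes at most 2 to the L1 distance, hence the 6.
    return sum(abs(a - b) for a, b in zip(h1, h2)) / 6.0


def detect_shot_boundaries(
    frames: Sequence[Frame], threshold: float = 0.5, bins: int = 8
) -> List[int]:
    """Indices i where frame i starts a new shot (threshold is an assumption)."""
    hists = [color_histogram(frame, bins) for frame in frames]
    return [
        i
        for i in range(1, len(hists))
        if histogram_difference(hists[i - 1], hists[i]) > threshold
    ]


def first_level_keyframes(num_frames: int, boundaries: Sequence[int]) -> List[int]:
    """First frame of every shot, as in the first level described above."""
    if num_frames == 0:
        return []
    return [0] + list(boundaries)


# Toy video: three all-red frames followed by two all-blue frames.
red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
video = [red, red, red, blue, blue]

boundaries = detect_shot_boundaries(video)
keyframes = first_level_keyframes(len(video), boundaries)
```

In a fuller pipeline the first-level keyframes would then be passed to a clustering step, such as fuzzy c-means over their feature vectors, to discard redundant ones. A fixed global threshold is the simplest possible choice here; it would normally be tuned or made adaptive.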




Cite this article
Aote, S.S., Potnurwar, A. An automatic video annotation framework based on two level keyframe extraction mechanism. Multimed Tools Appl 78, 14465–14484 (2019). https://doi.org/10.1007/s11042-018-6826-3
DOI: https://doi.org/10.1007/s11042-018-6826-3