Multi-label semantic concept detection in videos using fusion of asymmetrically trained deep convolutional neural networks and foreground driven concept co-occurrence matrix

Janwe, Nitin J.; Bhoyar, Kishor K.

doi:10.1007/s10489-017-1033-x


Published: 18 September 2017

Volume 48, pages 2047–2066, (2018)

Applied Intelligence


Abstract

Describing the visual content of videos with semantic concepts is an effective and practical approach for video applications such as annotation, indexing, retrieval and ranking. These applications require video data to be labelled with a known set of labels or concepts. Assigning semantic concepts manually is not feasible given the large and ever-growing volume of video data, so automatic semantic concept detection in videos is an active research area. Deep convolutional neural networks (CNNs) have recently shown remarkable performance on computer vision tasks. In this paper, we present a novel approach to automatic semantic video concept detection that uses a deep CNN together with a foreground driven concept co-occurrence matrix (FDCCM). The FDCCM stores foreground-to-background concept co-occurrence values and is built by exploiting concept co-occurrence relationships in the pre-labelled TRECVID video dataset and in a collection of random images extracted from Google Images. To deal with the dataset imbalance problem, we extend this approach by fusing two asymmetrically trained deep CNNs and use the FDCCM to further improve concept detection. The proposed approach is compared with state-of-the-art approaches to video concept detection on the widely used TRECVID dataset and is found to outperform them.
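The abstract does not give the construction of the FDCCM or the fusion rule, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each shot carries a set of concept labels, that the split into foreground and background concepts is given, that co-occurrence is estimated as a conditional frequency, that the two networks are fused by a weighted average, and that refinement blends each background score with the value implied by the strongest foreground concept. The function names, the parameters `weight_a` and `alpha`, and the toy labels are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence, Set


def build_fdccm(
    shot_labels: Sequence[Set[str]],
    foreground: Sequence[str],
    background: Sequence[str],
) -> Dict[str, Dict[str, float]]:
    """Estimate P(background concept | foreground concept) from labelled shots.

    Assumption: co-occurrence is a simple conditional frequency.
    """
    fdccm: Dict[str, Dict[str, float]] = {}
    for fg in foreground:
        shots_with_fg = [labels for labels in shot_labels if fg in labels]
        row: Dict[str, float] = {}
        for bg in background:
            if shots_with_fg:
                together = sum(1 for labels in shots_with_fg if bg in labels)
                row[bg] = together / len(shots_with_fg)
            else:
                row[bg] = 0.0  # foreground concept never observed
        fdccm[fg] = row
    return fdccm


def fuse_scores(
    scores_a: Dict[str, float], scores_b: Dict[str, float], weight_a: float = 0.5
) -> Dict[str, float]:
    """Late fusion (assumed): weighted average of per-concept scores."""
    return {c: weight_a * scores_a[c] + (1.0 - weight_a) * scores_b[c] for c in scores_a}


def refine_with_fdccm(
    scores: Dict[str, float], fdccm: Dict[str, Dict[str, float]], alpha: float = 0.3
) -> Dict[str, float]:
    """Blend each background score with the one implied by the top foreground concept."""
    top_fg = max(fdccm, key=lambda fg: scores[fg])
    refined = dict(scores)
    for bg, cooc in fdccm[top_fg].items():
        refined[bg] = (1.0 - alpha) * scores[bg] + alpha * scores[top_fg] * cooc
    return refined


# Toy data (hypothetical labels, not TRECVID annotations).
shots = [{"Boat", "Water"}, {"Boat", "Water"}, {"Boat", "Road"}, {"Car", "Road"}]
fdccm = build_fdccm(shots, foreground=["Boat", "Car"], background=["Water", "Road"])

cnn_a = {"Boat": 0.8, "Car": 0.1, "Water": 0.4, "Road": 0.2}
cnn_b = {"Boat": 0.6, "Car": 0.3, "Water": 0.2, "Road": 0.2}
fused = fuse_scores(cnn_a, cnn_b)
refined = refine_with_fdccm(fused, fdccm)
```

In this toy run "Boat" is the strongest foreground concept, so the score for "Water", which co-occurs with it in two of three shots, is raised slightly more than the score for "Road". A real system would need to choose the weighting and refinement rule empirically.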


Abbreviations

CCM: concept co-occurrence matrix
FDCCM: foreground driven concept co-occurrence matrix
CNN: convolutional neural network


Author information

Authors and Affiliations

Department of Information Technology, YCCE, Nagpur, India
Nitin J. Janwe & Kishor K. Bhoyar


Corresponding author

Correspondence to Nitin J. Janwe.


About this article

Cite this article

Janwe, N.J., Bhoyar, K.K. Multi-label semantic concept detection in videos using fusion of asymmetrically trained deep convolutional neural networks and foreground driven concept co-occurrence matrix. Appl Intell 48, 2047–2066 (2018). https://doi.org/10.1007/s10489-017-1033-x


Published: 18 September 2017
Issue Date: August 2018
DOI: https://doi.org/10.1007/s10489-017-1033-x
