Temporal Dynamic Concept Modeling Network for Explainable Video Event Recognition

Published: 12 July 2023

Abstract

With the rapid development of deep learning and multimedia technology, intelligent urban computing has attracted increasing attention from both academia and industry. Unfortunately, most of the related techniques are black-box paradigms that lack interpretability. Video event recognition is one of the fundamental technologies among them. An event contains multiple concepts and rich interactions among them, which can help us construct explainable event recognition methods. However, the crucial concepts needed to recognize an event exhibit diverse temporal patterns of occurrence, and the relationship between events and the temporal characteristics of concepts has not been fully exploited, which poses great challenges for concept-based event categorization. To address these issues, we introduce the temporal concept receptive field, i.e., the length of the temporal window required to capture the key concepts in concept-based event recognition. Accordingly, we propose temporal dynamic convolution (TDC), which models the temporal concept receptive field dynamically according to different events. Its core idea is to combine the outputs of multiple convolution layers using coefficients learned from two complementary perspectives. These convolution layers use a variety of kernel sizes and thus provide temporal concept receptive fields of different lengths. Similarly, we propose cross-domain temporal dynamic convolution (CrTDC), which further exploits the rich relationships between different concepts. The learned coefficients help us capture suitable temporal concept receptive field sizes and highlight crucial concepts, yielding accurate and complete concept representations for event analysis. Based on TDC and CrTDC, we build the temporal dynamic concept modeling network (TDCMN) for explainable video event recognition. We evaluate TDCMN on the large-scale and challenging FCVID, ActivityNet, and CCV datasets. Experimental results show that TDCMN significantly improves the event recognition performance of concept-based methods, and its explainability encourages the construction of more explainable models from the perspective of the temporal concept receptive field.
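
To make the mechanism concrete, below is a minimal sketch of the TDC idea in PyTorch. The abstract does not prescribe a framework, and the class and parameter names here (TemporalDynamicConv, kernel_sizes, and so on) are illustrative assumptions, not the authors' implementation: parallel temporal convolutions with different kernel sizes realize temporal concept receptive fields of different lengths, and input-dependent coefficients mix their outputs.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TemporalDynamicConv(nn.Module):
        """Illustrative sketch: dynamic mixing of temporal receptive fields."""
        def __init__(self, num_concepts, kernel_sizes=(1, 3, 5, 7)):
            super().__init__()
            # One temporal convolution branch per candidate receptive-field length.
            self.branches = nn.ModuleList([
                nn.Conv1d(num_concepts, num_concepts, k, padding=k // 2)
                for k in kernel_sizes
            ])
            # Gating head: predicts one mixing coefficient per branch from the
            # globally pooled concept scores of the input video.
            self.gate = nn.Linear(num_concepts, len(kernel_sizes))

        def forward(self, x):
            # x: (batch, num_concepts, num_segments), per-segment concept scores.
            coeff = F.softmax(self.gate(x.mean(dim=-1)), dim=-1)     # (B, branches)
            out = torch.stack([b(x) for b in self.branches], dim=1)  # (B, branches, C, T)
            # Weight each receptive-field branch by its coefficient and sum.
            return (coeff[:, :, None, None] * out).sum(dim=1)        # (B, C, T)

    # Usage: mix receptive fields of length 1/3/5/7 over 300 concepts, 32 segments.
    tdc = TemporalDynamicConv(num_concepts=300)
    scores = torch.randn(2, 300, 32)
    print(tdc(scores).shape)  # torch.Size([2, 300, 32])

The full TDCMN described in the paper learns such coefficients from two complementary perspectives and adds the cross-domain variant (CrTDC) over relationships between concepts; the sketch above only illustrates the basic receptive-field mixing principle.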

    Information

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 6
    November 2023, 858 pages
    ISSN: 1551-6857
    EISSN: 1551-6865
    DOI: 10.1145/3599695
    Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 July 2023
    Online AM: 25 October 2022
    Accepted: 25 September 2022
    Revised: 10 July 2022
    Received: 28 February 2022
    Published in TOMM Volume 19, Issue 6

    Author Tags

    1. Event recognition
    2. temporal concept receptive field
    3. dynamic convolution

    Qualifiers

    • Research-article

    Funding Sources

    • Technology and Innovation Major Project of the Ministry of Science and Technology of China
    • National Natural Science Foundation of China
    • Beijing Nova Program
    • Fundamental Research Funds for the Central Universities

    Article Metrics

    • Downloads (Last 12 months): 96
    • Downloads (Last 6 weeks): 4
    Reflects downloads up to 28 Feb 2025

    Cited By

    • (2025) Mixed Attention and Channel Shift Transformer for Efficient Action Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3712594. Online publication date: 17-Jan-2025.
    • (2024) Review on scene graph generation methods. Multiagent and Grid Systems 20, 2, 129-160. DOI: 10.3233/MGS-230132. Online publication date: 12-Aug-2024.
    • (2024) Cross-Attention Based Two-Branch Networks for Document Image Forgery Localization in the Metaverse. ACM Transactions on Multimedia Computing, Communications, and Applications 21, 2, 1-24. DOI: 10.1145/3686158. Online publication date: 30-Dec-2024.
    • (2024) Fair and Robust Federated Learning via Decentralized and Adaptive Aggregation based on Blockchain. ACM Transactions on Sensor Networks. DOI: 10.1145/3673656. Online publication date: 17-Jun-2024.
    • (2024) Push the Limit of Highly Accurate Ranging on Commercial UWB Devices. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 2, 1-27. DOI: 10.1145/3659602. Online publication date: 15-May-2024.
    • (2024) xMeta: SSD-HDD-hybrid Optimization for Metadata Maintenance of Cloud-scale Object Storage. ACM Transactions on Architecture and Code Optimization 21, 2, 1-20. DOI: 10.1145/3652606. Online publication date: 21-May-2024.
    • (2024) Suitable and Style-Consistent Multi-Texture Recommendation for Cartoon Illustrations. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 7, 1-26. DOI: 10.1145/3652518. Online publication date: 16-May-2024.
    • (2024) MS-GDA: Improving Heterogeneous Recipe Representation via Multinomial Sampling Graph Data Augmentation. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 7, 1-23. DOI: 10.1145/3648620. Online publication date: 25-Apr-2024.
    • (2024) MSEConv: A Unified Warping Framework for Video Frame Interpolation. ACM Transactions on Asian and Low-Resource Language Information Processing. DOI: 10.1145/3648364. Online publication date: 14-Feb-2024.
    • (2024) GMS-3DQA: Projection-Based Grid Mini-patch Sampling for 3D Model Quality Assessment. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 6, 1-19. DOI: 10.1145/3643817. Online publication date: 8-Mar-2024.
