ExpT: Online Action Detection via Exemplar-Enhanced Transformer for Secondary School Experimental Evaluation

Yuan, Haomiao; Zheng, Zhichao; Gu, Yanhui; Zhou, Junsheng; Chen, Yi

doi:10.1007/978-981-97-0791-1_30

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2024)

Included in the following conference series:

International Conference on Computer Science and Education

Abstract

Secondary school experimental evaluation is an essential component of secondary school science education. However, it faces several challenges, including the difficulty of assessing students precisely within limited time and inconsistent evaluation criteria. It has therefore become imperative to explore artificial intelligence technology to improve secondary school experimental evaluation. Yet existing online action detection (OAD) algorithms that could be applied are restricted to historical context and are inefficient, which causes setbacks in realistic experimental evaluations. To address this, we present Exemplar-enhanced Transformer (ExpT), a real-time mechanism for online action detection that assesses the experiments conducted by students more accurately and efficiently. By leveraging exemplars through temporal cross attention, the ExpT model provides complementary guidance for modeling temporal dependencies while reducing excessive attention. We evaluate ExpT on two realistic chemistry experiment datasets for online action detection, where it significantly outperforms all existing methods.
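
The abstract does not spell out the ExpT architecture, so the following is only a minimal sketch of what attending from frame features to exemplars can look like, using generic scaled dot-product cross attention. The function name `exemplar_cross_attention`, the tensor sizes, the random inputs, and the residual connection are all assumptions for illustration; learned projection matrices are omitted, and nothing here should be read as the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def exemplar_cross_attention(frames, exemplars):
    """Hypothetical helper: frames (T, d) attend to exemplars (K, d).

    Returns the exemplar-enhanced features (T, d) and the attention
    weights (T, K). No learned projections are used in this sketch.
    """
    d = frames.shape[-1]
    # Queries are the frame features; keys and values are the exemplars.
    scores = frames @ exemplars.T / np.sqrt(d)   # (T, K)
    weights = softmax(scores, axis=-1)           # each row sums to 1
    attended = weights @ exemplars               # (T, d)
    # Residual connection (an assumption): exemplar guidance is added
    # on top of the original frame features.
    return frames + attended, weights

rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 16))      # 8 recent frames, 16-dim features (made up)
exemplars = rng.normal(size=(4, 16))   # 4 exemplar features (made up)
out, weights = exemplar_cross_attention(frames, exemplars)
print(out.shape, weights.shape)
```

In this sketch each frame compares against K exemplars rather than against every other frame in the history, so the score matrix has shape (T, K) instead of (T, T). Whether ExpT achieves its reported efficiency this way is not stated in the abstract.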

Supported by the Natural Science Foundation of China (Nos. 62377029, 22033002).

Author information

Authors and Affiliations

Nanjing Normal University, Nanjing, 210023, Jiangsu, China
Haomiao Yuan, Zhichao Zheng, Yanhui Gu, Junsheng Zhou & Yi Chen

Corresponding author

Correspondence to Yi Chen.

Editor information

Editors and Affiliations

Xiamen University, Xiamen, China
Wenxing Hong
Xiamen University Malaysia, Sepang, Malaysia
Geetha Kanaparan

About this paper

Cite this paper

Yuan, H., Zheng, Z., Gu, Y., Zhou, J., Chen, Y. (2024). ExpT: Online Action Detection via Exemplar-Enhanced Transformer for Secondary School Experimental Evaluation. In: Hong, W., Kanaparan, G. (eds) Computer Science and Education. Teaching and Curriculum. ICCSE 2023. Communications in Computer and Information Science, vol 2024. Springer, Singapore. https://doi.org/10.1007/978-981-97-0791-1_30

DOI: https://doi.org/10.1007/978-981-97-0791-1_30
Published: 26 February 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0790-4
Online ISBN: 978-981-97-0791-1
eBook Packages: Computer Science, Computer Science (R0)
