An Approach for Object Recognition in Videos for Vocabulary Extraction

Le, Anh Bao Nguyen; Nguyen, Chi Bao; Dang, Quoc Cuong; Danh, Be Hai; Le, Huynh Nhu; Luong, Huong Hoang; Nguyen, Hai Thanh

doi:10.1007/978-3-031-59462-5_3

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 586)

Included in the following conference series:

International Conference on Nature of Computation and Communication

Abstract

English is the most widely used language in the world, and its importance keeps growing. Most online documents, information, and content are written in English. However, its vocabulary is large, and many learners find it difficult to remember. Many technologies have therefore been proposed to support English learning, such as word-matching games that engage children and introduce them to English from an early age. In addition, translation tools help users look up vocabulary, antonyms, synonyms, and examples. This study presents a method that supports English learning through object detection in images, videos, and real-time live streams, using deep learning architectures such as You Only Look Once (YOLO), one of the leading families of object detection models with state-of-the-art performance. The method obtains an mAP of 55.6 at 17 GFLOPs. Its output consists of the vocabulary word, its meaning, and example sentences that use it. The method achieves good accuracy on a dataset of 2786 images belonging to 59 classes.
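
The step from detections to vocabulary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the detector is left out, and the per-frame `Detection` records stand in for what a YOLO-style model might return. The `Detection` type, the `GLOSSARY` entries, the `extract_vocabulary` function, and the 0.5 confidence threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class Detection(NamedTuple):
    label: str         # class name predicted by the detector
    confidence: float  # detector score in [0, 1]


# Hypothetical glossary: the chapter's actual word list, meanings,
# and example sentences are not given on this page.
GLOSSARY = {
    "dog": ("a common domestic animal", "The dog is sleeping on the sofa."),
    "cup": ("a small container for drinking", "She filled the cup with tea."),
}


def extract_vocabulary(
    frames: Iterable[Iterable[Detection]],
    min_confidence: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Dict[str, object]]:
    """Count in how many frames each confidently detected label appears."""
    counts: Counter = Counter()
    for detections in frames:
        # A label counts at most once per frame, however many boxes it has.
        labels = {d.label for d in detections if d.confidence >= min_confidence}
        counts.update(labels)

    entries = []
    # Most frequent words first; ties are broken alphabetically.
    for label, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        meaning, example = GLOSSARY.get(label, ("", ""))
        entries.append(
            {"word": label, "frames": n, "meaning": meaning, "example": example}
        )
    return entries


if __name__ == "__main__":
    video = [
        [Detection("dog", 0.9), Detection("cup", 0.3)],
        [Detection("dog", 0.8), Detection("cup", 0.7)],
        [Detection("cat", 0.6)],
    ]
    for entry in extract_vocabulary(video):
        print(entry)
```

Counting frames rather than bounding boxes keeps a crowded scene from dominating the word list. Labels missing from the glossary are still reported, with empty meaning and example fields.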

Acknowledgement

This study is funded in part by Can Tho University, Code: THS2022-15.

Author information

Authors and Affiliations

College of Information and Communication Technology, Can Tho University, Can Tho, Vietnam
Anh Bao Nguyen Le, Chi Bao Nguyen, Quoc Cuong Dang, Be Hai Danh, Huynh Nhu Le & Hai Thanh Nguyen
FPT University, Can Tho, Vietnam
Huong Hoang Luong

Corresponding author

Correspondence to Hai Thanh Nguyen.

Editor information

Editors and Affiliations

Nguyen Tat Thanh University, Ho Chi Minh City, Vietnam
Phan Cong Vinh
University of Central Punjab in Pakistan, Johar Town, Pakistan
Hafiz Mahfooz Ul Haque

About this paper

Cite this paper

Le, A.B.N. et al. (2024). An Approach for Object Recognition in Videos for Vocabulary Extraction. In: Cong Vinh, P., Mahfooz Ul Haque, H. (eds) Nature of Computation and Communication. ICTCC 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 586. Springer, Cham. https://doi.org/10.1007/978-3-031-59462-5_3

DOI: https://doi.org/10.1007/978-3-031-59462-5_3
Published: 03 May 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-59461-8
Online ISBN: 978-3-031-59462-5
eBook Packages: Computer Science, Computer Science (R0)
