Abstract
English is the most widely used language globally, and its importance continues to grow. Most online documents, information, and content are written in English. However, its large vocabulary makes English difficult for many learners to memorize. Many modern technologies have therefore been proposed to support English learning, such as word-matching games that engage children and make English approachable from an early age. In addition, translation tools help users look up vocabulary, antonyms, synonyms, and example sentences. This study presents a method to support English learning via object detection in images, videos, and real-time live streams using deep learning architectures such as You Only Look Once (YOLO), one of the leading families of object detection models with state-of-the-art performance. The method achieves an mAP of 55.6 at 17 GFLOPs. For each detected object, it outputs the vocabulary word, its meaning, and example sentences using that word. The method achieves good accuracy on a dataset of 2786 images belonging to 59 classes.
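The step from detected objects to vocabulary output can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the detector has already produced class labels with confidence scores, and the names `Detection`, `VocabEntry`, `LEXICON`, and `extract_vocabulary`, the 0.5 confidence threshold, and the lexicon contents are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detector output for a frame (hypothetical structure)."""
    label: str         # class name predicted by the detector
    confidence: float  # detector score in [0, 1]


@dataclass(frozen=True)
class VocabEntry:
    """Vocabulary output for one class (hypothetical structure)."""
    word: str
    meaning: str
    example: str


# Hypothetical lexicon; a real system would cover every detector class.
LEXICON = {
    "dog": VocabEntry("dog", "an animal often kept as a pet",
                      "The dog is sleeping on the sofa."),
    "cup": VocabEntry("cup", "a small container used for drinking",
                      "She filled the cup with tea."),
    "chair": VocabEntry("chair", "a seat for one person",
                        "He pulled a chair up to the table."),
}


def extract_vocabulary(detections, lexicon=LEXICON, min_confidence=0.5):
    """Return one entry per distinct detected label, most confident first.

    Detections below min_confidence, or with labels missing from the
    lexicon, are ignored. The threshold value is an assumption.
    """
    best = {}  # label -> highest confidence seen for that label
    for det in detections:
        if det.confidence < min_confidence or det.label not in lexicon:
            continue
        if det.label not in best or det.confidence > best[det.label]:
            best[det.label] = det.confidence
    ranked = sorted(best, key=lambda label: best[label], reverse=True)
    return [lexicon[label] for label in ranked]


if __name__ == "__main__":
    frame = [
        Detection("dog", 0.91),
        Detection("cup", 0.42),    # dropped: below the threshold
        Detection("dog", 0.77),    # duplicate label, lower score
        Detection("chair", 0.88),
    ]
    for entry in extract_vocabulary(frame):
        print(f"{entry.word}: {entry.meaning} | {entry.example}")
```

Keeping only the highest-scoring detection per label avoids showing the same word several times when an object appears more than once in a frame.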
Acknowledgement
This study is funded in part by Can Tho University, Code: THS2022-15.
Copyright information
© 2024 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Le, A.B.N. et al. (2024). An Approach for Object Recognition in Videos for Vocabulary Extraction. In: Cong Vinh, P., Mahfooz Ul Haque, H. (eds) Nature of Computation and Communication. ICTCC 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 586. Springer, Cham. https://doi.org/10.1007/978-3-031-59462-5_3
DOI: https://doi.org/10.1007/978-3-031-59462-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-59461-8
Online ISBN: 978-3-031-59462-5
eBook Packages: Computer Science, Computer Science (R0)