Eye Control and Motion with Deep Reinforcement Learning: In Virtual and Physical Environments

Arizmendi, Sergio; Paz, Asdrubal; González, Javier; Ponce, Hiram

doi:10.1007/978-3-031-47765-2_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14391)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

In computer vision, an attention mechanism scans for, detects, and tracks a target object. This paper aims to develop and virtually train a machine learning model for an object attention mechanism that combines object detection and mechanical automation. To this end, we use the Unity 3D engine to model a simple scene in which two virtual cameras align to achieve monocular attention on specific objects. Deep reinforcement learning, via the ML-Agents library, was used to train a model that aligns the virtual cameras. The model was then transferred to a physical camera to replicate the performance of the attention mechanism.
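The abstract does not state how alignment is scored during training. As a purely illustrative sketch, one plausible reward signal for a camera-alignment agent is the offset between a detected object's bounding-box center and the image center. Everything below (the `BoundingBox` type, the `centering_error` and `alignment_reward` functions, the `tolerance` value, and the reward scaling) is a hypothetical assumption for illustration, not the authors' implementation and not part of the ML-Agents API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BoundingBox:
    """Axis-aligned detection box in pixel coordinates (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def center(self) -> Tuple[float, float]:
        """Return the (x, y) center of the box in pixels."""
        return ((self.x_min + self.x_max) / 2.0,
                (self.y_min + self.y_max) / 2.0)


def centering_error(box: BoundingBox, width: int, height: int) -> Tuple[float, float]:
    """Offset of the box center from the image center, scaled to [-1, 1] per axis."""
    cx, cy = box.center()
    half_w, half_h = width / 2.0, height / 2.0
    return ((cx - half_w) / half_w, (cy - half_h) / half_h)


def alignment_reward(box: Optional[BoundingBox], width: int, height: int,
                     tolerance: float = 0.05) -> float:
    """Assumed reward shaping for one camera step.

    Returns +1 when the target is centered within `tolerance`, -1 when no
    target is detected, and otherwise a penalty in [-1, 0) proportional to
    the normalized distance from the image center.
    """
    if box is None:
        return -1.0
    ex, ey = centering_error(box, width, height)
    distance = (ex ** 2 + ey ** 2) ** 0.5
    if distance <= tolerance:
        return 1.0
    # The largest possible distance is sqrt(2), reached at an image corner.
    return -distance / (2 ** 0.5)


if __name__ == "__main__":
    centered = BoundingBox(280, 200, 360, 280)   # center at (320, 240)
    off_axis = BoundingBox(400, 200, 560, 280)   # center at (480, 240)
    print(alignment_reward(centered, 640, 480))
    print(alignment_reward(off_axis, 640, 480))
    print(alignment_reward(None, 640, 480))
```

In a setup like this, the normalized error could also serve as the agent's observation, with pan and tilt commands as its actions; that mapping is likewise an assumption rather than something the abstract specifies.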

Author information

Authors and Affiliations

Universidad Panamericana, Facultad de Ingeniería, Augusto Rodin 498, 03920, Ciudad de México, Mexico
Sergio Arizmendi, Asdrubal Paz, Javier González & Hiram Ponce

Corresponding author

Correspondence to Sergio Arizmendi.

Editor information

Editors and Affiliations

Center for Computing Research, Instituto Politécnico Nacional, Ciudad de México, Distrito Federal, Mexico
Hiram Calvo
Facultad de Ingeniería, Universidad Panamericana, Ciudad de México, Mexico
Lourdes Martínez-Villaseñor
Facultad de Ingeniería, Universidad Panamericana, Ciudad de México, Mexico
Hiram Ponce

About this paper

Cite this paper

Arizmendi, S., Paz, A., González, J., Ponce, H. (2024). Eye Control and Motion with Deep Reinforcement Learning: In Virtual and Physical Environments. In: Calvo, H., Martínez-Villaseñor, L., Ponce, H. (eds) Advances in Computational Intelligence. MICAI 2023. Lecture Notes in Computer Science, vol 14391. Springer, Cham. https://doi.org/10.1007/978-3-031-47765-2_8

DOI: https://doi.org/10.1007/978-3-031-47765-2_8
Published: 09 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47764-5
Online ISBN: 978-3-031-47765-2
eBook Packages: Computer Science, Computer Science (R0)
