Abstract
Real-world advanced robotics applications cannot be conceived without onboard visual perception. By perception we refer not only to image acquisition but, more importantly, to the extraction of the information required to carry out the robotic task. This paper presents the computer vision system developed by the team of the University of Catania for the Mohamed Bin Zayed International Robotics Challenge 2020. The two challenges required: 1) developing a team of drones to grasp a ball attached to another flying vehicle and to pierce a set of randomly placed balloons, and 2) building a wall using a mobile manipulator and a flying vehicle. Several aspects were taken into account to obtain a real-time and robust system, both crucial features in demanding settings such as those posed by the challenges. Experimental results achieved in the real-world setting are reported.
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Battiato, S. et al. (2023). The UNICT-TEAM Vision Modules for the Mohamed Bin Zayed International Robotics Challenge 2020. In: Gupta, D., Bhurchandi, K., Murala, S., Raman, B., Kumar, S. (eds) Computer Vision and Image Processing. CVIP 2022. Communications in Computer and Information Science, vol 1776. Springer, Cham. https://doi.org/10.1007/978-3-031-31407-0_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31406-3
Online ISBN: 978-3-031-31407-0