
The UNICT-TEAM Vision Modules for the Mohamed Bin Zayed International Robotics Challenge 2020

  • Conference paper
Computer Vision and Image Processing (CVIP 2022)

Abstract

Real-world advanced robotics applications cannot be conceived without onboard visual perception. By perception we mean not only image acquisition but, more importantly, the extraction of the information required to carry out the robotic task. In this paper, the computer vision system developed by the team of the University of Catania for the Mohamed Bin Zayed International Robotics Challenge 2020 is presented. The two challenges required: 1) developing a team of drones to grasp a ball attached to another flying vehicle and to pierce a set of randomly placed balloons, and 2) building a wall using a mobile manipulator and a flying vehicle. Several aspects have been taken into account to obtain a real-time and robust system, both crucial requirements in the demanding scenarios posed by the challenges. The experimental results achieved in the real-world setting are reported.
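As an illustration only, and not the authors' actual pipeline (which is described in the full paper), the minimal Python/OpenCV sketch below shows the kind of lightweight, real-time colour-based detection a drone could run onboard to localize a ball or balloon in a camera frame. The HSV thresholds, camera index, and minimum blob size are placeholder assumptions.

```python
# Minimal sketch (illustration only): real-time colour-based ball/balloon
# localization with OpenCV. The HSV range, camera index, and minimum blob
# size are placeholder assumptions, not values from the paper.
import cv2
import numpy as np

# Assumed HSV range for a brightly coloured target (e.g. a green balloon).
HSV_LOW = np.array([40, 80, 80])
HSV_HIGH = np.array([80, 255, 255])
MIN_RADIUS_PX = 8  # ignore detections smaller than this radius


def detect_target(frame_bgr):
    """Return (cx, cy, radius) of the largest matching blob, or None."""
    blurred = cv2.GaussianBlur(frame_bgr, (5, 5), 0)
    hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, HSV_LOW, HSV_HIGH)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    (cx, cy), radius = cv2.minEnclosingCircle(largest)
    if radius < MIN_RADIUS_PX:
        return None
    return int(cx), int(cy), int(radius)


if __name__ == "__main__":
    cap = cv2.VideoCapture(0)  # assumed onboard camera index
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hit = detect_target(frame)
        if hit is not None:
            cx, cy, r = hit
            cv2.circle(frame, (cx, cy), r, (0, 0, 255), 2)
        cv2.imshow("detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()
```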


Notes

  1. https://www.youtube.com/watch?v=u106Vy-XJ7c.


Author information


Corresponding author

Correspondence to Luca Guarnera.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Battiato, S. et al. (2023). The UNICT-TEAM Vision Modules for the Mohamed Bin Zayed International Robotics Challenge 2020. In: Gupta, D., Bhurchandi, K., Murala, S., Raman, B., Kumar, S. (eds) Computer Vision and Image Processing. CVIP 2022. Communications in Computer and Information Science, vol 1776. Springer, Cham. https://doi.org/10.1007/978-3-031-31407-0_53

  • DOI: https://doi.org/10.1007/978-3-031-31407-0_53

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31406-3

  • Online ISBN: 978-3-031-31407-0

  • eBook Packages: Computer Science, Computer Science (R0)
