Abstract
In this work, we propose a Multimodal Human-Computer Interface (MHCI) that uses body poses to command a drone in an easy and intuitive way. First, the user's pose is recovered from a video stream with the help of the open-source library OpenPose. Then, a Support Vector Classifier (SVC), trained to distinguish between different body poses, interprets eleven human poses as the most important high-level drone commands. The proposed strategy was successfully implemented so that a user can interact with a drone at a remote location using only a web interface. Real-time experiments were carried out with fourteen volunteers, selected to represent different segments of the population in terms of age, gender, experience with technology, socioeconomic class, etc., in order to evaluate the user experience with the help of a User Experience Questionnaire (UEQ), with satisfactory results. The study suggests that the proposed MHCI was well accepted by the participants, even those without previous drone experience, and it received excellent scores in attractiveness, stimulation, and novelty from most of the volunteers.
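To make the pipeline concrete, the following Python snippet is a minimal, illustrative sketch of the classification step described above, not the authors' implementation: it assumes OpenPose BODY_25 keypoints are already available as (x, y) coordinates, uses scikit-learn's SVC, and the command names, normalization, and training data are hypothetical placeholders.

```python
# Sketch: mapping OpenPose body keypoints to drone commands with an SVC.
# Command labels and preprocessing are illustrative assumptions only.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical command set (the paper maps eleven poses to commands).
COMMANDS = ["take_off", "land", "up", "down", "left", "right",
            "forward", "backward", "rotate_cw", "rotate_ccw", "hover"]

def normalize_keypoints(keypoints):
    """Flatten 25 (x, y) BODY_25 keypoints, translated so the neck joint
    (BODY_25 index 1) is the origin and scaled to a unit-sized skeleton,
    making the features invariant to the user's position in the image."""
    kp = np.asarray(keypoints, dtype=float).reshape(-1, 2)
    kp = kp - kp[1]                               # neck at the origin
    scale = np.linalg.norm(kp, axis=1).max() or 1.0
    return (kp / scale).ravel()

# X_train: one normalized keypoint vector per labeled frame;
# y_train: the command index. Random placeholders stand in for real data.
X_train = np.random.rand(550, 50)
y_train = np.random.randint(0, len(COMMANDS), 550)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)

# At runtime, each incoming skeleton is mapped to a high-level command.
frame_keypoints = np.random.rand(25, 2)           # stand-in OpenPose output
command = COMMANDS[int(clf.predict([normalize_keypoints(frame_keypoints)])[0])]
print("Detected command:", command)
```

An SVC is a plausible fit for this task because the feature vectors are low-dimensional, the training set of labeled poses is small, and inference is fast enough for real-time command interpretation; the predicted command would then be forwarded to the drone, e.g. over a ROS topic bridged to the web interface.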
Availability of data and materials
Video available at https://youtu.be/aUho-uN1DzM.
Code availability
Not applicable.
Funding
This work was supported by the Mexican National Council of Science and Technology (CONACYT) and the FORDECyT project 296737 “Consorcio en Inteligencia Artificial”.
Author information
Authors and Affiliations
Contributions
All authors contributed to the study conception and design. The first and fourth authors were in charge of the system implementation under the supervision of the second and fifth authors. The third author developed the Support Vector Classifier to distinguish between the different body poses. The first and fourth authors performed the experiments with human users and evaluated the user experience. The paper was written mainly by the first, second, third, and fifth authors. Finally, the corresponding author was in charge of the overall project supervision. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yam-Viramontes, B., Cardona-Reyes, H., González-Trejo, J. et al. Commanding a drone through body poses, improving the user experience. J Multimodal User Interfaces 16, 357–369 (2022). https://doi.org/10.1007/s12193-022-00396-0