Multilingual speech control for ROS-driven robots

Hofer, Dominik P.; Strohmeier, Felix

doi:10.1007/s00502-019-00739-y

German title: Multilinguale Sprachsteuerung für ROS-gesteuerte Roboter

Original article
Published: 16 October 2019

Volume 136, pages 334–340 (2019)

e & i Elektrotechnik und Informationstechnik

Abstract

To improve collaboration between humans and robots, multilingual speech control (MLS) can be used to easily manage multiple robots at any time by spoken commands. Once a command is recognised by one of the corresponding ROS-driven robots in the network, it is executed and related audio feedback is provided to the user. Our MLS implementation has a modular design, so that individual functional modules can be implemented either by online cloud-based services or by local offline software for increased privacy. Furthermore, the extensible design allows the system to meet future user needs or to be adapted to different robot capabilities. The MLS follows a fixed basic workflow: first, the spoken language is identified, followed by speech-to-text transformation. Next, the intent is detected and any variables are analysed to interpret the command, which is then sent to the corresponding robot. Finally, the robot publishes the state reached by executing the command back to the user. We integrated several cloud services and open-source implementations based on artificial intelligence technologies and developed a software framework that is used in a scenario with two different robot systems: a collaborative robot arm and an autonomously moving robot car.
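The workflow described in the abstract maps naturally onto a chain of swappable stages. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the class names (`MLSPipeline`, `TagLID`, `TagSTT`, `FirstWordParser`) are hypothetical, the "audio" is pre-tagged text so the example runs without a microphone or speech engine, and a plain dictionary of handlers stands in for sending the command to a robot over ROS.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Protocol, Tuple


@dataclass
class Command:
    """An interpreted command: target robot, intent, and extracted variables."""
    robot: str
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


# Each stage is an interface, so either a cloud service or a local offline
# module can be plugged in without changing the pipeline.
class LanguageIdentifier(Protocol):
    def identify(self, audio: bytes) -> str: ...


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes, language: str) -> str: ...


class IntentParser(Protocol):
    def parse(self, text: str, language: str) -> Optional[Command]: ...


@dataclass
class MLSPipeline:
    lid: LanguageIdentifier
    stt: SpeechToText
    parser: IntentParser
    # Hypothetical stand-in for sending a command to a robot (e.g. via ROS):
    # maps a robot name to a handler that returns the state the robot reached.
    robots: Dict[str, Callable[[Command], str]]
    # Stand-in for audio feedback: records (message, language) pairs.
    feedback: List[Tuple[str, str]] = field(default_factory=list)

    def handle(self, audio: bytes) -> Optional[str]:
        language = self.lid.identify(audio)          # 1. language identification
        text = self.stt.transcribe(audio, language)  # 2. speech-to-text
        command = self.parser.parse(text, language)  # 3. intent and variables
        if command is None or command.robot not in self.robots:
            self.feedback.append(("command not understood", language))
            return None
        state = self.robots[command.robot](command)  # 4. execute on target robot
        self.feedback.append((state, language))      # 5. report state to the user
        return state


# Toy offline stubs (hypothetical): "audio" is text with an optional
# four-character language tag such as "[de]".
class TagLID:
    def identify(self, audio: bytes) -> str:
        return "de" if audio.startswith(b"[de]") else "en"


class TagSTT:
    def transcribe(self, audio: bytes, language: str) -> str:
        text = audio.decode("utf-8")
        return text[4:].strip() if text.startswith("[") else text.strip()


class FirstWordParser:
    def parse(self, text: str, language: str) -> Optional[Command]:
        words = text.lower().split()
        if len(words) < 2:
            return None  # need at least a robot name and an intent
        return Command(robot=words[0], intent=words[1], slots={"args": " ".join(words[2:])})


if __name__ == "__main__":
    pipeline = MLSPipeline(
        lid=TagLID(), stt=TagSTT(), parser=FirstWordParser(),
        robots={"arm": lambda cmd: f"arm executed {cmd.intent}"},
    )
    print(pipeline.handle(b"[de] arm open"))  # arm executed open
    print(pipeline.handle(b"car stop"))       # None (no robot named "car")
    print(pipeline.feedback)  # [('arm executed open', 'de'), ('command not understood', 'en')]
```

Because each stage is typed as a `Protocol`, an online cloud service and a local offline module only need to provide the same method (`identify`, `transcribe`, or `parse`) to be interchangeable, which is one way to realise the modular design the abstract describes.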

Zusammenfassung (German abstract, English translation)

Speech control with automatic language identification (multilingual speech control, MLS) is an essential element of natural collaboration between humans and robots. If each robot is addressed directly by name, control can be distributed across several devices. The user is notified of the success or failure of the execution through acoustic feedback. In this paper we describe a modular MLS implementation. The individual functional modules can be integrated either via online services or, for increased privacy, run offline on local resources. The system architecture was designed to be extensible in order to meet future requirements, e.g. new robot capabilities. The MLS always follows the same basic workflow: after the spoken language has been identified, the speech is first transformed into written text (speech-to-text). From this text, the system then tries to recognise the target robot and the intent of the command. In addition, any variable parameters are extracted, interpreted, and passed to the command. Once the command reaches the target robot, the robot reports the state it has reached back to the user via speech output. The example implementation was built with artificial intelligence technologies and successfully tested in a scenario with a collaborative robot arm on the one hand and an autonomously moving robot vehicle on the other. The resulting software framework integrates both cloud services and existing open-source implementations.
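Addressing each robot by name and extracting the command's variables can be illustrated with a small rule-based router. The sketch below is a stand-in built on assumptions rather than the authors' method: the robot names `arm` and `car`, the intents `move` and `grip`, and the English and German patterns are all hypothetical, and a real system would more likely use a trained intent classifier than hand-written regular expressions.

```python
import re
from typing import Dict, List, Optional, Set, Tuple, Union

# Hypothetical intent patterns per language; named groups capture the variables.
PATTERNS: Dict[str, List[Tuple[str, re.Pattern]]] = {
    "en": [
        ("move", re.compile(r"\bmove (?P<distance>\d+) (?:cm|centimetres?|centimeters?)\b")),
        ("grip", re.compile(r"\b(?:grab|grip|pick up) the (?P<object>\w+)\b")),
    ],
    "de": [
        ("move", re.compile(r"\bfahre (?P<distance>\d+) (?:cm|zentimeter)\b")),
        ("grip", re.compile(r"\b(?:greife|nimm) (?:den|die|das) (?P<object>\w+)\b")),
    ],
}

Route = Tuple[str, str, Dict[str, Union[int, str]]]


def route(text: str, language: str, robots: Set[str]) -> Optional[Route]:
    """Return (robot, intent, variables) if a known robot is named and an intent matches."""
    tokens = [w.strip(",.!?") for w in text.lower().split()]
    # The target robot is addressed by name; take the first known name spoken.
    robot = next((t for t in tokens if t in robots), None)
    if robot is None:
        return None
    sentence = " ".join(tokens)
    for intent, pattern in PATTERNS.get(language, []):
        match = pattern.search(sentence)
        if match:
            # Interpret the variables: purely numeric values become integers.
            variables = {k: int(v) if v.isdigit() else v for k, v in match.groupdict().items()}
            return robot, intent, variables
    return None


if __name__ == "__main__":
    robots = {"arm", "car"}
    print(route("Car, move 30 cm", "en", robots))       # ('car', 'move', {'distance': 30})
    print(route("Arm, nimm den Würfel", "de", robots))  # ('arm', 'grip', {'object': 'würfel'})
```

Keeping one pattern table per language means that supporting an additional language only requires a new entry in `PATTERNS`; the language code is assumed to come from the preceding language-identification step.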

Acknowledgements

We would like to thank our project partners from the Digital Transfer Centre Salzburg (DTZ, https://www.dtz-salzburg.at). DTZ is a collaboration between Fachhochschule Salzburg and Salzburg Research, funded by the regional government of Salzburg under the WISS2025 Knowledge Initiative.

Author information

Authors and Affiliations

Salzburg Research Forschungsgesellschaft mbH, Fachhochschule Salzburg GmbH, Techno-Z III, Jakob-Haringer-Straße 5, 5020, Salzburg, Austria
Dominik P. Hofer
Salzburg Research Forschungsgesellschaft mbH, Techno-Z III, Jakob-Haringer-Straße 5, 5020, Salzburg, Austria
Felix Strohmeier

Corresponding author

Correspondence to Dominik P. Hofer.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hofer, D.P., Strohmeier, F. Multilingual speech control for ROS-driven robots. Elektrotech. Inftech. 136, 334–340 (2019). https://doi.org/10.1007/s00502-019-00739-y

Received: 13 July 2019
Accepted: 26 September 2019
Published: 16 October 2019
Issue Date: November 2019
DOI: https://doi.org/10.1007/s00502-019-00739-y
