
Automatic Generation of 3D Animations from Text and Images

  • Conference paper
  • First Online:
  • Part of: Extended Reality (XR Salento 2022)

Abstract

The understanding of information in a text description can be improved by visually accompanying it with images or videos. This opportunity is particularly relevant for books and other traditional instructional material. Videos or, more generally, (interactive) graphics contents can help to increase the effectiveness of this material by providing, e.g., an animated representation of the steps to be performed to carry out a given procedure. The generation of 3D animated contents, however, is still very labor-intensive and time-consuming. Systems able to speed up this process while offering flexible and easy-to-use interfaces are therefore becoming of paramount importance. Hence, this paper describes a system designed to automatically generate a computer graphics video by processing a text description and a set of associated images. The system combines Natural Language Processing and image analysis to extract the information needed to visually represent the procedure described in an instruction manual using 3D animations. It relies on a database of 3D models and preconfigured animations that are activated according to the information extracted from these inputs. Moreover, by analyzing the images, the system can also generate new animations from scratch. Promising results have been obtained by assessing the system's performance in a specific use case focused on printer maintenance.

This work was developed in the framework of the VR@POLITO initiative. The research was supported by PON “Ricerca e Innovazione” 2014–2020 – DM 1062/2021 funds.
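To make the pipeline sketched in the abstract more concrete, the following Python snippet illustrates one way the text-processing step could work: spaCy (one of the tools referenced in the notes below) extracts verb–object pairs from an instruction sentence, and each pair is looked up in a table of preconfigured animations. The ANIMATION_DB table, the clip identifiers, and the function names are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch only (not the authors' implementation): use spaCy to
# extract (verb, direct object) pairs from an instruction sentence and map
# them to hypothetical preconfigured animation clips.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline

# Hypothetical lookup table: (action lemma, object lemma) -> animation clip id.
ANIMATION_DB = {
    ("open", "cover"): "anim_open_cover",
    ("remove", "cartridge"): "anim_remove_cartridge",
    ("press", "button"): "anim_press_button",
}


def extract_actions(sentence: str):
    """Return (verb lemma, direct-object lemma) pairs found in the sentence."""
    doc = nlp(sentence)
    pairs = []
    for token in doc:
        if token.pos_ == "VERB":
            for child in token.children:
                if child.dep_ in ("dobj", "obj"):
                    pairs.append((token.lemma_.lower(), child.lemma_.lower()))
    return pairs


def plan_animations(sentence: str):
    """Map extracted actions to clip ids; None marks actions with no preset."""
    return [(pair, ANIMATION_DB.get(pair)) for pair in extract_actions(sentence)]


if __name__ == "__main__":
    print(plan_animations("Open the front cover and remove the ink cartridge."))
    # e.g. [(('open', 'cover'), 'anim_open_cover'),
    #       (('remove', 'cartridge'), 'anim_remove_cartridge')]
```

In the actual system, such presets would presumably also be parameterized by the objects detected in the accompanying images; the sketch only shows the text-to-preset mapping.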


Notes

  1. NeuralCoref: https://github.com/huggingface/neuralcoref.

  2. spaCy: https://spacy.io/.

  3. Scene Graph Parser: http://tiny.cc/dnqpuz.

  4. Mask R-CNN: https://github.com/matterport/Mask_RCNN.

  5. WordNet: https://wordnet.princeton.edu/.

  6. VGG Image Annotator: http://tiny.cc/fnqpuz.

  7. Canon PIXMA-MX495 manual: https://bit.ly/3hxeohx.

  8. Epson WF-7010 manual: https://bit.ly/3pxLNNu.

  9. HP Deskjet 3000 manual: https://bit.ly/3hw1nVv.

  10. Video generated in the experiments: http://tiny.cc/iiopuz.
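The notes above list the language and vision tools the system builds on (NeuralCoref, spaCy, Scene Graph Parser, Mask R-CNN, WordNet, VGG Image Annotator). As a purely illustrative example of how two of them could be combined, and not the authors' actual matching logic, the sketch below uses NLTK's WordNet interface to relate an object label (such as one produced by a Mask R-CNN detector) to the closest name in a hypothetical 3D-model library; the library entries and the similarity threshold are assumptions.

```python
# Illustrative sketch only: relate a detected object label to the closest
# 3D-model name in a hypothetical asset library via WordNet path similarity.
# Library names and the 0.3 threshold are assumptions, not values from the paper.
from nltk.corpus import wordnet as wn  # requires a one-time nltk.download("wordnet")

MODEL_LIBRARY = ["printer", "cartridge", "tray", "button"]  # hypothetical 3D assets


def best_model_for(label: str, threshold: float = 0.3):
    """Return the library entry whose WordNet sense is most similar to the label."""
    label_synsets = wn.synsets(label.replace(" ", "_"), pos=wn.NOUN)
    best_name, best_score = None, 0.0
    for name in MODEL_LIBRARY:
        for s1 in label_synsets:
            for s2 in wn.synsets(name, pos=wn.NOUN):
                score = s1.path_similarity(s2) or 0.0
                if score > best_score:
                    best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    print(best_model_for("laser printer"))  # expected to resolve to "printer"
```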



Author information

Corresponding author

Correspondence to Alberto Cannavò.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Cannavò, A., Gatteschi, V., Macis, L., Lamberti, F. (2022). Automatic Generation of 3D Animations from Text and Images. In: De Paolis, L.T., Arpaia, P., Sacco, M. (eds) Extended Reality. XR Salento 2022. Lecture Notes in Computer Science, vol 13445. Springer, Cham. https://doi.org/10.1007/978-3-031-15546-8_6

  • DOI: https://doi.org/10.1007/978-3-031-15546-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15545-1

  • Online ISBN: 978-3-031-15546-8

  • eBook Packages: Computer Science, Computer Science (R0)
