
Automatic Generation of 3D Animations from Text and Images

  • Conference paper
  • First Online:
  • Part of: Extended Reality (XR Salento 2022)

Abstract

The understanding of information in a text description can be improved by visually accompanying it with images or videos. This opportunity is particularly relevant for books and other traditional instructional material. Videos or, more generally, (interactive) graphics contents can help to increase the effectiveness of this material by providing, e.g., an animated representation of the steps to be performed to carry out a given procedure. The generation of 3D animated contents, however, is still very labor-intensive and time-consuming. Systems able to speed up this process while offering flexible and easy-to-use interfaces are therefore becoming of paramount importance. Hence, this paper describes a system designed to automatically generate a computer graphics video by processing a text description and a set of associated images. The system combines Natural Language Processing and image analysis to extract the information needed to visually represent the procedure described in an instruction manual using 3D animations. It relies on a database of 3D models and preconfigured animations that are activated according to the information extracted from these inputs. Moreover, by analyzing the images, the system can also generate new animations from scratch. Promising results have been obtained by assessing the system's performance in a specific use case focused on printer maintenance.

This work was developed in the framework of the VR@POLITO initiative. The research was supported by PON “Ricerca e Innovazione” 2014–2020 – DM 1062/2021 funds.
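To make the pipeline sketched in the abstract more concrete, the following Python snippet illustrates one way the text-processing step could work: spaCy (one of the tools referenced in the notes below) extracts verb–object pairs from an instruction sentence, and each pair is looked up in a table of preconfigured animations. The ANIMATION_DB table, the clip identifiers, and the function names are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch only (not the authors' implementation): use spaCy to
# extract (verb, direct object) pairs from an instruction sentence and map
# them to hypothetical preconfigured animation clips.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline

# Hypothetical lookup table: (action lemma, object lemma) -> animation clip id.
ANIMATION_DB = {
    ("open", "cover"): "anim_open_cover",
    ("remove", "cartridge"): "anim_remove_cartridge",
    ("press", "button"): "anim_press_button",
}


def extract_actions(sentence: str):
    """Return (verb lemma, direct-object lemma) pairs found in the sentence."""
    doc = nlp(sentence)
    pairs = []
    for token in doc:
        if token.pos_ == "VERB":
            for child in token.children:
                if child.dep_ in ("dobj", "obj"):
                    pairs.append((token.lemma_.lower(), child.lemma_.lower()))
    return pairs


def plan_animations(sentence: str):
    """Map extracted actions to clip ids; None marks actions with no preset."""
    return [(pair, ANIMATION_DB.get(pair)) for pair in extract_actions(sentence)]


if __name__ == "__main__":
    print(plan_animations("Open the front cover and remove the ink cartridge."))
    # e.g. [(('open', 'cover'), 'anim_open_cover'),
    #       (('remove', 'cartridge'), 'anim_remove_cartridge')]
```

In the actual system, such presets would presumably also be parameterized by the objects detected in the accompanying images; the sketch only shows the text-to-preset mapping.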


Notes

  1. NeuralCoref: https://github.com/huggingface/neuralcoref.

  2. spaCy: https://spacy.io/.

  3. Scene Graph Parser: http://tiny.cc/dnqpuz.

  4. Mask R-CNN: https://github.com/matterport/Mask_RCNN.

  5. WordNet: https://wordnet.princeton.edu/.

  6. VGG Image Annotator: http://tiny.cc/fnqpuz.

  7. Canon PIXMA-MX495 manual: https://bit.ly/3hxeohx.

  8. Epson WF-7010 manual: https://bit.ly/3pxLNNu.

  9. HP Deskjet 3000 manual: https://bit.ly/3hw1nVv.

  10. Video generated in the experiments: http://tiny.cc/iiopuz.
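The notes above list the language and vision tools the system builds on (NeuralCoref, spaCy, Scene Graph Parser, Mask R-CNN, WordNet, VGG Image Annotator). As a purely illustrative example of how two of them could be combined, and not the authors' actual matching logic, the sketch below uses NLTK's WordNet interface to relate an object label (such as one produced by a Mask R-CNN detector) to the closest name in a hypothetical 3D-model library; the library entries and the similarity threshold are assumptions.

```python
# Illustrative sketch only: relate a detected object label to the closest
# 3D-model name in a hypothetical asset library via WordNet path similarity.
# Library names and the 0.3 threshold are assumptions, not values from the paper.
from nltk.corpus import wordnet as wn  # requires a one-time nltk.download("wordnet")

MODEL_LIBRARY = ["printer", "cartridge", "tray", "button"]  # hypothetical 3D assets


def best_model_for(label: str, threshold: float = 0.3):
    """Return the library entry whose WordNet sense is most similar to the label."""
    label_synsets = wn.synsets(label.replace(" ", "_"), pos=wn.NOUN)
    best_name, best_score = None, 0.0
    for name in MODEL_LIBRARY:
        for s1 in label_synsets:
            for s2 in wn.synsets(name, pos=wn.NOUN):
                score = s1.path_similarity(s2) or 0.0
                if score > best_score:
                    best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    print(best_model_for("laser printer"))  # expected to resolve to "printer"
```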



Author information

Corresponding author

Correspondence to Alberto Cannavò.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Cannavò, A., Gatteschi, V., Macis, L., Lamberti, F. (2022). Automatic Generation of 3D Animations from Text and Images. In: De Paolis, L.T., Arpaia, P., Sacco, M. (eds) Extended Reality. XR Salento 2022. Lecture Notes in Computer Science, vol 13445. Springer, Cham. https://doi.org/10.1007/978-3-031-15546-8_6

  • DOI: https://doi.org/10.1007/978-3-031-15546-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15545-1

  • Online ISBN: 978-3-031-15546-8

  • eBook Packages: Computer Science, Computer Science (R0)
