
LessonAble: Leveraging Deep Fakes in MOOC Content Creation

  • Conference paper
  • Part of: Image Analysis and Processing – ICIAP 2022 (ICIAP 2022)

Abstract

This paper introduces LessonAble, a pipelined methodology that leverages the concept of Deep Fakes to generate MOOC (Massive Open Online Course) visual content directly from a lesson narrative. The proposed pipeline consists of three main modules: audio generation, video generation and lip-syncing. In this work, we use the NVIDIA Tacotron2 Text-to-Speech model to generate custom speech from text, adapt the well-known First Order Motion Model to generate the video sequence from different driving sequences and target images, and modify the Wav2Lip model to handle lip-syncing. Moreover, we introduce novel strategies that support markdown-like formatting to guide the pipeline in generating expression-aware (e.g. curious, happy) content. Although the pipeline builds on and adapts third-party modules, developing it presented interesting challenges, all analysed and reported in this work. The result is an extremely intuitive tool to support MOOC content generation.
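As a rough illustration of how the three modules and the expression markup could fit together, below is a minimal Python sketch. The [expression]...[/expression] tag syntax, the wrapper names (tts_synthesize, select_driving_sequence, animate, lip_sync) and all signatures are hypothetical assumptions for illustration only, not the authors' API; the actual implementation is available in the repository linked in the notes below.

    import re
    from dataclasses import dataclass

    @dataclass
    class Segment:
        text: str
        expression: str  # e.g. "neutral", "curious", "happy"

    # Hypothetical markdown-like expression tags, e.g. "[curious]Why?[/curious]";
    # the paper does not specify the exact syntax, so this is an assumption.
    EXPR_TAG = re.compile(r"\[(\w+)\](.*?)\[/\1\]", re.DOTALL)

    def parse_narrative(narrative: str) -> list[Segment]:
        """Split a lesson narrative into expression-tagged segments;
        untagged text defaults to a neutral expression."""
        segments, last = [], 0
        for m in EXPR_TAG.finditer(narrative):
            plain = narrative[last:m.start()].strip()
            if plain:
                segments.append(Segment(plain, "neutral"))
            segments.append(Segment(m.group(2).strip(), m.group(1)))
            last = m.end()
        tail = narrative[last:].strip()
        if tail:
            segments.append(Segment(tail, "neutral"))
        return segments

    # Placeholder stubs for the three modules named in the abstract; in the
    # real pipeline these would wrap Tacotron2, the First Order Motion Model
    # and Wav2Lip respectively.
    def tts_synthesize(text: str): ...                 # audio generation (Tacotron2)
    def select_driving_sequence(expression: str): ...  # expression-specific driving clip
    def animate(target_image: str, driving): ...       # video generation (FOMM)
    def lip_sync(video, audio): ...                    # lip-syncing (Wav2Lip)

    def generate_lesson(narrative: str, target_image: str) -> list:
        """Run audio generation, video generation and lip-syncing per segment."""
        clips = []
        for seg in parse_narrative(narrative):
            audio = tts_synthesize(seg.text)
            driving = select_driving_sequence(seg.expression)
            video = animate(target_image, driving)
            clips.append(lip_sync(video, audio))
        return clips

    if __name__ == "__main__":
        demo = "Welcome back. [curious]Ever wondered how MOOCs scale?[/curious]"
        for seg in parse_narrative(demo):
            print(f"{seg.expression:>8}: {seg.text}")

Running the sketch prints each narrative segment with its resolved expression, which is the information the video-generation stage would need in order to select an expression-specific driving sequence.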


Notes

  1. https://www.marketsandmarkets.com/Market-Reports/massive-open-online-course-market-237288995.html
  2. https://github.com/priamus-lab/LessonAble_SDG

References

  1. Bernard, M., Titeux, H.: Phonemizer: text to phones transcription for multiple languages in Python. J. Open Source Softw. 6(68), 3958 (2021). https://doi.org/10.21105/joss.03958

  2. Favaro, A., Sbattella, L., Tedesco, R., Scotti, V.: ITAcotron 2: transfering English speech synthesis architectures and speech features to Italian. In: Proceedings of The Fourth International Conference on Natural Language and Speech Processing (ICNLSP 2021), pp. 83–88. Association for Computational Linguistics, Trento, Italy, 12–13 Nov 2021. https://aclanthology.org/2021.icnlsp-1.10

  3. Fried, O., et al.: Text-based editing of talking-head video. CoRR abs/1906.01524 (2019). http://arxiv.org/abs/1906.01524

  4. Jamaludin, A., Chung, J.S., Zisserman, A.: You said that? Synthesising talking faces from audio. Int. J. Comput. Vis. 127 (2019). https://doi.org/10.1007/s11263-019-01150-y

  5. Nguyen, T.T., Nguyen, C.M., Nguyen, D.T., Nguyen, D.T., Nahavandi, S.: Deep learning for deepfakes creation and detection. CoRR abs/1909.11573 (2019). http://arxiv.org/abs/1909.11573

  6. Post, M.: A call for clarity in reporting BLEU scores. CoRR abs/1804.08771 (2018). http://arxiv.org/abs/1804.08771

  7. Prajwal, K.R., Mukhopadhyay, R., Namboodiri, V., Jawahar, C.V.: A lip sync expert is all you need for speech to lip generation in the wild. CoRR abs/2008.10010 (2020). https://arxiv.org/abs/2008.10010

  8. Prajwal, K.R., Mukhopadhyay, R., Philip, J., Jha, A., Namboodiri, V., Jawahar, C.V.: Towards automatic face-to-face translation. CoRR abs/2003.00418 (2020). https://arxiv.org/abs/2003.00418

  9. Reich, J.: Rebooting MOOC research. Science 347(6217), 34–35 (2015). https://doi.org/10.1126/science.1261627

  10. Shen, J., et al.: Natural TTS synthesis by conditioning wavenet on mel spectrogram predictions. CoRR abs/1712.05884 (2017). http://arxiv.org/abs/1712.05884

  11. Siarohin, A., Lathuilière, S., Tulyakov, S., Ricci, E., Sebe, N.: Animating arbitrary objects via deep motion transfer. CoRR abs/1812.08861 (2018). http://arxiv.org/abs/1812.08861

  12. Siarohin, A., Lathuilière, S., Tulyakov, S., Ricci, E., Sebe, N.: First order motion model for image animation. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019). https://proceedings.neurips.cc/paper/2019/file/31c0b36aef265d9221af80872ceb62f9-Paper.pdf

  13. Thies, J., Elgharib, M., Tewari, A., Theobalt, C., Nießner, M.: Neural voice puppetry: audio-driven facial reenactment. CoRR abs/1912.05566 (2019). http://arxiv.org/abs/1912.05566

  14. Wiles, O., Koepke, A.S., Zisserman, A.: X2Face: a network for controlling face generation by using images, audio, and pose codes. CoRR abs/1807.10550 (2018). http://arxiv.org/abs/1807.10550


Acknowledgements

We acknowledge the CINECA award under the ISCRA initiative for the availability of high-performance computing resources and support within the projects IsC80_FEAD-D and IsC93_FEAD-DII. We also acknowledge the NVIDIA AI Technology Center, EMEA, for its support and access to computing resources, and the Federica Web Learning University center for providing Professor Sansone's videos.

Author information


Corresponding author

Correspondence to Michela Gravina.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Sannino, C., Gravina, M., Marrone, S., Fiameni, G., Sansone, C. (2022). LessonAble: Leveraging Deep Fakes in MOOC Content Creation. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13231. Springer, Cham. https://doi.org/10.1007/978-3-031-06427-2_3

  • DOI: https://doi.org/10.1007/978-3-031-06427-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06426-5

  • Online ISBN: 978-3-031-06427-2

  • eBook Packages: Computer Science, Computer Science (R0)
