ABSTRACT
Enriching a story's script with visual aids is an effective way to promote language learning and literacy development in young children and other learners. In this paper, we propose a new system that generates short Arabic stories together with images that accurately depict the story, scene, and context of a given input. We combine a text generation technique with a text-to-image synthesis network, minimizing human intervention. We also build a corpus of Arabic stories with associated vocabulary and visualizations. The results obtained with various generative models for creating text-image content show the effectiveness of the proposed approach. The system can be used in education to help instructors build stories in different domains, and in distance learning to deliver online tutorials during the COVID-19 pandemic.
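The pipeline the abstract outlines, an automatic text generator feeding a text-to-image synthesis network, can be sketched as follows. This is a minimal illustration only: the function names (`generate_story`, `illustrate_story`) and the stub "models" are assumptions for the sketch, standing in for the paper's pretrained Arabic text generator and image synthesis network.

```python
# Minimal sketch of the two-stage pipeline: (1) generate a short story from
# a seed prompt, then (2) synthesize one image per sentence. Both models
# below are hypothetical stubs, not the paper's actual implementation.
from typing import Callable, List, Tuple


def generate_story(prompt: str, generate: Callable[[str], str]) -> List[str]:
    """Run the text generator and split the output into sentences."""
    text = generate(prompt)
    # Naive split on '.'; real Arabic text would also need '؟' and '!'.
    return [s.strip() for s in text.split(".") if s.strip()]


def illustrate_story(sentences: List[str],
                     synthesize: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Pair each sentence with an image produced by the synthesis network."""
    return [(s, synthesize(s)) for s in sentences]


# Stub "models" so the sketch runs end to end without any trained network.
story_model = lambda p: p + " The cat found a key. The key opened a garden."
image_model = lambda s: f"<image depicting: {s}>"

pages = illustrate_story(generate_story("Once upon a time.", story_model),
                         image_model)
for sentence, image in pages:
    print(sentence, "->", image)
```

In the actual system, the stubs would be replaced by calls to the trained language model and the text-to-image network, keeping the same sentence-by-sentence pairing of text and images.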
Index Terms
- A Generative Approach to Enrich Arabic Story Text with Visual Aids