short-paper

Text-to-Metaverse: Towards a Digital Twin-Enabled Multimodal Conditional Generative Metaverse

Author:

Ahmed Elhagry

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 9336 - 9339

https://doi.org/10.1145/3581783.3613432

Published: 27 October 2023

Abstract

Developing realistic and interactive virtual environments is a major hurdle in the progress of the Metaverse. At present, the majority of Metaverse applications require manual construction of 3D models, which is both time-consuming and costly. It is also challenging to design environments that react promptly to users' actions. To address these challenges, this paper proposes a novel approach to generating virtual worlds using digital twin (DT) technology and AI through a Text-to-Metaverse pipeline. The pipeline converts natural language input into a scene JSON, which is used to generate a 3D virtual world using two engines: the Generative Script Engine (GSE) and the Generative Metaverse Engine (GME). GME generates a design script from the JSON file and then uses it to generate 3D objects in an environment. The approach aims to use multimodal AI and DT technology to produce realistic and highly detailed virtual environments. The proposed pipeline has potential applications in education, training, architecture, healthcare, and entertainment, and could change the way designers and developers create virtual worlds. While this short paper is limited to an abstract, as per the Doctoral Symposium's guidelines, it contributes to research on generative models for multimodal data and provides a new direction for creating immersive virtual experiences.
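
The flow described in the abstract (natural language input, then a scene JSON, then a design script, then 3D objects) can be illustrated with a minimal sketch. The abstract does not specify the scene JSON schema, the design script format, or the internals of either engine, so every name below (the keyword vocabulary, the `spawn` command syntax, and the three functions) is a hypothetical stand-in rather than the author's implementation. A keyword matcher takes the place of the multimodal AI model, and plain dictionaries take the place of 3D objects.

```python
import json

# Hypothetical vocabulary; the paper does not define one.
KNOWN_OBJECTS = {"tree", "house", "car", "bench"}
KNOWN_COLORS = {"red", "green", "blue", "white"}


def text_to_scene_json(prompt: str) -> str:
    """Toy stand-in for the language step: keyword matching, not a real model."""
    objects = []
    color = None
    for word in prompt.lower().replace(",", " ").replace(".", " ").split():
        if word in KNOWN_COLORS:
            color = word  # remember the color for the next object mentioned
        elif word in KNOWN_OBJECTS:
            objects.append({"type": word, "color": color or "default"})
            color = None
    return json.dumps({"objects": objects})


def scene_to_design_script(scene_json: str) -> list[str]:
    """Turn the scene JSON into a flat list of made-up 'spawn' commands."""
    scene = json.loads(scene_json)
    script = []
    for index, obj in enumerate(scene["objects"]):
        # Arbitrary layout rule: place objects 2 units apart along the x-axis.
        script.append(
            f"spawn {obj['type']} color={obj['color']} x={index * 2} y=0 z=0"
        )
    return script


def run_design_script(script: list[str]) -> list[dict]:
    """Interpret each command into a plain dict standing in for a 3D object."""
    world = []
    for line in script:
        _, obj_type, *attrs = line.split()
        fields = dict(attr.split("=") for attr in attrs)
        world.append({
            "type": obj_type,
            "color": fields["color"],
            "position": (int(fields["x"]), int(fields["y"]), int(fields["z"])),
        })
    return world


scene_json = text_to_scene_json("A red house and a green tree.")
design_script = scene_to_design_script(scene_json)
world = run_design_script(design_script)
```

Keeping the scene JSON and the design script as separate intermediate artifacts, as the abstract describes, means each stage can be inspected or swapped out independently. In a real system the final stage would presumably drive a 3D engine rather than build dictionaries.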

Index Terms

Text-to-Metaverse: Towards a Digital Twin-Enabled Multimodal Conditional Generative Metaverse
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
    2. Natural language processing

Information & Contributors

Information

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN: 9798400701085

DOI: 10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik (University of Ottawa, Canada & MBZUAI, UAE)
Tao Mei (HiDream.ai, China)
Rita Cucchiara (University of Modena and Reggio Emilia, Italy)
Program Chairs:
Marco Bertini (University of Florence, Italy)
Diana Patricia Tobon Vallejo (Universidad de Medellin, Colombia)
Pradeep K. Atrey (University at Albany, State University of New York, USA)
M. Shamim Hossain (King Saud University, KSA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 October 2023

Qualifiers

Short-paper

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 4
Total Downloads: 287
Downloads (Last 12 months): 159
Downloads (Last 6 weeks): 17

Reflects downloads up to 05 Mar 2025

Citations

Cited By

Kaur H, Bhatia M (2025). Scientometric Analysis of Digital Twin in Industry 4.0. IEEE Internet of Things Journal 12:2 (1200-1221). Online publication date: 15-Jan-2025.
https://doi.org/10.1109/JIOT.2024.3459965
Kaur H, Bhatia M (2025). Digital twins: A scientometric investigation into current progress and future directions. Expert Systems with Applications 265 (125917). Online publication date: Mar-2025.
https://doi.org/10.1016/j.eswa.2024.125917
Yang S, Tsui Y, Wang X, Alhilal A, Hadi Mogavi R, Wang X, Hui P, Farzan R, López C, Cardoso Llach D, Quercia D, Mustafa M, Niu S, Wong-Villacrés M (2024). From Prompt to Metaverse: User Perceptions of Personalized Spaces Crafted by Generative AI. Companion Publication of the 2024 Conference on Computer-Supported Cooperative Work and Social Computing (497-504). Online publication date: 11-Nov-2024.
https://dl.acm.org/doi/10.1145/3678884.3681897
Longdon S, Adjaye A, Sosu D, Martin J, Kwakye W, Odoi R, Osei G, Okae P, Richardson A (2024). Teaching in the Metaverse at the University of Ghana. 2024 IEEE 9th International Conference on Adaptive Science and Technology (ICAST) (1-7). Online publication date: 24-Oct-2024.
https://doi.org/10.1109/ICAST61769.2024.10856481
