DOI: 10.1145/3372923.3404791
research-article

Text2SceneVR: Generating Hypertexts with VAnnotatoR as a Pre-processing Step for Text2Scene Systems

Published: 13 July 2020

Abstract

The automatic generation of digital scenes from texts is a central task in computer science. It requires a kind of text comprehension whose automation depends on sufficiently large, diverse, and deeply annotated data being freely available. This paper introduces Text2SceneVR, a system that addresses this bottleneck by allowing its users to create a kind of spatial hypertext in Virtual Reality (VR). We describe Text2SceneVR's data model and user interface, as well as a number of problems arising from the implicitness with which natural language manifests spatial relations, which Text2SceneVR aims to address while remaining language-independent. Finally, we present a user study in which we evaluated Text2SceneVR.
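To illustrate what a spatial hypertext of this kind might contain, the following Python sketch links text spans to 3D objects placed in a scene and records spatial relations between them. It is a hypothetical illustration, not Text2SceneVR's data model: the classes `Span`, `SceneObject`, and `SpatialHypertext`, the relation labels, the coordinate convention, and the rule in `infer_implicit` are all assumptions made for this example. The rule shows one way in which something a sentence leaves implicit (in "The lamp is on the table", that the table supports the lamp) could be made explicit and checked against object positions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical data model for illustration only; not Text2SceneVR's actual schema.


@dataclass(frozen=True)
class Span:
    """Character offsets [start, end) of a mention in the source text."""
    start: int
    end: int


@dataclass
class SceneObject:
    """A 3D object in the VR scene, linked to the text span that mentions it."""
    name: str
    span: Span
    position: tuple[float, float, float]  # (x, y, z), y pointing up (assumed)


@dataclass
class SpatialHypertext:
    """A text whose mentions are hyperlinked to scene objects (hypothetical)."""
    text: str
    objects: dict[str, SceneObject] = field(default_factory=dict)
    # Explicit relations as (figure, relation, ground) triples.
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def link(self, name: str, surface: str, position: tuple[float, float, float]) -> None:
        """Link the first occurrence of `surface` in the text to a new scene object."""
        start = self.text.index(surface)  # raises ValueError if `surface` is absent
        self.objects[name] = SceneObject(name, Span(start, start + len(surface)), position)

    def relate(self, figure: str, relation: str, ground: str) -> None:
        """Record an explicit spatial relation between two linked objects."""
        if figure not in self.objects or ground not in self.objects:
            raise KeyError("both objects must be linked before relating them")
        self.relations.append((figure, relation, ground))

    def anchor_text(self, name: str) -> str:
        """Return the text span that an object is linked to."""
        span = self.objects[name].span
        return self.text[span.start:span.end]


def infer_implicit(doc: SpatialHypertext) -> list[tuple[str, str, str]]:
    """Make explicit what an "on" relation leaves implicit (hypothetical rule).

    "X on Y" is taken to mean that Y supports X; "X above Y" is added only if
    the placed objects confirm it, i.e. X's y coordinate exceeds Y's.
    """
    inferred = []
    for figure, relation, ground in doc.relations:
        if relation != "on":
            continue
        inferred.append((ground, "supports", figure))
        if doc.objects[figure].position[1] > doc.objects[ground].position[1]:
            inferred.append((figure, "above", ground))
    return inferred


if __name__ == "__main__":
    doc = SpatialHypertext("The lamp is on the table.")
    doc.link("lamp1", "lamp", (0.0, 0.9, 0.0))
    doc.link("table1", "table", (0.0, 0.0, 0.0))
    doc.relate("lamp1", "on", "table1")
    print(doc.anchor_text("table1"))  # table
    print(infer_implicit(doc))
    # [('table1', 'supports', 'lamp1'), ('lamp1', 'above', 'table1')]
```

A single hand-written rule like this breaks down quickly (a picture "on the wall" is supported by the wall but not above it), which is one reason why annotated data of the kind described above is needed rather than rules alone.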

HT '20, July 13–15, 2020, Virtual Event, USA. Authors: Abrami, Henlein, Kett, Mehler.


        Information & Contributors

        Published In
        HT '20: Proceedings of the 31st ACM Conference on Hypertext and Social Media
        July 2020
        327 pages
        ISBN:9781450370981
        DOI:10.1145/3372923
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 13 July 2020

        Author Tags

        1. 3d annotations
        2. spatial hypertext
        3. text2scene
        4. vannotator
        5. virtual reality

        Qualifiers

        • Research-article

        Conference

        HT '20

        Acceptance Rates

        Overall Acceptance Rate 378 of 1,158 submissions, 33%

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)37
        • Downloads (Last 6 weeks)4
        Reflects downloads up to 20 Jan 2025

        Cited By

        • (2024) Survey of Annotations in Extended Reality Systems. IEEE Transactions on Visualization and Computer Graphics 30:8, 5074-5096. DOI: 10.1109/TVCG.2023.3288869. Online publication date: Aug-2024.
        • (2023) Va.Si.Li-Lab as a collaborative multi-user annotation tool in virtual reality and its potential fields of application. Proceedings of the 34th ACM Conference on Hypertext and Social Media, 1-9. DOI: 10.1145/3603163.3609076. Online publication date: 4-Sep-2023.
        • (2023) Generating Activity Snippets by Learning Human-Scene Interactions. ACM Transactions on Graphics 42:4, 1-15. DOI: 10.1145/3592096. Online publication date: 26-Jul-2023.
        • (2023) Empowering the Metaverse with Generative AI: Survey and Future Directions. 2023 IEEE 43rd International Conference on Distributed Computing Systems Workshops (ICDCSW), 85-90. DOI: 10.1109/ICDCSW60045.2023.00022. Online publication date: 18-Jul-2023.
        • (2023) Exploring the Role of Mathematical Modelling in Automatic Scene Generation amidst Rapid Technological Advances. 2023 4th International Conference on Data Analytics for Business and Industry (ICDABI), 391-397. DOI: 10.1109/ICDABI60145.2023.10629356. Online publication date: 25-Oct-2023.
        • (2023) Evaluating the usage of Text to 3D scene generation methods in Game-Based Learning. 2023 24th International Conference on Control Systems and Computer Science (CSCS), 633-640. DOI: 10.1109/CSCS59211.2023.00105. Online publication date: May-2023.
        • (2023) A Multimodal Data Model for Simulation-Based Learning with Va.Si.Li-Lab. Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management, 539-565. DOI: 10.1007/978-3-031-35741-1_39. Online publication date: 9-Jul-2023.
        • (2021) Write-An-Animation: High-level Text-based Animation Editing with Character-Scene Interaction. Computer Graphics Forum 40:7, 217-228. DOI: 10.1111/cgf.14415. Online publication date: 27-Nov-2021.
