DOI: 10.1145/3635636.3664261
C&C '24 Conference Proceedings · Poster

Communicating Design Intent Using Drawing and Text

Published: 23 June 2024

Abstract

Realizing a designer’s intent in software currently requires tedious manipulation of geometric primitives, such as points and curves. By contrast, designers routinely communicate more abstract design goals to one another using an efficient combination of natural language and drawings. What would it take to develop artificial systems that understand how humans naturally convey design intent, and thereby enable more seamless interactions between humans and machines throughout the design process? First, it is vital to establish benchmarks that showcase the full range of strategies that humans use to successfully communicate about design intent. Here we take initial steps towards that goal by conducting an online study in which pairs of human participants – a “Designer” and “Maker” – collaborated over multiple turns to recreate target designs. In each turn, Designers sent messages containing language, drawings, or both to the Maker, describing how to modify an existing design toward the target. We found a preference for communicating using drawings in early turns and observed several multimodal strategies for conveying design intent. By comparing how human Makers and GPT-4V carried out instructions, we identify a gap in human and machine understanding of multimodal instructions and suggest a path for bridging this gap.
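The study's machine comparison sends each Designer message, which may combine text and a drawing, to GPT-4V. As an illustrative sketch only (the paper does not publish its harness, and the helper name `build_multimodal_instruction` is our own), a multimodal instruction can be packaged in the content format that OpenAI's vision-capable chat models document, without needing the SDK or an API call to construct it:

```python
import base64


def build_multimodal_instruction(text, drawing_png_bytes=None):
    """Package a Designer's turn as a single chat message.

    The message combines the natural-language instruction with an
    optional drawing, using the list-of-parts content format that
    OpenAI's vision-capable chat models accept. Images are inlined
    as a base64 data URL so no file hosting is needed.
    """
    content = [{"type": "text", "text": text}]
    if drawing_png_bytes is not None:
        b64 = base64.b64encode(drawing_png_bytes).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    return {"role": "user", "content": content}


# Example turn: language plus a drawing (placeholder PNG bytes here).
msg = build_multimodal_instruction(
    "Move the leftmost block two units to the right.",
    drawing_png_bytes=b"\x89PNG",
)
```

The resulting dict would be passed as one element of the `messages` list in a chat-completion request; a text-only turn simply omits the image part.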



Published In

C&C '24: Proceedings of the 16th Conference on Creativity & Cognition
June 2024
718 pages
ISBN:9798400704857
DOI:10.1145/3635636
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Poster
  • Research
  • Refereed limited

Conference

C&C '24: Creativity and Cognition
June 23–26, 2024
Chicago, IL, USA

Acceptance Rates

Overall acceptance rate: 108 of 371 submissions (29%)

Article Metrics

  • Total citations: 0
  • Total downloads: 232
  • Downloads (last 12 months): 232
  • Downloads (last 6 weeks): 26

Reflects downloads up to 07 Mar 2025
