DOI: 10.1145/3613905.3651093
Work in Progress

Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses

Published: 11 May 2024

Abstract

Large language model (LLM) powered chatbots are primarily text-based today and impose a large interactional cognitive load, especially for exploratory or sensemaking tasks such as planning a trip or learning about a new city. Because the interaction is purely textual, users have little scaffolding in the way of structure, informational “scent”, or the ability to specify high-level preferences or goals. We introduce ExploreLLM, which allows users to structure their thoughts, explore different options, navigate through choices and recommendations, and more easily steer models toward more personalized responses. We conduct a user study and show that users find ExploreLLM helpful for exploratory or planning tasks, because it provides a useful schema-like structure for the task and guides users through planning. The study also suggests that users can more easily personalize responses by expressing high-level preferences with ExploreLLM.
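The abstract's central idea is to decompose a broad exploratory task into structured sub-tasks the user can navigate individually. A minimal sketch of that decomposition step is shown below; the prompt wording, function names, and parser are illustrative assumptions for a generic chat-completion setup, not the paper's actual implementation.

```python
# Hypothetical sketch of LLM task decomposition: build a prompt asking the
# model to break a task into sub-tasks, then parse the numbered list it
# returns into a structure a UI could render as navigable nodes.
import re


def decomposition_prompt(task: str) -> str:
    """Build a prompt asking the model to break a task into sub-tasks."""
    return (
        "Break the task below into 3-6 concrete sub-tasks, "
        "one per line, numbered '1.', '2.', ...\n\n"
        f"Task: {task}"
    )


def parse_subtasks(response: str) -> list[str]:
    """Extract numbered sub-tasks from the model's text response."""
    return [
        m.group(1).strip()
        for line in response.splitlines()
        if (m := re.match(r"\s*\d+[.)]\s*(.+)", line))
    ]


# Example with a canned model response (no API call is made here):
canned = "1. Pick travel dates\n2. Book flights\n3. Reserve a hotel"
print(parse_subtasks(canned))
# → ['Pick travel dates', 'Book flights', 'Reserve a hotel']
```

Each parsed sub-task could then seed its own focused prompt, which is the schema-like structure the study participants found helpful for planning.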

Supplemental Material

MP4 File
Talk Video
Transcript for: Talk Video


Cited By

  • (2024) Bespoke: Using LLM agents to generate just-in-time interfaces by reasoning about user intent. Companion Proceedings of the 26th International Conference on Multimodal Interaction, 78–81. DOI: 10.1145/3686215.3688372. Online publication date: 4 Nov 2024.

Published In

CHI EA '24: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems
May 2024
4761 pages
ISBN:9798400703317
DOI:10.1145/3613905
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Artificial Intelligence
  2. Chatbots
  3. Graphical User Interfaces
  4. Interaction
  5. Large Language Models
  6. Learning from Instruction
  7. Natural Language Interfaces
  8. Prompting
  9. Schema
  10. Task Decomposition

Qualifiers

  • Work in progress
  • Research
  • Refereed limited

Conference

CHI '24

Acceptance Rates

Overall Acceptance Rate 6,164 of 23,696 submissions, 26%

Article Metrics

  • Downloads (Last 12 months)1,163
  • Downloads (Last 6 weeks)178
Reflects downloads up to 28 Feb 2025

