DOI: 10.1145/3613905.3651093
Work in Progress

Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses

Published: 11 May 2024

Abstract

Large language model (LLM) powered chatbots are primarily text-based today and impose a large interactional cognitive load, especially for exploratory or sensemaking tasks such as planning a trip or learning about a new city. Because the interaction is purely textual, users have little scaffolding in the way of structure, informational “scent”, or the ability to specify high-level preferences or goals. We introduce ExploreLLM, which allows users to structure their thoughts, explore different options, navigate through choices and recommendations, and more easily steer models toward more personalized responses. We conduct a user study and show that users find ExploreLLM helpful for exploratory or planning tasks, because it provides a useful schema-like structure for the task and guides users through planning. The study also suggests that users can more easily personalize responses by expressing high-level preferences with ExploreLLM.
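The abstract's central idea is to decompose a broad exploratory task into structured sub-tasks the user can navigate individually. A minimal sketch of that decomposition step is shown below; the prompt wording, function names, and parser are illustrative assumptions for a generic chat-completion setup, not the paper's actual implementation.

```python
# Hypothetical sketch of LLM task decomposition: build a prompt asking the
# model to break a task into sub-tasks, then parse the numbered list it
# returns into a structure a UI could render as navigable nodes.
import re


def decomposition_prompt(task: str) -> str:
    """Build a prompt asking the model to break a task into sub-tasks."""
    return (
        "Break the task below into 3-6 concrete sub-tasks, "
        "one per line, numbered '1.', '2.', ...\n\n"
        f"Task: {task}"
    )


def parse_subtasks(response: str) -> list[str]:
    """Extract numbered sub-tasks from the model's text response."""
    return [
        m.group(1).strip()
        for line in response.splitlines()
        if (m := re.match(r"\s*\d+[.)]\s*(.+)", line))
    ]


# Example with a canned model response (no API call is made here):
canned = "1. Pick travel dates\n2. Book flights\n3. Reserve a hotel"
print(parse_subtasks(canned))
# → ['Pick travel dates', 'Book flights', 'Reserve a hotel']
```

Each parsed sub-task could then seed its own focused prompt, which is the schema-like structure the study participants found helpful for planning.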

Supplemental Material

MP4 File
Talk Video
Transcript for: Talk Video


Cited By

  • (2024) Bespoke: Using LLM agents to generate just-in-time interfaces by reasoning about user intent. Companion Proceedings of the 26th International Conference on Multimodal Interaction, 78–81. DOI: 10.1145/3686215.3688372. Online publication date: 4 Nov 2024.

Published In

CHI EA '24: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems
May 2024
4761 pages
ISBN:9798400703317
DOI:10.1145/3613905
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Artificial Intelligence
  2. Chatbots
  3. Graphical User Interfaces
  4. Interaction
  5. Large Language Models
  6. Learning from Instruction
  7. Natural Language Interfaces
  8. Prompting
  9. Schema
  10. Task Decomposition

Qualifiers

  • Work in progress
  • Research
  • Refereed limited

Conference

CHI '24

Acceptance Rates

Overall Acceptance Rate 6,164 of 23,696 submissions, 26%

Article Metrics

  • Downloads (Last 12 months)1,163
  • Downloads (Last 6 weeks)178
Reflects downloads up to 28 Feb 2025

