DOI: 10.1145/3613905.3650921
Work in Progress

Chart What I Say: Exploring Cross-Modality Prompt Alignment in AI-Assisted Chart Authoring

Published: 11 May 2024

Abstract

Recent chart-authoring systems, such as Amazon Q in QuickSight and Copilot for Power BI, demonstrate an emergent focus on supporting natural language input for creating charts that share meaningful insights from data. Currently, chart-authoring systems tend to integrate voice input by relying on speech-to-text transcription, processing spoken input the same way as typed input. However, cross-modality input comparisons in other interaction domains suggest that the structure of spoken and typed interactions can differ notably, reflecting variations in user expectations shaped by interface affordances. In this work, we therefore compare spoken and typed instructions for chart creation. Our findings suggest that while both text and voice instructions cover chart elements and element organization, voice descriptions exhibit a greater variety of command formats, element characteristics, and complex linguistic features. Based on these findings, we developed guidelines for designing voice-based authoring systems, along with additional features that can be incorporated into existing text-based systems to support the speech modality.
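
The transcription-only pipeline the abstract critiques is easy to sketch. Below is a minimal, hypothetical Python illustration (not the authors' system) of the alternative the findings point toward: a spoken prompt is lightly normalized, stripping disfluencies and conversational command framing, before it reaches whatever downstream chart-spec generator a system uses. All patterns, rules, and function names here are invented for illustration.

    import re

    # Illustrative, naive disfluency patterns. The paper's empirically derived
    # categories (command formats, element characteristics, complex linguistic
    # features) would motivate a richer, data-driven rule set.
    FILLERS = re.compile(r"\b(?:um+|uh+|you know)\b,?\s*|\blike,\s*", re.IGNORECASE)

    # Map a few conversational openers common in speech onto the terser
    # imperative style typical of typed prompts (hypothetical rules).
    SPOKEN_OPENERS = [
        (re.compile(r"^(?:could|can|would) you (?:please,?\s*)?", re.IGNORECASE), ""),
        (re.compile(r"^i(?:'d| would) like (?:you )?to see\s+", re.IGNORECASE), "show "),
    ]

    def normalize_spoken_prompt(transcript: str) -> str:
        """Rewrite a speech-to-text transcript toward typed-prompt style,
        rather than passing it downstream as if it had been typed."""
        text = FILLERS.sub("", transcript).strip()
        for pattern, replacement in SPOKEN_OPENERS:
            text = pattern.sub(replacement, text)
        return text.strip().rstrip(".?!") or transcript

    def build_chart_request(prompt: str, modality: str) -> str:
        # Route by input modality instead of collapsing both into one path.
        return normalize_spoken_prompt(prompt) if modality == "voice" else prompt

    if __name__ == "__main__":
        spoken = "Um, could you please, like, show me sales by region as a bar chart?"
        print(build_chart_request(spoken, modality="voice"))
        # -> show me sales by region as a bar chart

In a deployed system, such hand-written rules would presumably give way to the guidelines the paper derives, or be folded into the prompt of an LLM-based generator such as those the abstract names.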

Supplemental Material

MP4 File: Talk video (with transcript)



    Published In

    CHI EA '24: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems
    May 2024, 4761 pages
    ISBN: 9798400703317
    DOI: 10.1145/3613905

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. data visualization
    2. natural language corpus
    3. natural language interface
    4. visualization authoring
    5. visualization specification
    6. voice interface

    Qualifiers

    • Work in progress
    • Research
    • Refereed limited

    Conference

    CHI '24

    Acceptance Rates

    Overall acceptance rate: 6,164 of 23,696 submissions (26%)


