ABSTRACT
Data-centric NLP is a highly iterative process that requires careful exploration of text data throughout the entire model development lifecycle. Unfortunately, existing data exploration tools are ill-suited to data-centric NLP because they interrupt the workflow and lack support for unstructured text. In response, we propose Weedle, a seamless and customizable exploratory text analysis system for data-centric NLP. Weedle is equipped with built-in text transformation operations and a suite of visual analysis features. With its widget, users can compose customizable dashboards interactively and programmatically in computational notebooks.
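As an illustration only, the kind of text transformation and summary step that an exploratory text analysis workflow involves can be sketched in plain Python. This is not Weedle's API, which the abstract does not specify; the function names `clean_text`, `tokenize`, and `top_terms`, and the sample tweets, are hypothetical and chosen purely for this sketch.

```python
import re
from collections import Counter

def clean_text(text):
    """Hypothetical transformation: lowercase, drop URLs and @mentions."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)           # remove @mentions
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace

def tokenize(text):
    """Split cleaned text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text)

def top_terms(texts, k=3):
    """Count tokens over a corpus and return the k most frequent."""
    counts = Counter()
    for t in texts:
        counts.update(tokenize(clean_text(t)))
    return counts.most_common(k)

# Made-up sample tweets, not taken from any real dataset.
sample = [
    "@united flight delayed again",
    "@united great flight crew",
    "Flight cancelled http://t.co/x",
]

print(top_terms(sample, k=1))
```

In a notebook, a term-frequency summary like this would typically feed a chart or table, which is the role the abstract assigns to the system's visual analysis features.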
Index Terms
- Weedle: Composable Dashboard for Data-Centric NLP in Computational Notebooks