ABSTRACT
With the ongoing digitalization of complex systems, for example in manufacturing, domain experts’ detailed understanding of datasets is pivotal to training machine learning (ML) models effectively. This understanding, obtained through deep domain knowledge, enables domain experts to collaborate with method experts to identify deficiencies in datasets, such as biases or anomalies, and to curate the datasets accordingly. Curated datasets form the foundation for training effective ML models, which in turn inform subsequent decision-making processes. However, understanding increasingly large and complex datasets, and the systems they represent, is challenging. This doctoral thesis therefore investigates methods that support domain experts in building a solid understanding of complex datasets. Specifically, the thesis focuses on three key areas: conceptualizing data understanding, augmenting domain knowledge through visualization for machine learning (VIS4ML) systems to curate datasets, and providing contextual information for AI-assisted decision-making. Initial findings indicate that VIS4ML systems effectively support domain experts in understanding and contextualizing datasets, enabling them to curate datasets collaboratively. This understanding, particularly when enriched with contextual information, shows promise for enhancing AI-assisted decision-making.
Index Terms
- Bridging Domain Expertise and AI through Data Understanding