ABSTRACT
Faceted search systems enable users to filter results by selecting values along different dimensions or facets. Traditionally, facets have corresponded to properties of information items that are part of the document metadata. Recently, faceted search systems have begun to use machine learning to automatically associate documents with facet-values that are more subjective and abstract. Examples include search systems that support topic-based filtering of research articles, concept-based filtering of medical documents, and tag-based filtering of images. While machine learning can be used to infer facet-values when the collection is too large for manual annotation, machine-learned classifiers make mistakes. In such cases, it is desirable to have a scrutable system that explains why a filtered result is relevant to a facet-value. Such explanations are missing from current systems. In this paper, we investigate how explainability features can help users interpret results filtered using machine-learned facets. We consider two explainability features: (1) showing prediction confidence values and (2) highlighting rationale sentences that played an influential role in predicting a facet-value. We report on a crowdsourced study involving 200 participants. Participants were asked to scrutinize movie plot summaries predicted to satisfy multiple genres and indicate their agreement or disagreement with the system. Participants were exposed to four interface conditions. We found that both explainability features had a positive impact on participants' perceptions and performance. While both features helped, the sentence-highlighting feature played a more instrumental role in enabling participants to reject false positive cases. We discuss implications for designing tools to help users scrutinize automatically assigned facet-values.
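The two explainability features studied above (prediction confidence and rationale-sentence highlighting) can be illustrated with a toy sketch. Everything below — the keyword lexicon, the `score_sentence` and `predict_with_explanation` helpers, and the logistic squashing of scores into a pseudo-confidence — is an illustrative assumption for exposition, not the classifier or explanation method used in the paper:

```python
import math

# Toy genre lexicon -- an illustrative assumption, not the paper's model.
GENRE_KEYWORDS = {
    "horror": {"ghost", "haunted", "terror", "scream"},
    "romance": {"love", "wedding", "heart", "kiss"},
}

def score_sentence(sentence, keywords):
    """Count how many genre keywords a sentence contains."""
    tokens = set(sentence.lower().replace(".", "").split())
    return len(tokens & keywords)

def predict_with_explanation(summary, genre):
    """Return (pseudo-confidence in [0, 1], most influential sentence).

    The highest-scoring sentence plays the role of the highlighted
    rationale; the summed score, squashed by a logistic function,
    plays the role of the displayed confidence value.
    """
    keywords = GENRE_KEYWORDS[genre]
    sentences = [s.strip() for s in summary.split(".") if s.strip()]
    scores = [score_sentence(s, keywords) for s in sentences]
    total = sum(scores)
    confidence = 1 / (1 + math.exp(-total))
    rationale = sentences[scores.index(max(scores))] if total else None
    return confidence, rationale

summary = ("A young couple moves into an old mansion. "
           "A ghost begins to haunt the corridors at night. "
           "They must uncover the terror behind its scream.")
conf, rationale = predict_with_explanation(summary, "horror")
# A scrutinizing user would see a high confidence for "horror" and the
# third sentence highlighted as the rationale.
```

In a real system the confidence would come from a calibrated classifier and the rationale from a learned extraction model; the sketch only shows how the two signals are surfaced together so a user can accept or reject a predicted facet-value.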
A Study of Explainability Features to Scrutinize Faceted Filtering Results