DOI: 10.1145/3459637.3482409
research-article

A Study of Explainability Features to Scrutinize Faceted Filtering Results

Published: 30 October 2021

ABSTRACT

Faceted search systems enable users to filter results by selecting values along different dimensions or facets. Traditionally, facets have corresponded to properties of information items that are part of the document metadata. Recently, faceted search systems have begun to use machine learning to automatically associate documents with facet-values that are more subjective and abstract. Examples include search systems that support topic-based filtering of research articles, concept-based filtering of medical documents, and tag-based filtering of images. While machine learning can be used to infer facet-values when the collection is too large for manual annotation, machine-learned classifiers make mistakes. In such cases, it is desirable to have a scrutable system that explains why a filtered result is relevant to a facet-value. Such explanations are missing from current systems. In this paper, we investigate how explainability features can help users interpret results filtered using machine-learned facets. We consider two explainability features: (1) showing prediction confidence values and (2) highlighting rationale sentences that played an influential role in predicting a facet-value. We report on a crowdsourced study involving 200 participants. Participants were asked to scrutinize movie plot summaries predicted to satisfy multiple genres and indicate their agreement or disagreement with the system. Participants were exposed to four interface conditions. We found that both explainability features had a positive impact on participants' perceptions and performance. While both features helped, the sentence-highlighting feature played a more instrumental role in enabling participants to reject false positive cases. We discuss implications for designing tools to help users scrutinize automatically assigned facet-values.
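The abstract's two explainability features can be made concrete with a small sketch. The following is an illustrative toy example, not the authors' implementation: it pairs a prediction confidence value with a highlighted "rationale" sentence for a machine-learned facet-value (here, a movie genre). The keyword-based classifier and the leave-one-sentence-out importance score are assumptions chosen for brevity; the paper's actual system uses a trained model.

```python
def genre_confidence(sentences, keywords):
    """Toy classifier: confidence is the fraction of sentences
    containing any genre keyword (stand-in for a trained model)."""
    if not sentences:
        return 0.0
    hits = sum(any(k in s.lower() for k in keywords) for s in sentences)
    return hits / len(sentences)

def rationale_sentence(sentences, keywords):
    """Score each sentence by the confidence drop when it is removed
    (leave-one-out); the highest-drop sentence is the rationale."""
    full = genre_confidence(sentences, keywords)
    drops = [
        full - genre_confidence(sentences[:i] + sentences[i + 1:], keywords)
        for i in range(len(sentences))
    ]
    best = max(range(len(sentences)), key=lambda i: drops[i])
    return best, full

# Hypothetical plot summary and genre keywords, for illustration only.
plot = [
    "A detective investigates a string of murders.",
    "Each clue points to someone inside the police force.",
    "The city prepares for its annual festival.",
]
thriller_keywords = {"murder", "detective", "clue"}

idx, conf = rationale_sentence(plot, thriller_keywords)
print(f"thriller confidence: {conf:.2f}")
print(f"rationale: {plot[idx]}")
```

In an interface like the one studied, the confidence value would be shown next to the predicted genre, and the rationale sentence would be highlighted in the plot summary so a user can quickly accept or reject the prediction.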

Supplemental Material

CIKM21-fp1720.mp4 (mp4, 153.3 MB)


Published in

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021
4966 pages
ISBN: 9781450384469
DOI: 10.1145/3459637

Copyright © 2021 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
