Influence of Errors on the Evaluation of Text Classification Systems

  • Conference paper
  • In: Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022)

Abstract

Accuracy metrics and explanations of outputs can provide users with useful information about the performance of machine learning-based systems. However, the availability of this information can lead users to overlook potential problems in the system. This paper investigates whether making errors obvious to the user can influence trust towards a system that has high accuracy but is flawed. To test this hypothesis, a series of experiments with different settings was conducted. Participants were shown examples of the predictions of text classification systems, the explanations of those predictions, and the overall accuracy of the systems. The participants were then asked to evaluate the systems based on this information and to indicate the reason for their evaluation decision. The results show that participants who were shown examples containing a pattern of errors in the explanation were less willing to recommend or choose a system, even if that system's accuracy metric was higher. In addition, fewer participants reported the accuracy metric as the reason for their choice, and more participants mentioned the prediction explanation.



Author information

Corresponding author

Correspondence to Vanessa Bracamonte.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bracamonte, V., Hidano, S., Kiyomoto, S. (2023). Influence of Errors on the Evaluation of Text Classification Systems. In: de Sousa, A.A., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2022. Communications in Computer and Information Science, vol 1815. Springer, Cham. https://doi.org/10.1007/978-3-031-45725-8_8

  • DOI: https://doi.org/10.1007/978-3-031-45725-8_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45724-1

  • Online ISBN: 978-3-031-45725-8

  • eBook Packages: Computer Science, Computer Science (R0)
