Abstract
This study investigates three challenges for developing machine learning-based self-service web apps for consumers. First, we argue that user research must accompany the development of ML-based products so that they better serve users’ needs at all stages of development. Second, we discuss the data sourcing dilemma in developing consumer-oriented ML-based apps and propose a way to solve it by implementing an interaction design that balances the workload between users and computers according to the ML component’s performance. To dynamically define the role of the user-in-the-loop, we monitor user success and ML performance over time. Finally, we propose a lightweight typology of ML-based systems to assess the generalizability of our findings to other ML use cases.
Our case study uses a newly developed web application that allows consumers to analyze their heating bills for potential energy and cost savings. Based on domain-specific data values extracted from user-provided document images, an assessment of potential savings is derived and reported back to the user.
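The idea of setting the user's role from monitored ML performance can be illustrated with a minimal sketch. It is not the authors' implementation: the class `LoopMonitor`, the role names, the window size, and both thresholds are hypothetical, and "success" is assumed here to mean that a user accepted an extracted value without changing it.

```python
from collections import deque


class LoopMonitor:
    """Track recent extraction outcomes and derive a user role (illustrative only)."""

    def __init__(self, window=50, confirm_threshold=0.9, correct_threshold=0.6):
        # True means the user accepted the ML-extracted value unchanged.
        self.outcomes = deque(maxlen=window)
        self.confirm_threshold = confirm_threshold  # hypothetical value
        self.correct_threshold = correct_threshold  # hypothetical value

    def record(self, accepted_unchanged: bool) -> None:
        """Store one outcome; the oldest outcome drops out once the window is full."""
        self.outcomes.append(accepted_unchanged)

    def accuracy(self) -> float:
        """Share of accepted values in the current window (0.0 if no data yet)."""
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def user_role(self) -> str:
        """Map the rolling accuracy to how much work is handed to the user."""
        acc = self.accuracy()
        if acc >= self.confirm_threshold:
            return "confirm"  # user only confirms prefilled values
        if acc >= self.correct_threshold:
            return "correct"  # user reviews and corrects prefilled values
        return "enter"        # user enters values manually


monitor = LoopMonitor(window=10)
for ok in [True] * 7 + [False] * 3:
    monitor.record(ok)
role = monitor.user_role()
```

With seven of ten recent values accepted, the sketch assigns the intermediate role, in which users review and correct prefilled values. A rolling window is used so that the assigned role follows performance changes over time instead of a lifetime average.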
Notes
- 1.
- 2.
Shared tasks, such as the table detection and recognition challenges of the ICDAR conference series [10, 11], are ubiquitous within machine learning communities. They usually focus on solving or improving a specific ML use case by applying and fine-tuning (highly specialized) machine-learning techniques towards a predefined, shared goal. While they provide helpful motivation and illustration for the specific tasks and the applicable techniques, there is usually no need to contextualize or generalize beyond the specific setting of the task at hand.
- 3.
See Sect. 3.3 for a brief typology of ML use cases.
- 4.
- 5.
Note that there is at least one famous class of “behind the scenes” data annotation scenarios in which users are motivated merely by their desire to interact successfully with the annotation tool in order to pass a specific test: reCAPTCHA requires web users to “voluntarily” perform (partly difficult) annotation tasks on (sections of) scans or photos from extensive image collections in order to authenticate themselves as humans [34].
- 6.
It is precisely in those “on stage” settings that the paradigm shift referred to in the invitation to this panel can be expected to be successfully implemented.
- 7.
The technical description of the Smart_HEC web app and its ML component is adopted from the corresponding project’s final report [31].
- 8.
Note that we do not perform any fine-tuning of language models for OCR. We use Tesseract’s pre-trained models for contemporary German as provided. Once the correct ROIs for the target values are identified by the Mask R-CNN, our lever for improving the OCR results lies mainly with ranking and filtering Tesseract’s hypotheses through pattern matching in the post-decoder.
- 9.
This highly dynamic layout, in which the positions of the target values are unknown, does not allow for classical form data extraction or otherwise useful table detection heuristics; cf. a similar discussion in [5]. Hence, our ML-based approach attempts to mimic a human visual lookup strategy for finding the required target values on the document page images.
- 10.
Improvements between the two stages were mainly achieved by re-annotating large numbers of ROIs in the ground truth and re-training the Mask R-CNN, after systematic problems with the previous annotations had been discovered.
- 11.
These results could also indicate problems with the exact locations of the identified ROIs in the production environment. Such problems were, however, not observed in the lab setting.
- 12.
For instance, in our case, it might be possible to reduce the human annotation workload through automatically pre-labelling potential ROIs by locating the users’ corrected target values on the corresponding document images.
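The pre-labelling idea in note 12 can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: it assumes that OCR output is available as word boxes with text and pixel coordinates, and the names `WordBox`, `normalize`, and `prelabel_rois`, the field names, and the padding parameter are all hypothetical.

```python
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class WordBox:
    """One OCR word with its bounding box in pixels (hypothetical structure)."""
    text: str
    x: int
    y: int
    w: int
    h: int


def normalize(value: str) -> str:
    """Keep digits only, so '12.345' and '12345' compare as equal.

    Assumption: separators are ignored entirely, which is a deliberate
    simplification and can conflate values that differ only in decimals.
    """
    return re.sub(r"\D", "", value)


def prelabel_rois(words, corrected_values, pad=4):
    """Propose one ROI per field whose corrected value occurs exactly once.

    Missing or ambiguous matches are skipped and left to human annotators.
    Returns a dict mapping field name to (x, y, width, height).
    """
    proposals = {}
    for field, value in corrected_values.items():
        target = normalize(value)
        if not target:
            continue
        hits = [b for b in words if normalize(b.text) == target]
        if len(hits) == 1:
            b = hits[0]
            proposals[field] = (b.x - pad, b.y - pad, b.w + 2 * pad, b.h + 2 * pad)
    return proposals


# Invented example page: the cost value appears twice, so it stays unlabelled.
words = [
    WordBox("Verbrauch", 40, 100, 120, 20),
    WordBox("12.345", 200, 100, 80, 20),
    WordBox("kWh", 290, 100, 40, 20),
    WordBox("987,65", 200, 140, 80, 20),
    WordBox("987,65", 200, 300, 80, 20),
]
corrected = {"consumption_kwh": "12345", "total_cost_eur": "987,65"}
rois = prelabel_rois(words, corrected)
```

Only unambiguous matches are turned into ROI proposals here, on the assumption that a wrong pre-label costs annotators more than a missing one.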
References
Abdulla, W.: Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow. https://github.com/matterport/Mask_RCNN. Accessed 11 Feb 2022
Auer, F., Felderer, M.: Shifting quality assurance of machine learning algorithms to live systems. In: Tichy, M., Bodden, E., Kuhrmann, M., Wagner, S., Steghöfer, J.-P. (eds.) Software Engineering und Software Management 2018, pp. 211–212. Gesellschaft für Informatik, Bonn (2018)
Baur, N., Blasius, J. (eds.): Handbuch Methoden der empirischen Sozialforschung. Springer, Wiesbaden (2014). https://doi.org/10.1007/978-3-531-18939-0
Beede, E., et al.: A human-centered evaluation of a deep learning system deployed in clinics for the detection of diabetic retinopathy. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, pp. 1–12. ACM (2020). https://doi.org/10.1145/3313831.3376718
Bürgl, K., Reinhardt, L., Binder, F., Müller, L., Niekler, A.: Digitizing Drilling Logs - Challenges of typewritten forms. In: Gesellschaft für Informatik (ed.) 51. Jahrestagung der Gesellschaft für Informatik, INFORMATIK 2021 - Computer Science & Sustainability, Berlin, pp. 709–718. Gesellschaft für Informatik, Bonn (2021). https://doi.org/10.18420/informatik2021-059
Chegini, M., et al.: Interactive visual labelling versus active learning: an experimental comparison. Front. Inf. Technol. Electron. Eng. 21, 524–535 (2020). https://doi.org/10.1631/FITEE.1900549
Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Q. 13(3), 319–340 (1989)
Dietrich, T., Trischler, J., Schuster, L., Rundle-Thiele, S.: Co-designing services with vulnerable consumers. J. Serv. Theory Pract. 27, 663–688 (2017). https://doi.org/10.1108/jstp-02-2016-0036
Engl, E.: OCR-D kompakt: Ergebnisse und Stand der Forschung in der Förderinitiative. Bibliothek Forschung und Praxis (44), 218–230 (2020). https://doi.org/10.1515/bfp-2020-0024
Gao, L., et al.: ICDAR 2019 competition on table detection and recognition (cTDaR). In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1510–1515 (2019). https://doi.org/10.1109/ICDAR.2019.00243
Göbel, M., Hassan, T., Oro, E., Orsi, G.: ICDAR 2013 table competition. In: 12th International Conference on Document Analysis and Recognition, pp. 1449–1453 (2013). https://doi.org/10.1109/ICDAR.2013.292
He, K., Gkioxari, G., Dollár, P., Girshick, R.B.: Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980–2988 (2017). https://doi.org/10.1109/ICCV.2017.322
Hesenius, M., Schwenzfeier, N., Meyer, O., Koop, W., Gruhn, V.: Towards a software engineering process for developing data-driven applications. In: 2019 IEEE/ACM 7th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE), pp. 35–41. IEEE (2019). https://doi.org/10.1109/raise.2019.00014
Holzinger, A.: Interactive machine learning for health informatics: when do we need the human-in-the-loop? Brain Inform. 3(2), 119–131 (2016). https://doi.org/10.1007/s40708-016-0042-6
Holzinger, A., Valdez, A.C., Ziefle, M.: Towards interactive recommender systems with the doctor-in-the-loop. In: Weyers, B., Dittmar, A. (eds.) Mensch und Computer 2016 - Workshopband. Gesellschaft für Informatik e.V., Aachen (2016). https://doi.org/10.18420/MUC2016-WS11-0001
Kettner, S.E., Thorun, C.: Verbraucherstudie 2019: Wie erreicht man Verbraucherinnen und Verbraucher im Zeitalter digitaler Informationsangebote. Final report. ConPolicy GmbH, Berlin (2019)
Lell, O., Kettner, S.E., Thorun, C., Bendig, T.: Verbraucherschutz digital neu denken: Consumer Protection Technologies - Politische Relevanz, Potential und Handlungsbedarf. ConPolicy GmbH, Berlin (2021)
Lewis, C.: Using the “thinking-aloud” method in cognitive interface design. IBM TJ Watson Research Center, Yorktown Heights (1982)
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Mahlke, S.: Factors influencing the experience of website usage. In: Extended Abstracts on Human Factors in Computing Systems, CHI 2002, pp. 846–847 (2002)
Monarch, R.: Human-in-the-Loop Machine Learning. Manning Publications, New York (2021)
Morville, P.: User experience design. https://semanticstudios.com/user_experience_design/. Accessed 11 Feb 2022
Moser, C.: User Experience Design. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-13363-3
Neudecker, C., et al.: OCR-D: an end-to-end open source OCR framework for historical printed documents. In: Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage, Brussels, pp. 53–58. ACM (2019). https://doi.org/10.1145/3322905.3322917
Ng, A.: Structured and Unstructured Data: Implications for AI Development. The Batch. https://read.deeplearning.ai/the-batch/structured-and-unstructured-data-implications-for-ai-development/. Accessed 05 Nov 2021
Patton, J., Economy, P.: User Story Mapping: Discover the Whole Story, Build the Right Product. 1st edn. O’Reilly Media Inc. (2014)
Reder, B.: Machine Learning 2021. IDG Business Media GmbH, München (2021)
Reul, C., Springmann, U., Puppe, F.: LAREX: a semi-automatic open-source tool for layout analysis and region extraction on early printed books. In: Proceedings of the 2nd International Conference on Digital Access to Textual Cultural Heritage, Göttingen, pp. 137–142. Association for Computing Machinery (2017). https://doi.org/10.1145/3078081.3078097
Riccio, V., Jahangirova, G., Stocco, A., Humbatova, N., Weiss, M., Tonella, P.: Testing machine learning based systems: a systematic mapping. Empir. Softw. Eng. 25(6), 5193–5254 (2020). https://doi.org/10.1007/s10664-020-09881-0
Roberts, L.: The value of AI: now and the future (PART 2) AI Failures, Pitfalls, Key Learnings and Success. https://www.linkedin.com/pulse/value-ai-now-future-part-2-failures-pitfalls-key-success-roberts/. Accessed 05 Nov 2021
Scheurer, Y., et al.: Abschlussbericht Smart_HEC (Kurzfassung). co2online gGmbH, Berlin (2021)
Smith, R.: An overview of the tesseract OCR engine. In: Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), pp. 629–633 (2007). https://doi.org/10.1109/ICDAR.2007.4376991
Thielsch, M.T., Blotenberg, I., Jaron, R.: User evaluation of websites: from first impression to recommendation. Interact. Comput. 26(1), 89–102 (2014)
von Ahn, L., Maurer, B., McMillen, C., Abraham, D., Blum, M.: reCAPTCHA: human-based character recognition via web security measures. Science 321, 1465–1468 (2008). https://doi.org/10.1126/science.1160379
Yimam, S.M., Biemann, C., Majnaric, L., Šabanović, Š., Holzinger, A.: An adaptive annotation approach for biomedical entity and relation recognition. Brain Inform. 3(3), 157–168 (2016). https://doi.org/10.1007/s40708-016-0036-4
Acknowledgments
This research was supported by the German Federal Ministry of Justice and Consumer Protection (BMJV) under grants no. 28V2304A19, 28V2304B19, 28V2304C19, 28V2304D19. Partial contributions were funded by the German Federal Ministry of Education and Research (BMBF) under grant no. 01IS20091B, and by the Development Bank of Saxony (SAB) under project number 100335729.
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Binder, F. et al. (2022). Putting Users in the Loop: How User Research Can Guide AI Development for a Consumer-Oriented Self-service Portal. In: Rauterberg, M. (eds) Culture and Computing. HCII 2022. Lecture Notes in Computer Science, vol 13324. Springer, Cham. https://doi.org/10.1007/978-3-031-05434-1_1
DOI: https://doi.org/10.1007/978-3-031-05434-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05433-4
Online ISBN: 978-3-031-05434-1
eBook Packages: Computer Science, Computer Science (R0)