A Teaching Language for Building Object Detection Models

ABSTRACT
Object detection is a key application of machine learning. Today's detector models rely on deep networks that offer model builders limited agency over model construction, refinement, and maintenance. Human-centered approaches address these issues by exploring the exchange of knowledge between a human-in-the-loop and a learning system. This exchange, mediated through a teaching language, is often restricted to the specification of labels, which constrains users' expressiveness in communicating other forms of knowledge to the system. We propose and assess an expressive teaching language for specifying object detectors that includes constructs such as concepts and relationships. Through a formative study, we identified the language's building blocks and articulated design goals for creating interactive experiences in teaching object detection. We applied these goals in a design probe that surfaced further research questions and a set of design takeaways.
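As an illustration only, the kind of teaching language the abstract describes — where a teacher specifies a detector through concepts and relationships rather than labels alone — might be sketched as follows. This is not the paper's actual language; every name (`Concept`, `Relationship`, `DetectorSpec`, `teach_concept`, `relate`, the image paths) is a hypothetical stand-in for the constructs the abstract mentions.

```python
# Hypothetical sketch (not the paper's language): one way a teaching
# language for object detection could expose "concepts" and
# "relationships" as first-class constructs a teacher composes.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    """A named visual concept the teacher wants the detector to learn."""
    name: str
    examples: List[str] = field(default_factory=list)  # example image paths


@dataclass
class Relationship:
    """A teacher-specified relation between two concepts (e.g. spatial)."""
    subject: str
    predicate: str
    object: str


@dataclass
class DetectorSpec:
    """A detector specification assembled from teaching-language constructs."""
    concepts: List[Concept] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)

    def teach_concept(self, name: str, examples: List[str]) -> "DetectorSpec":
        self.concepts.append(Concept(name, examples))
        return self

    def relate(self, subject: str, predicate: str, obj: str) -> "DetectorSpec":
        self.relationships.append(Relationship(subject, predicate, obj))
        return self


# A teacher incrementally builds up the specification, going beyond
# label-only feedback by declaring structure between concepts:
spec = (
    DetectorSpec()
    .teach_concept("wheel", ["img/wheel_01.jpg"])
    .teach_concept("bicycle", ["img/bicycle_01.jpg"])
    .relate("wheel", "part_of", "bicycle")
)
print(len(spec.concepts), len(spec.relationships))  # → 2 1
```

The fluent, declarative shape is a design choice, not a claim about the paper's system: it keeps the teacher's knowledge (here, that a wheel is part of a bicycle) explicit and inspectable, which is the kind of expressiveness the abstract argues label-only interfaces lack.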