A Teaching Language for Building Object Detection Models

ABSTRACT
Object detection is a key application of machine learning. Today's detector models rely on deep networks that offer model builders limited agency over model construction, refinement, and maintenance. Human-centered approaches address these issues by exploring the exchange of knowledge between a human-in-the-loop and a learning system. This exchange, mediated through a teaching language, is often restricted to the specification of labels, which constrains users' expressiveness in communicating other forms of knowledge to the system. We propose and assess an expressive teaching language for specifying object detectors that includes constructs such as concepts and relationships. Through a formative study, we identified the language's building blocks and articulated design goals for creating interactive experiences in teaching object detection. We applied these goals in a design probe that surfaced further research questions and a set of design takeaways.
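As an illustration only, the kind of teaching language the abstract describes — where a teacher specifies a detector through concepts and relationships rather than labels alone — might be sketched as follows. This is not the paper's actual language; every name (`Concept`, `Relationship`, `DetectorSpec`, `teach_concept`, `relate`, the image paths) is a hypothetical stand-in for the constructs the abstract mentions.

```python
# Hypothetical sketch (not the paper's language): one way a teaching
# language for object detection could expose "concepts" and
# "relationships" as first-class constructs a teacher composes.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    """A named visual concept the teacher wants the detector to learn."""
    name: str
    examples: List[str] = field(default_factory=list)  # example image paths


@dataclass
class Relationship:
    """A teacher-specified relation between two concepts (e.g. spatial)."""
    subject: str
    predicate: str
    object: str


@dataclass
class DetectorSpec:
    """A detector specification assembled from teaching-language constructs."""
    concepts: List[Concept] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)

    def teach_concept(self, name: str, examples: List[str]) -> "DetectorSpec":
        self.concepts.append(Concept(name, examples))
        return self

    def relate(self, subject: str, predicate: str, obj: str) -> "DetectorSpec":
        self.relationships.append(Relationship(subject, predicate, obj))
        return self


# A teacher incrementally builds up the specification, going beyond
# label-only feedback by declaring structure between concepts:
spec = (
    DetectorSpec()
    .teach_concept("wheel", ["img/wheel_01.jpg"])
    .teach_concept("bicycle", ["img/bicycle_01.jpg"])
    .relate("wheel", "part_of", "bicycle")
)
print(len(spec.concepts), len(spec.relationships))  # → 2 1
```

The fluent, declarative shape is a design choice, not a claim about the paper's system: it keeps the teacher's knowledge (here, that a wheel is part of a bicycle) explicit and inspectable, which is the kind of expressiveness the abstract argues label-only interfaces lack.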