DOI: 10.1145/3357236.3395545

Research Article · Honorable Mention

A Teaching Language for Building Object Detection Models

Published: 03 July 2020

ABSTRACT

Object detection is a key application of machine learning. Current detector models rely on deep networks that offer model builders limited agency over model construction, refinement, and maintenance. Human-centered approaches to these issues explore the exchange of knowledge between a human-in-the-loop and a learning system. This exchange, mediated through a teaching language, is often restricted to the specification of labels, constraining the user's expressiveness in communicating other forms of knowledge to the system. We propose and assess an expressive teaching language for specifying object detectors that includes constructs such as concepts and relationships. From a formative study, we identified language building blocks and articulated design goals for creating interactive experiences in teaching object detection. We applied these goals through a design probe that highlighted further research questions and a set of design takeaways.
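To make the idea of a teaching language built from concepts and relationships concrete, the following is a minimal, purely hypothetical sketch: the names `Concept`, `Relationship`, and `DetectorSpec` are illustrative assumptions and do not reproduce the paper's actual syntax.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a teaching-language specification for an object
# detector. A teacher declares named concepts and links them with
# relationships, rather than only supplying box labels.

@dataclass(frozen=True)
class Concept:
    name: str                      # e.g. "wheel", "bicycle"

@dataclass(frozen=True)
class Relationship:
    subject: Concept
    predicate: str                 # e.g. "part-of", "above"
    obj: Concept

@dataclass
class DetectorSpec:
    concepts: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

    def teach_concept(self, name):
        c = Concept(name)
        self.concepts.append(c)
        return c

    def teach_relationship(self, subject, predicate, obj):
        r = Relationship(subject, predicate, obj)
        self.relationships.append(r)
        return r

# Example teaching session: "a wheel is part of a bicycle".
spec = DetectorSpec()
wheel = spec.teach_concept("wheel")
bicycle = spec.teach_concept("bicycle")
spec.teach_relationship(wheel, "part-of", bicycle)
```

The point of such a structure is that the specification captures knowledge beyond labels: a downstream learner could, for instance, use the "part-of" relationship to constrain where it expects wheels relative to bicycles.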


Supplemental Material

disfp9273.mp4 (MP4, 64.9 MB)


Published in

DIS '20: Proceedings of the 2020 ACM Designing Interactive Systems Conference
July 2020 · 2264 pages
ISBN: 978-1-4503-6974-9
DOI: 10.1145/3357236

Copyright © 2020 ACM
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,158 of 4,684 submissions (25%)
