DOI: 10.1145/2556288.2557238

Research article

Structured labeling for facilitating concept evolution in machine learning

Published: 26 April 2014

ABSTRACT

Labeling data is a seemingly simple task required for training many machine learning systems, but it is actually fraught with problems. This paper introduces the notion of concept evolution: the changing nature of a person's underlying concept (the abstract notion of the target class a person is labeling for, e.g., spam email, travel-related web pages), which can result in inconsistent labels and thus be detrimental to machine learning. We introduce two structured labeling solutions, based on a novel technique we propose for helping people define and refine their concept in a consistent manner as they label. Through a series of five experiments, including a controlled lab study, we illustrate the impact and dynamics of concept evolution in practice and show that structured labeling helps people label more consistently in the presence of concept evolution than traditional labeling.
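The abstract's central claim, that concept evolution produces inconsistent labels which harm a learner, can be illustrated with a toy sketch. Everything below is hypothetical and not from the paper: synthetic one-dimensional data, a 1-nearest-neighbour classifier, and a simulated labeler whose notion of the target class narrows halfway through labeling.

```python
import random

random.seed(42)

def make_data(n):
    # Synthetic 1-D points: class 1 (the target concept) clusters near 1.0,
    # class 0 clusters near 0.0.
    data = []
    for _ in range(n):
        y = random.randint(0, 1)
        data.append((y + random.gauss(0.0, 0.25), y))
    return data

def knn1_predict(train, x):
    # 1-nearest-neighbour prediction: the label of the closest training point.
    return min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(train, test):
    return sum(knn1_predict(train, x) == y for x, y in test) / len(test)

train = make_data(200)
test = make_data(200)

# Simulate concept evolution: halfway through labeling, the labeler's
# notion of class 1 narrows, so borderline class-1 items now get label 0.
drifted = [(x, 0) if i >= 100 and y == 1 and x < 1.2 else (x, y)
           for i, (x, y) in enumerate(train)]

consistent_acc = accuracy(train, test)
evolved_acc = accuracy(drifted, test)
print(f"accuracy with consistent labels: {consistent_acc:.2f}")
print(f"accuracy with inconsistently labeled data: {evolved_acc:.2f}")
```

In this sketch the drifted labeler relabels most borderline class-1 training items as class 0, so the classifier trained on the inconsistent labels scores noticeably worse on the (consistently labeled) test set, mirroring the paper's motivation for keeping labels consistent as a concept evolves.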


Published in

CHI '14: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
April 2014, 4206 pages
ISBN: 9781450324731
DOI: 10.1145/2556288
Copyright © 2014 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance rates: CHI '14 papers: 465 of 2,043 submissions (23%); overall CHI acceptance rate: 6,199 of 26,314 submissions (24%).
