
Structured labeling for facilitating concept evolution in machine learning

Authors:
Todd Kulesza, Oregon State University, Corvallis, OR, USA
Saleema Amershi, Microsoft Research, Redmond, WA, USA
Rich Caruana, Microsoft Research, Redmond, WA, USA
Danyel Fisher, Microsoft Research, Redmond, WA, USA
Denis Charles, Microsoft Research, Redmond, WA, USA

CHI '14: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 2014, Pages 3075–3084. https://doi.org/10.1145/2556288.2557238

Published: 26 April 2014

ABSTRACT

Labeling data is a seemingly simple task required for training many machine learning systems, but it is actually fraught with problems. This paper introduces the notion of concept evolution: the changing nature of a person's underlying concept (the abstract notion of the target class a person is labeling for, e.g., spam email or travel-related web pages), which can result in inconsistent labels and thus be detrimental to machine learning. We introduce two solutions based on structured labeling, a novel technique we propose for helping people define and refine their concept in a consistent manner as they label. Through a series of five experiments, including a controlled lab study, we illustrate the impact and dynamics of concept evolution in practice and show that structured labeling helps people label more consistently in the presence of concept evolution than traditional labeling does.
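
To make the idea of inconsistent labels concrete, here is a minimal sketch of one way label consistency could be quantified. The abstract does not say how consistency was measured, so this is an illustrative assumption rather than the authors' method: it takes a hypothetical session in which some items are shown to the labeler more than once and reports the fraction of repeated items that received the same label every time. The function name `label_consistency` and the sample data are invented for illustration.

```python
from collections import defaultdict


def label_consistency(labels):
    """Return the fraction of repeatedly labeled items whose labels all agree.

    `labels` is a list of (item_id, label) pairs in the order they were
    assigned. Items labeled only once are ignored. Returns None when no
    item was labeled more than once.
    """
    seen = defaultdict(list)
    for item_id, label in labels:
        seen[item_id].append(label)

    # Keep only the items that were labeled at least twice.
    repeated = [item_labels for item_labels in seen.values() if len(item_labels) > 1]
    if not repeated:
        return None

    # An item is consistent when every label it received is identical.
    consistent = sum(1 for item_labels in repeated if len(set(item_labels)) == 1)
    return consistent / len(repeated)


# Hypothetical session: the labeler's notion of "travel" shifts midway,
# so page2 receives a different label the second time it appears.
session = [
    ("page1", "travel"), ("page2", "not travel"), ("page3", "travel"),
    ("page1", "travel"), ("page2", "travel"), ("page3", "travel"),
]
print(label_consistency(session))
```

In this made-up session, two of the three repeated pages keep their label, so the score is two thirds. A labeler whose concept evolves during the session would be expected to score lower on a measure like this than one whose concept stays fixed.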

Index Terms

1. Human-centered computing
  1. Human computer interaction (HCI)

Published in
CHI '14: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
April 2014
4206 pages
ISBN: 9781450324731
DOI: 10.1145/2556288
General Chairs:
Matt Jones, Swansea University, Wales, UK
Philippe Palanque, Université Paul Sabatier, France
Program Chairs:
Albrecht Schmidt, University of Stuttgart, Germany
Tovi Grossman, Autodesk Research, Canada
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
concept evolution
interactive machine learning
Qualifiers
- research-article
Acceptance Rates
CHI '14 Paper Acceptance Rate: 465 of 2,043 submissions, 23%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%