ABSTRACT
In many data mining applications, online labeling feedback is only available for examples which were predicted to belong to the positive class. Such applications include spam filtering in the case where users never check emails marked "spam", document retrieval where users cannot give relevance feedback on unretrieved documents, and online advertising where user behavior cannot be observed for unshown advertisements. One-sided feedback can cripple the performance of classical mistake-driven online learners such as Perceptron. Previous work under the Apple Tasting framework showed how to transform standard online learners into successful learners from one-sided feedback. However, we find in practice that this transformation may request more labels than necessary to achieve strong performance. In this paper, we employ two active learning methods which reduce the number of labels requested in practice. One method is the use of Label Efficient active learning. The other method, somewhat surprisingly, is the use of margin-based learners without modification, which we show combines implicit active learning and a greedy strategy for managing the exploration-exploitation tradeoff. Experimental results show that these methods can be significantly more effective in practice than those using the Apple Tasting transformation, even on minority class problems.
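The one-sided feedback setting described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `one_sided_margin_perceptron`, its parameters, and the choice to treat a score of zero as a positive prediction are all assumptions made for illustration. The learner is a simple margin perceptron that sees a label only when it predicts positive. The optional parameter `b` adds a label-efficient style exploration step in which a negatively scored example is predicted positive with probability b / (b + |score|); mapping the sampling rule onto the one-sided setting in this way is likewise an assumption.

```python
import random


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def one_sided_margin_perceptron(stream, dim, margin=1.0, b=None, lr=1.0, seed=0):
    """Hypothetical sketch of online learning from one-sided feedback.

    stream yields (x, y) pairs with y in {-1, +1}. The label y is read
    only when the learner predicts positive, mimicking one-sided feedback.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    predictions = []
    labels_seen = 0
    for x, y in stream:
        score = dot(w, x)
        # Assumption: a score of exactly zero counts as a positive prediction.
        predict_positive = score >= 0
        # Optional exploration (assumed form): predict positive on a
        # negatively scored example with probability b / (b + |score|).
        if not predict_positive and b is not None:
            if rng.random() < b / (b + abs(score)):
                predict_positive = True
        predictions.append(1 if predict_positive else -1)
        if not predict_positive:
            continue  # predicted negative: no label is revealed
        labels_seen += 1
        # Margin update: adjust whenever the labeled example is
        # misclassified or lies inside the margin.
        if y * score < margin:
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w, predictions, labels_seen


if __name__ == "__main__":
    data = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 0.0], 1), ([0.0, 1.0], -1)]
    print(one_sided_margin_perceptron(data, dim=2))
    print(one_sided_margin_perceptron(data, dim=2, b=0.5))
```

With `b=None` the learner relies only on the margin update, so every label it observes comes from an example it already scored as positive. Setting `b` trades extra false positives for labels on examples that would otherwise remain unobserved.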
Index Terms
- Practical learning from one-sided feedback